RemiAllio / MitoFinder

MitoFinder: efficient automated large-scale extraction of mitogenomic data from high throughput sequencing data

multi-user / HPC support for MitoFinder #20

Open bgruening opened 3 years ago

bgruening commented 3 years ago

Hi @RemiAllio,

Congrats on this nice project. I wanted to create a Bioconda package for MitoFinder but found a few problems that prevent using MitoFinder in multi-user settings or HPC environments.

The main thing that currently prevents us from using MitoFinder is that its installation is not decoupled from its execution. The code relies on a lot of path mangling, such as pathToMegahitFolder = os.path.join(module_dir, 'megahit/'). It would be better, imho, to just assume that megahit is on the PATH; the user or the package manager is then responsible for putting it there. Another point is that Mitofinder.config seems to be expected next to the main Python file. Is that correct? I could not find a way to change it, but maybe I missed it. On HPC or multi-user systems the installation path is not writable, so a user cannot modify the config file. It would be nice if Mitofinder.config could be passed to MitoFinder via a command-line argument.
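As an illustration of the PATH-based lookup suggested here, the following is a minimal sketch, not MitoFinder's actual code. The function name find_tool and the bundled_dir parameter are hypothetical; the sketch assumes that a tool found on PATH should take precedence, with a bundled copy used only as a fallback.

```python
import os
import shutil


def find_tool(name, bundled_dir=None):
    """Return the full path of executable `name`, or None if it is not found.

    Hypothetical helper: PATH is searched first, then an optional directory
    holding a bundled copy (e.g. one created by an install script).
    """
    found = shutil.which(name)
    if found is not None:
        return found
    if bundled_dir is not None:
        # shutil.which accepts an explicit search path via `path=`.
        return shutil.which(name, path=bundled_dir)
    return None
```

With a helper like this, a call such as find_tool("megahit", bundled_dir=os.path.join(module_dir, "megahit")) would pick up a system-wide or Conda-provided megahit when one exists and fall back to the bundled copy otherwise.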

We have recently collected a few tips on how developers can create tools that are easily deployable on HPC and cloud systems [1]. Maybe that helps a little bit. Please let me know if I can help in any way. I would really like to see this tool running as part of our pipelines. I think that, with some restructuring, this project could easily be used by many more researchers.

[1] https://academic.oup.com/gigascience/article/8/5/giz054/5497810

RemiAllio commented 3 years ago

Hi @bgruening,

Thank you for your message.

I will defend my PhD thesis in a few days so it will be difficult for me to do a lot within the next two weeks...

I chose not to assume that all the programs needed by MitoFinder are already in the PATH, because some users may not be able to put programs in their PATH (e.g. for lack of skills or permissions). That's why I chose to specify the paths of the programs in a config file. Additionally, given that all programs come with and/or are installed by MitoFinder, a global installation of MitoFinder is quite easy for almost all users (run install.sh and that's it!). Indeed, when the Mitofinder.config file is unchanged, the programs installed by MitoFinder are used and you don't need to install them yourself.

However, if you want to use a program other than the one provided by MitoFinder and can't write to the Mitofinder.config file, that is indeed a problem... I can easily add an option to specify a config file other than the one found next to the main Python file.
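One way such an option could resolve the config file is sketched below. This is an assumption-laden illustration rather than MitoFinder's implementation: the function names, the MITOFINDER_CONFIG environment variable, and the precedence order (command line, then environment, then the file next to the main script) are all hypothetical.

```python
import argparse
import os


def build_parser():
    """Build a tiny parser exposing only the config option (sketch)."""
    parser = argparse.ArgumentParser(description="config lookup sketch")
    parser.add_argument(
        "--config",
        default=None,
        help="path to a config file to use instead of the default one",
    )
    return parser


def resolve_config_path(cli_value, default_dir, environ=None):
    """Return the config path: CLI value, then env variable, then default."""
    environ = os.environ if environ is None else environ
    if cli_value:
        return os.path.abspath(cli_value)
    # MITOFINDER_CONFIG is a hypothetical variable name used for illustration.
    env_value = environ.get("MITOFINDER_CONFIG")
    if env_value:
        return os.path.abspath(env_value)
    # Fall back to the file shipped next to the main script.
    return os.path.join(default_dir, "Mitofinder.config")
```

The point of this ordering is that the read-only installation directory is only ever a fallback, so each user on a shared system can supply their own file without write access to the install path.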

Another thing that I plan to do is to create a Singularity image with a suitable environment and the programs already installed, to allow users to run MitoFinder without needing to install anything.

So, I am not sure I understand what the problem is in your case. Please tell me if I can help you in any way. Cheers, Rémi

bgruening commented 3 years ago

> Hi @bgruening,
>
> Thank you for your message.
>
> I will defend my PhD thesis in a few days so it will be difficult for me to do a lot within the next two weeks...

Oh cool, good luck!

> I chose not to assume that all the programs needed by MitoFinder are already in the PATH, because some users may not be able to put programs in their PATH (e.g. for lack of skills or permissions). That's why I chose to specify the paths of the programs in a config file.

To install MitoFinder you already need some dependencies that are nontrivial to install, e.g. a compiler. I think this is a matter of separation of concerns. You don't provide any classical way to package your software, which makes it hard to offer your software as, e.g., a Debian package or a Conda package. Both are probably the easiest ways to use your software without worrying about permissions and skills, since packaging solves both. In addition, your code becomes more readable and you have less complexity to carry, as that is handled by the package.

> Additionally, given that all programs come with and/or are installed by MitoFinder, the global installation of MitoFinder is quite easy for almost all users (run install.sh and that's it!).

If you have a compiler, Java, etc. installed, it might be easy, but it reinstalls packages that a user might already have available. It is also hard to update those packages with bug fixes etc. All this would be solved with a Conda package.

> Indeed, when the Mitofinder.config file is unchanged, the programs installed by MitoFinder are used and you don't need to install them by yourself.

Yes, but how do you do this in a multi-user system? How do you do this in a pipeline, where the installation happens automatically before the tool is run automatically, as in a typical cloud setup?

> However, if you want to use a program other than the one provided by MitoFinder and can't write to the Mitofinder.config file, that is indeed a problem... I can easily add an option to specify a config file other than the one found next to the main Python file.

That would be awesome and already help a lot.

> Another thing that I plan to do is to create a singularity image with adequate environment and programs already installed into it to allow users to run MitoFinder without the need to install anything.

The nice part of a Bioconda package is that you get the Singularity container and a Docker container for free. Bioconda will create all of that for you once we have a package defined.

> So, I am not sure to understand what is the problem in your case. Please tell me if I can help you in any way.

The main problem is that your code carries too many assumptions about the user's setup, which makes it less flexible. A new config file option would enable us to hack around this. It's not nice, but I guess we could make it work.

Thanks @RemiAllio and good luck with your defense!

RemiAllio commented 2 years ago

Hi, sorry for the long delay...

I haven't worked on a Conda package yet but, since our last conversation, I have created both the Singularity image (singularity pull --arch amd64 library://remiallio/default/mitofinder:v1.4.1) and the --config option.

I know this is not exactly what you wanted but for now this is what I could do. I hope these new tools will help you! Cheers, Rémi