Closed fmalmeida closed 2 years ago
Adding this here, since the question above would decide whether the following information is relevant or not
https://github.com/fmalmeida/bacannot/blob/0cd2c214f09334eafb79bb7b691210a0d29a2547/main.nf#L111
If we continue to use custom modules, then the allocation of module-level parameters could be simplified by using addParams(params.MODULE_NAME) in the module import statement, as shown here.
All the pipeline parameters could then be collected within the config file itself, as shown here.
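A minimal sketch of that pattern (the module path and the `params.prokka` scope are hypothetical, chosen only for illustration):

```nextflow
// main.nf (sketch): pass a module-scoped map of parameters at import time,
// so the module no longer reaches into pipeline-level params directly
include { PROKKA } from './modules/prokka' addParams( params.prokka )

// nextflow.config (sketch): all module-level parameters collected in one place
// params.prokka = [ kingdom: 'Bacteria', cpus: 4 ]
```

The module then sees only the parameters it was handed, which keeps the global params namespace clean.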
Hi @abhi18av,
Thank you for pointing that out, however, I believe this one is not relevant, because I actually don't use module-level parameters. I only use the parameters that are set globally to the workflow.
This was a mistake: I thought I had to declare the parameters I wanted to use, but it turns out I didn't. These declarations have been removed in the new 3.x version of the pipeline.
Note on the issue:
I will try to address this issue in branch https://github.com/fmalmeida/bacannot/tree/remodeling.
Steps:
Found a small hindrance while updating the pipeline in the https://github.com/fmalmeida/bacannot/tree/remodeling branch that needs to be addressed before continuing:
PROKKA
module.bacannot:misc_tools
that will hold all the conda packages needed by the other modules that could not use a quay.io image. All these "problematic" modules would then share the same "misc_tools" image and conda env.yml file. Needs more thought.
Decided to always use biocontainer images whenever possible, and to create a miscellaneous image hosting the tools for modules that have more than one conda dependency and therefore do not have a biocontainer.
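As a sketch, such a shared environment file might look like this (the package names are placeholders, not the pipeline's actual dependency list):

```yaml
# env.yml (sketch) for the misc_tools image: one conda environment shared by
# the modules that cannot use a quay.io/biocontainers image
name: misc_tools
channels:
  - conda-forge
  - bioconda
dependencies:
  - blast        # placeholder tool
  - bedtools     # placeholder tool
  - r-base       # placeholder runtime, e.g. for report rendering
```

The same file can feed both the Dockerfile build and a conda profile, so the two stay in sync.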
Definitely a great step forward! 👍
Might I suggest cross-publishing it on the Docker Hub and quay.io registries, so that there's a fallback in case you run into Docker Hub rate limits.
What do you mean by cross-publish it? I can upload images to quay.io?
Oh, nothing fancy: you'll just need to tag the same image twice, first for pushing to Docker Hub and then again for pushing it to quay.io.
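In shell terms, the double-tagging workflow is only a few commands (the image name and tag here are illustrative):

```shell
# build once, push to Docker Hub
docker build -t fmalmeida/bacannot:v3.0 .
docker push fmalmeida/bacannot:v3.0

# re-tag the same image for quay.io and push there as a fallback
docker tag fmalmeida/bacannot:v3.0 quay.io/fmalmeida/bacannot:v3.0
docker push quay.io/fmalmeida/bacannot:v3.0
```

The layers are identical, so the second push only uploads metadata if the registry already has the blobs.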
The hindrance mentioned in this comment, which was thought to be solved in this other comment, is actually still unsolved.
While implementing the changes, we saw that switching to biocontainers would not cover all of the modules: we would still need two or three different images to host other tools and programming languages that are required.
After some thought, we realized that having "half" from biocontainers and "half" from custom docker images would not be the most standardized option ...
So we are back to deciding on the same dilemma:
... 🤔 ...
Hi @fmalmeida ,
Unless I'm mistaken, these are all of the docker images used in the pipeline right?
https://github.com/fmalmeida/bacannot/blob/develop/conf/docker.config
While implementing the changes, we saw that switching to biocontainers would not cover all of the modules: we would still need two or three different images to host other tools and programming languages that are required.
Also, if possible, could we discuss this further using some example modules? It's a bit too abstract for me atm 😅
Hi @abhi18av,
It is nothing too fancy. It is just that some modules, such as the one that renders the rmarkdown reports or the one for the genome browser, have a few dependencies (scripts or multiple libraries) that would not be available inside the biocontainers images, which are designed to contain only the tool (or conda package) itself. And some modules, such as digIS, are not available in conda at all.
So, even after changing the ones that could be changed to biocontainers, some modules would still require non-biocontainer images such as bacannot:server, bacannot:renv and bacannot:jbrowse, and we would end up with a mixture of biocontainers and custom images.
The "problem" is not actually a real problem; it is conceptual. I am not much of a fan of such mixtures, I like things more standardized 😅
Just to be clear, I am still going to refactor the modules to read the databases from outside the containers, as suggested when opening the issue. But, instead of changing everything to biocontainers, I am thinking of a set of smaller custom images such as bacannot:renv, bacannot:py36_env, bacannot:perl_env, etc., with their related tools inside them.
And yes, these are the docker images used.
The point is just that, instead of pointing 60-70% of the modules to biocontainers, I am leaning towards creating these smaller custom images, which, for me, would be easier to maintain.
😁
Hi Felipe,
Thanks for sharing the full context!
For this, allow me to take a deeper look at it today; by tomorrow I'll share my thoughts (if you need them 😅). Perhaps we can find an optimal strategy for this one.
Hi Abhinav,
Thanks for the interest.
For this issue specifically, we have discussed it here in the lab and decided to go on as described for now. This approach is already established for our local pipelines, so it follows the standards of the lab and requires fewer changes for the time being.
But surely, I'd like to hear your thoughts, maybe I could use them to find some optimal strategy as you said for future releases or future pipelines.
😄
For this issue specifically, we have discussed it here in the lab and decided to go on as described for now. This approach is already established for our local pipelines, so it follows the standards of the lab and requires fewer changes for the time being.
Ah, okay. Yeah, maintenance burden is an important factor 👍
Then I'll just summarize my suggestions here:
An overall preference should be given to small docker images (like you plan): I have seen that on some clouds (like Azure) larger docker images are problematic unless you move them to the native container registry (like ACR).
You can combine the docker.registry and process.ext options to allow people to change the base name of the registry dynamically.
The second point could be used in combination with the process selectors to add this dynamic behavior.
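A sketch of how those two options could combine in nextflow.config (the image name and the PROKKA selector are illustrative, not the pipeline's actual settings):

```nextflow
// nextflow.config (sketch)
// docker.registry is prepended to any container name that has no registry
// prefix, so users can switch registries with a single -c override
docker.registry = 'quay.io'

process {
    // process selector: per-module container, written without a registry prefix
    withName: 'PROKKA' {
        container = 'biocontainers/prokka:v1.14.6_cv2'
    }
}
```

With this layout, pointing the whole pipeline at a mirror (e.g. a private ACR) is a one-line change instead of an edit per module.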
Hope this helps (in some way 😉 )!
Many thanks for the suggestions!
About the first one, we were already facing some problems with image sizes, which also helped trigger this issue 😅
And about the second one, I just loved the idea. I didn't know about these options and how useful they could be.
Thanks again for these nice resources 😄
It would be nice if the pipeline were also capable of running with singularity and conda, instead of only docker, which would be closer to what is found in nf-core modules. To do that, the following should be addressed:
--databases
These changes are related and would also affect the following issues, making them possible or easier to implement:
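A sketch of the per-engine profiles, in the nf-core style, that would enable this (assuming a recent Nextflow version where conda.enabled is available):

```nextflow
// nextflow.config (sketch): one profile per container/package engine
profiles {
    docker {
        docker.enabled = true
    }
    singularity {
        singularity.enabled    = true
        singularity.autoMounts = true
    }
    conda {
        conda.enabled = true
    }
}
```

Users would then select an engine at launch time, e.g. `nextflow run ... -profile singularity`, without any change to the modules themselves.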