Open bgruening opened 7 years ago
Is this mostly solved/decided with the plan to send singularity containers to CVMFS?
EESSI is working on something that matches the original approach described in this issue. Rather than being a container solution, we use Gentoo Prefix to give us OS independence, and then provide architecture-optimised builds of software that can be used on a wide variety of hardware platforms.
As part of https://github.com/elixir-europe/biohackathon-projects-2022/tree/main/16 we will be looking into how to add EESSI support to the Galaxy toolshed.
@ocaisa the Galaxy community will take part in the BH, let's discuss this soon :) Sounds great!
Interesting :)
On our HPC cluster the admins have been using EasyBuild for a few years (always arguing with a few % performance boost) ... and I have been trying to convince them of conda and containers for a few years (always arguing with reproducibility and fewer problems).
Wondering if there were any insights gained during the BH?
After publishing our preprint, Practical computational reproducibility in the life sciences, it is time to discuss the next steps.
One problem we currently have in the training-material project, or in general with Galaxy Docker flavors, is that the images get too big if we include all dependencies. To solve this we could simply install dependencies at tool runtime rather than at tool-install time, which might have other downsides. It also would not work in setups like the one @natefoo is maintaining with usegalaxy.org.
So people have started to think about a centralized CVMFS store for _conda/ and conda environments. I think this idea is very appealing for the training project, especially because we already share the reference data via CVMFS with the entire community. @natefoo is maintaining such a beast afaik, and in Freiburg we are also evaluating a CVMFS store for dependencies.
One point that made me nervous is that we did not have a community model for contributing to such a store. I was wrong: we have one. @jmchilton and I have been maintaining a repository called multi-containers (https://github.com/BioContainers/multi-package-containers) for some months. All of tools-iuc, tools-galaxy ... and potentially all others, even the entire TS, could submit requirements to this repository. After a merge, a Docker and a Singularity container are built and stored. This is already in place and works well.
What is missing is the conda environment, but we can simply extract the conda env from the containers again. Please have a look at the following commands:
This means that we can create all environments, for single-dependency tools but also for multi-dependency tools, from our frozen environments, the containers. With this in place we just need to track quay.io/singularity-ftp, copy out the environment, put it on CVMFS and use it.
We could also think about teaching Galaxy to use this approach natively.
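To make the "track, copy out, put on CVMFS" idea a bit more concrete, here is a minimal Python sketch of the tracking step only. It is not an existing tool: the CVMFS path, the directory layout, the image names and the helper names `env_target` and `plan_sync` are all assumptions made up for illustration. It only decides which containers still need their environment copied and where that environment would go; the actual extraction and the CVMFS publish are left out.

```python
from pathlib import PurePosixPath

# Hypothetical CVMFS location for shared conda environments (an assumption,
# not a path that exists or that was proposed in this thread).
CVMFS_ENV_ROOT = PurePosixPath("/cvmfs/example.org/conda-envs")


def env_target(image, root=CVMFS_ENV_ROOT):
    """Map an image reference of the form 'name:tag' to a directory under root."""
    name, _, tag = image.partition(":")
    if not name or not tag:
        raise ValueError(f"expected 'name:tag', got {image!r}")
    return root / name / tag


def plan_sync(available, synced, root=CVMFS_ENV_ROOT):
    """Return (image, target) pairs for images whose environment is not on CVMFS yet.

    available: image references seen in the container registry
    synced:    image references whose environment was already copied
    """
    done = set(synced)
    return [
        (image, env_target(image, root))
        for image in sorted(set(available))
        if image not in done
    ]


if __name__ == "__main__":
    # Hypothetical image references, for illustration only.
    available = ["tool-a:1.0--0", "tool-b:2.3--1"]
    synced = ["tool-b:2.3--1"]
    for image, target in plan_sync(available, synced):
        print(f"{image} -> {target}")
```

A periodic job could run such a plan against the list of published containers and hand each pair to whatever extraction step we agree on.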
@natefoo I would be interested if this would save you some work to create the environments. Would this be something we could set up and make usable for the entire community?
@afgane @jmchilton this could solve our CloudMan problems, which we discussed recently. Such a store could also be used by the CWL community and others, assuming they are using galaxy-lib and our environment decoding.
CVMFS was developed to distribute dependencies/tools, let's use it for that purpose as well :)