galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

Conda, Docker, Singularity and the next step #4931

Open bgruening opened 6 years ago

bgruening commented 6 years ago

After publishing our preprint, "Practical computational reproducibility in the life sciences", it is time to discuss the next steps.

One problem we currently have in the training-material project, and with Galaxy Docker flavors in general, is that the images get too big if we include all dependencies. To solve this, we could skip installing dependencies at tool-install time and install them at tool runtime instead, which might have other downsides. It also would not work in setups like the one @natefoo maintains for usegalaxy.org.

So people have started to think about a centralized CVMFS store for _conda/ and conda environments. I think this idea is very appealing for the training project, especially because we already have the reference data shared via CVMFS with the entire community. @natefoo is maintaining such a beast afaik and in Freiburg we are also evaluating a CVMFS store for dependencies.

One point that made me nervous is that we did not have a community model for contributing to such a store. I was wrong: we have one. @jmchilton and I have been maintaining a repository called multi-package-containers (https://github.com/BioContainers/multi-package-containers) for some months. All of tools-iuc, tools-galaxy ... and potentially all others, even the entire TS, could submit requirements to this repository. After a merge, a Docker and a Singularity container are built and stored. This is already in place and works well.

What is missing is the conda environment, but we can just extract the conda env from the containers again. Please have a look at the following commands:

% cid=`docker run -d quay.io/biocontainers/samtools:1.6--0`
% docker cp $cid:/usr/local/ /home/bag/miniconda2/envs/__bar__
% ls -l /home/bag/miniconda2/envs/__bar__
total 36
drwxr-xr-x 2 bag bag 4096 Oct 13 04:23 bin
drwxr-xr-x 2 bag bag 4096 Oct 13 04:23 conda-meta
drwxr-xr-x 9 bag bag 4096 Oct 13 04:23 include
drwxr-xr-x 6 bag bag 4096 Oct 13 04:23 lib
drwxr-xr-x 3 bag bag 4096 Oct 13 04:23 man
drwxr-xr-x 2 bag bag 4096 Oct 13 04:23 sbin
drwxr-xr-x 9 bag bag 4096 Oct 13 04:23 share
drwxr-xr-x 4 bag bag 4096 Oct 13 04:23 ssl
drwxr-xr-x 3 bag bag 4096 Oct 13 04:23 x86_64-conda_cos6-linux-gnu
% . activate __bar__
% which samtools
/home/bag/miniconda2/envs/__bar__/bin/samtools
% samtools --version
samtools 1.6
Using htslib 1.6
Copyright (C) 2017 Genome Research Ltd.
% docker stop $cid
37cada22cccc8771392a68ab0a7068aa5257c07e9bc2f44a8469908affcdca49

This means that we can create all environments, for single-dependency tools but also for multi-dependency tools, from our frozen environments: the containers. With this in place, we just need to track quay.io/singularity-ftp, copy out each environment, put it on CVMFS and use it from there.
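The manual steps above could be scripted roughly as follows. This is only a sketch under stated assumptions: the function names `env_name_for_image` and `extract_env` and the `__name@tag` directory naming are hypothetical, not something Galaxy or galaxy-lib provides under these names. It uses `docker create` instead of `docker run -d`, since `docker cp` also works on a container that was never started.

```shell
#!/usr/bin/env bash
# Sketch: copy the conda environment (/usr/local) out of a container image.
set -euo pipefail

# Hypothetical naming scheme (an assumption for illustration):
# "quay.io/biocontainers/samtools:1.6--0" -> "__samtools@1.6--0"
env_name_for_image() {
    local image="$1"
    local name_tag="${image##*/}"   # strip registry and namespace
    local name="${name_tag%%:*}"    # part before the tag
    local tag="latest"              # fallback when no tag is given
    if [[ "$name_tag" == *:* ]]; then
        tag="${name_tag#*:}"
    fi
    printf '__%s@%s\n' "$name" "$tag"
}

# Copy /usr/local from the image into <envs_dir>/<env name>.
extract_env() {
    local image="$1" envs_dir="$2"
    local target cid
    target="$envs_dir/$(env_name_for_image "$image")"
    mkdir -p "$target"
    # Create a container without starting it; docker cp can read from it.
    cid=$(docker create "$image")
    # The trailing "/." copies the directory contents into the existing target.
    docker cp "$cid:/usr/local/." "$target"
    docker rm "$cid" >/dev/null
    echo "$target"
}
```

For example, `extract_env quay.io/biocontainers/samtools:1.6--0 /home/bag/miniconda2/envs` would place the environment in `/home/bag/miniconda2/envs/__samtools@1.6--0`. The resulting directory could then be published to CVMFS in a separate step.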

We could also think about teaching Galaxy to use this approach natively.

@natefoo I would be interested if this would save you some work to create the environments. Would this be something we could set up and make usable for the entire community?

@afgane @jmchilton this could solve our CloudMan problems, which we discussed recently. Such a store could also be used by the CWL community and others, assuming they are using galaxy-lib and our environment decoding.

CVMFS was developed to distribute dependencies/tools, so let's use it for that purpose as well :)

hexylena commented 5 years ago

Is this mostly solved/decided with the plan to send Singularity containers to CVMFS?

ocaisa commented 1 year ago

EESSI is working on something that matches the original approach described in this issue. Rather than being a container solution, we use Gentoo Prefix to give us OS independence, and then provide architecture-optimised builds of software that can be used on a wide variety of hardware platforms.

As part of https://github.com/elixir-europe/biohackathon-projects-2022/tree/main/16 we will be looking into how to add EESSI support to the Galaxy ToolShed.

bgruening commented 1 year ago

@ocaisa the Galaxy community will take part in the BH, let's discuss this soon :) Sounds great!

bernt-matthias commented 1 year ago

Interesting :)

On our HPC cluster the admins have been using EasyBuild for a few years (always arguing that it gives a performance boost of some percent) .. and I have been trying to convince them of conda and containers for a few years (always arguing for reproducibility and fewer problems).

Wondering if there were any insights gained during the BH?