bioconda / bioconda-recipes

Conda recipes for the bioconda channel.
https://bioconda.github.io
MIT License

Adding Singularity builds to Bioconda? #5144

Open bgruening opened 7 years ago

bgruening commented 7 years ago

Dear @bioconda/all,

Since the end of last year, Bioconda has been assembling BioContainers as part of its advanced testing of packages, based on a minimal busybox container. With this technique in place we were able to generate more than 2300 Docker and rkt containers over the last months - minimal in size and tested with the same tests as the underlying Conda packages. Cyverse, OSG, and analysis frameworks like Galaxy use the predictable namespace to automatically resolve tool/workflow dependencies, relying on the guarantee that the Docker and Conda binaries are the same and therefore reproducible.

Over the last months we have worked a lot on the next step: supporting Singularity images. With this issue I would like to ask the Bioconda community to integrate Singularity support into bioconda-utils. The result will be that for every Bioconda package our community gains both a Docker image and a Singularity image.

Here are two example calls that generate both Docker and Singularity images.

mulled-build build-and-test pysam=0.11.2.1--py36_0 -n biocontainers \
   --test 'python -c "import pysam"' \
   --extra-channels file://anaconda/conda-bld,conda-forge,defaults \
   --singularity

mulled-build build-and-test samtools=1.4.1--0 -n biocontainers \
    --test "samtools --help" \
    --extra-channels file://anaconda/conda-bld,conda-forge,defaults \
    --singularity

The only thing we need to add is a --singularity argument to the current mulled build step. This results in a Singularity bootstrap operation followed by a simple copy of the content, similar to what is in $PREFIX. My estimate is that this takes <2 minutes. We then upload the image to a public archive; this can take additional time, but only happens on the master branch.
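To make the mechanism concrete, here is a rough sketch of what such a bootstrap definition could look like - the base image, paths and test below are purely illustrative, not the actual mulled-build internals:

Bootstrap: docker
From: busybox

%setup
    # copy the already-tested Conda environment (the $PREFIX content) into the
    # image, mirroring what the Docker/rkt mulled build does (path is hypothetical)
    mkdir -p ${SINGULARITY_ROOTFS}/usr/local
    cp -a /tmp/mulled-prefix/. ${SINGULARITY_ROOTFS}/usr/local/

%test
    # run the same test as the underlying Conda package, e.g.
    samtools --help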

For sure there will be bugs that we haven't encountered yet, and a few packages might fail; I will do my best to fix these as soon as possible. Please let me know whether the Bioconda community wants to support the generation of Singularity images in addition to the Docker/rkt ones.

Feel free to ask for more details or try it out.

Thanks, Bjoern

chapmanb commented 7 years ago

Björn -- thanks so much for doing this, I'm looking forward to trying out Singularity images, and this should be great for HPC adoption. +1 from me. My only small suggestion would be to have a way to skip mulled for builds that cause issues, which would let folks work around hard-to-diagnose problems. Thanks again.

lparsons commented 7 years ago

I second Brad's comments. Thanks so much Björn, this is an excellent idea and would really help with HPC utilization. And it would be nice to be able to "blacklist" problematic recipes.

sebastian-luna-valero commented 7 years ago

That's great news! Many thanks @bgruening! +1 from me as well. What about using a bioconda account in https://singularity-hub.org to archive the Singularity images?

I am sure @gmkurtzer would be happy to help with that.

I have a question, though. Is it currently possible to bootstrap a busybox image with Singularity?

upendrak commented 7 years ago

Yes, that would be so great. Biocontainers are such a big hit, and as a big proponent of Bioconda and Biocontainers I can't wait to see the Singularity images built from the Bioconda packages. Great initiative, @bgruening, by the way.

boegel commented 7 years ago

@sebastian-luna-valero Seems so, yes: https://github.com/singularityware/singularity/blob/master/examples/busybox/Singularity

jerowe commented 7 years ago

Yes! This would be great.

marcelm commented 7 years ago

I’d like to mention that it’s already possible to use Biocontainers with Singularity, since Singularity supports running Docker images. So this works, for example (which I think is really cool):

singularity pull docker://biocontainers/samtools
singularity run samtools.img

I’m just getting started with Singularity, though, so I don’t know whether that works well in all cases and what the advantages are in providing both Docker and Singularity images.

johanneskoester commented 7 years ago

I like the idea (if there are indeed downsides to just using docker images from singularity). I wonder if the build should happen only on the master branch, though. The docker test via mulled seems sufficient as a test, right? This way, the additional build time would not be noticeable.

johanneskoester commented 7 years ago

Why don't we just convert the docker image? See http://singularity.lbl.gov/docs-docker

johnfonner commented 7 years ago

Glad to see this issue promoted. We (at the Texas Advanced Computing Center) have been manually converting docker images to singularity to make them available on several academic supercomputers. It represents a big step forward for reproducible computing for HPC systems that can't run Docker. It would be great to see this effort "upstreamed" to BioConda.

Converting Docker containers to Singularity images has required a bit of futzing to make it work reliably, but it does work. Building them natively as Björn proposes is as good or better, but whatever the process, having an image for every BioConda recipe right away is a big deal. There is a lot of demand for a big, portable app catalogue.

Also, I'm not sure I understand the rationale for skipping the containerization step. Are problems with the build that common (and that difficult to fix)? It seems like it would be a more inconvenient problem to sustain disparate catalogues where apps are only sometimes available in the different container formats. Can we start with trying to build containers all the time and only build an opt-out if it becomes a real burden somehow?

simonvh commented 7 years ago

Also, I'm not sure I understand the rationale for skipping the containerization step. Are problems with the build that common (and that difficult to fix)?

This is purely anecdotal, but as a more "casual" contributor to bioconda, my experience in some individual cases has been that writing a recipe that builds correctly has become harder and takes more work. I can imagine that I would rather have a working bioconda package with no container than a build that I can't get to work. However, these are probably the exceptions, and this is not in any way based on hard data.

(And definitely not intended as criticism... Documentation is consistently improving and I see the big benefits of automatic containerization.)

bgruening commented 7 years ago

Thanks a bunch for your comments so far!

@sebastian-luna-valero we are in contact with the Singularity devs and exchanging ideas about Singularity Hub and the Singularity registry. Currently the situation is not entirely clear, but luckily storing images is not a big deal here: it's just files, and we are storing them on a public FTP server. This can be indexed by Singularity Hub, mirrored by Cyverse, or we can create a torrent for these images. I have a few ideas in this regard, I just need time :)

@sebastian-luna-valero @boegel we are bootstrapping from busybox, the exact same image that we use for our Docker/rkt images.

@marcelm @johanneskoester this is perfectly possible, and Galaxy for example is doing this if there is no native Singularity image available. However, for storing, replicating, sharing Singularity images ... native ones seem to be preferred currently. Also, you currently have a build-time overhead and afaik you need to have sudo rights.

@simonvh thanks so much for this comment. I totally agree, and this is also part of my short talk at BOSC next week. I do completely agree that submitting packages to Bioconda is more complicated than a year ago. I doubt this is due to the Container building stuff; imho the entire community did a great job at improving the quality of the recipes. Just remember the times when zlib was not required or we were not pinning anything. The Container builds help in this regard, as they improved our testing dramatically - but this also means more failing tests and more complicated submissions. Also, joining with conda-forge added some complexity ... I still do think it's worth it. So in some sense Bioconda has grown up, nearly 2 years now ;), and we do a much better job than we did 1 year ago. But this comes at a price. I don't think it is due to the Container builds.

@chapmanb @lparsons I thought a lot about this, and maybe we should add an opt-out option. I'm really hesitant about this, because we could raise the trust in our community quite a bit if we guarantee a Singularity image for every Conda package, as much as we do for Docker. In the end it's a community decision, and Bioconda needs to decide how much of a burden it is to wait a few days longer to fix things and make the Biocontainers community happy as well. I think we have a unique opportunity here in that we can have one central community-maintained repository to build all the things with a minimal overhead. Currently, I very much second what @johnfonner said.

lparsons commented 7 years ago

@bgruening I totally understand the argument of ensuring a Singularity image for every Conda package. My concern is more during early days of Singularity support. I wouldn't want testing of the Singularity container build process to hold up production versions of packages. However, I defer to your considerable experience in this matter if you think the added overhead and risk is minimal/worth it.

chapmanb commented 7 years ago

My motivation for suggesting an opt-out is to give people a chance to build bioconda packages if they fail mulled/Singularity builds for some reason, and then asynchronously come back later and fix the mulled issues. It's ideal if we can build Docker/Singularity for everything, but it also ends up putting a lot on Björn, since he's one of the few people who can effectively debug/fix problems. Having a temporary opt-out means he doesn't need to always be present in real time to fix issues. This is really useful when you're trying to push a bioconda package to provide a fix and end up blocked on something you don't understand with mulled.

marcelm commented 7 years ago

@bgruening I’ll continue with my black Thinking Hat on, so please don’t take my comments personally.

However, for storing, replicating, sharing Singularity images ... native ones seem to be preferred currently.

But by whom and why? What is actually better? If (and I don’t know whether that is the case) a Singularity image were nothing more than a Docker image in a different file format, why spend the extra person-hours, CPU time and storage to provide both? The only difference would be that one could write singularity pull shub://biocontainers/sometool instead of singularity pull docker://biocontainers/sometool in the instructions.

Also, you currently have a build-time overhead

I may be doing this the wrong way, but after clearing caches, singularity pull docker://biocontainers/samtools took 1m 22s on my machine, while docker pull biocontainers/samtools needed 1m 35s. Most of that time was spent on downloading the layers.

and afaik you need to have sudo rights.

Not anymore, see http://singularity.lbl.gov/ :

As of version 2.3, you can even import Docker image content without sudo permissions.

Again, I don’t want to (or could) prevent you from doing this. I just want to learn why we need two different formats.

bgruening commented 7 years ago

@marcelm a Singularity image is not OCI-approved as far as I know and is a completely different thing. You can repack one into the other, and this is what we do here.

It is not as simple as you might think. For example try this one:

singularity pull docker://quay.io/biocontainers/samtools:1.5--0

Even if it were that easy (we are working on the size thingy to improve this), there seems to be a need for Singularity images, otherwise the hub and the registry would not be developed, right? Also consider use-cases where HPC environments do not have internet access (not so uncommon), or parts of Africa where Dockerhub is blocked. What about mirroring images locally, or sharing Singularity images over NFS? Keep in mind that Docker images are based on layers and are usually not a flat file. Another use-case would be automated systems that cannot pull or that assume the images live at a certain path ... I guess there are several other use-cases where this makes sense; these are just a few that people have told me about.

Your example shows nicely how fast it is, and we don't even need to download the image again, so the impact on Bioconda should be minimal. I currently also test the Singularity image with the same tests as the Docker containers and the Conda recipes; if this is a concern, we can skip that.

@chapmanb from your comments I conclude that Bioconda as a project should care about Conda packages and the Conda build infrastructure, but the Singularity one should not interfere here and should be maintained by individuals. If this is correct, I will implement an optional opt-out and start working on these pieces.

dtrudg commented 7 years ago

Just wanted to mention there is a fair bit of interest in this here at the PEARC17 HPC conference. I mentioned Biocontainers in the VM curation BoF session, and Dan Stanzione from TACC brought up @johnfonner's work in response to a question from the audience in his Stampede 2 talk. I've spoken to a few people about how at UTSW I've been trying out building singularity images, much like John is at TACC.

Regarding @marcelm's comments - we can't really use native biocontainers on our system by running them directly with singularity and a biocontainer URI. We aren't able to use overlay mounting right now, so we have to create our (non-standard) home dir and shared-fs mount points in a singularity image for it to be useful to users. Also, we have some crazy proxy/firewall stuff to deal with (no direct internet access), which requires injecting things into some specific containers to get tools to work reliably for users. Right now I'm piloting stuff on our cluster in a similar way to @johnfonner, I think.

A central repository of singularity images would be fantastic. Lightweight injection of our customization into an existing singularity image would be a lot faster than doing the docker -> singularity bootstrap process.

I was lucky enough to speak to Vanessa of singularity fame here also. Sounds like the registry/hub will be quite useful and flexible.

johanneskoester commented 7 years ago

@bgruening: in contrast to the mulled docker builds, an additional singularity build won't further increase the robustness of our packages. Hence, I am fine with having an opt-out switch in the recipe. Further, they could also be built asynchronously (e.g. with a daily job on biocontainers), right?

kyleabeauchamp commented 7 years ago

FWIW, I made a half-hearted attempt to use the existing bioconda docker images with the bioconda singularity conda package:

conda create -n single -c conda-forge -c bioconda singularity python=3.6

source activate single

singularity pull docker://quay.io/biocontainers/samtools:1.5--0
# Fails due to size issue

singularity pull --size 4096 docker://quay.io/biocontainers/samtools:1.5--0
# Fails due to some permissions issues, suggests sudo

However, my naive approach hit a wall with this issue (https://github.com/singularityware/singularity/issues/749).

It would definitely be interesting if this (or a similar) workflow worked out of the box.

truatpasteurdotfr commented 7 years ago

singularity needs some root suid executables, which anaconda does not provide.

[tru@elitebook840g3 ~]$ conda create -n single -c conda-forge -c bioconda singularity python=3.6
Fetching package metadata .............
Solving package specifications: .

Package plan for installation in environment /c7/shared/anaconda3/envs/single:

The following NEW packages will be INSTALLED:

    ca-certificates: 2017.4.17-0      conda-forge
    certifi:         2017.4.17-py36_0 conda-forge
    ncurses:         5.9-10           conda-forge
    openssl:         1.0.2l-0         conda-forge
    pip:             9.0.1-py36_0     conda-forge
    python:          3.6.1-3          conda-forge
    readline:        6.2-0            conda-forge
    setuptools:      33.1.1-py36_0    conda-forge
    singularity:     2.3-0            bioconda   
    sqlite:          3.13.0-1         conda-forge
    tk:              8.5.19-1         conda-forge
    wheel:           0.29.0-py36_0    conda-forge
    xz:              5.2.2-0          conda-forge
    zlib:            1.2.11-0         conda-forge

Proceed ([y]/n)? y

ca-certificate 100% |################################| Time: 0:00:00 180.65 kB/s
ncurses-5.9-10 100% |################################| Time: 0:00:05 213.97 kB/s
singularity-2. 100% |################################| Time: 0:00:04 339.66 kB/s
sqlite-3.13.0- 100% |################################| Time: 0:00:06 764.10 kB/s
tk-8.5.19-1.ta 100% |################################| Time: 0:00:01   1.13 MB/s
xz-5.2.2-0.tar 100% |################################| Time: 0:00:01 644.18 kB/s
zlib-1.2.11-0. 100% |################################| Time: 0:00:00 563.95 kB/s
openssl-1.0.2l 100% |################################| Time: 0:00:03   1.12 MB/s
readline-6.2-0 100% |################################| Time: 0:00:00   1.17 MB/s
python-3.6.1-3 100% |################################| Time: 0:00:08   2.23 MB/s
certifi-2017.4 100% |################################| Time: 0:00:00  11.42 MB/s
setuptools-33. 100% |################################| Time: 0:00:00   1.95 MB/s
wheel-0.29.0-p 100% |################################| Time: 0:00:00 973.21 kB/s
pip-9.0.1-py36 100% |################################| Time: 0:00:02 768.45 kB/s
...
# To activate this environment, use:
# > source activate single
#
# To deactivate an active environment, use:
# > source deactivate
#
[tru@elitebook840g3 ~]$ source activate single
(single) [tru@elitebook840g3 ~]$ cd /dev/shm/
(single) [tru@elitebook840g3 shm]$ type singularity
singularity is /c7/shared/anaconda3/envs/single/bin/singularity
(single) [tru@elitebook840g3 shm]$ singularity pull --size 4096 docker://quay.io/biocontainers/samtools:1.5--0
ERROR: Image file exists, not overwriting.
(single) [tru@elitebook840g3 shm]$ \rm samtools-1.5--0.img 
(single) [tru@elitebook840g3 shm]$ singularity pull --size 4096 docker://quay.io/biocontainers/samtools:1.5--0
Initializing Singularity image subsystem
Opening image file: samtools-1.5--0.img
Creating 4096MiB image
Binding image to loop
ERROR  : The feature you are requesting requires privilege you do not have
ABORT  : Retval = 255
(single) [tru@elitebook840g3 shm]$ ls -ld /c7/shared/anaconda3/envs/single/libexec/singularity/bin/*suid
-rwxrwxr-x. 1 tru users 460149 Jul 22 11:46 /c7/shared/anaconda3/envs/single/libexec/singularity/bin/action-suid
-rwxrwxr-x. 1 tru users 430426 Jul 22 11:46 /c7/shared/anaconda3/envs/single/libexec/singularity/bin/copy-suid
-rwxrwxr-x. 1 tru users 239769 Jul 22 11:46 /c7/shared/anaconda3/envs/single/libexec/singularity/bin/create-suid
-rwxrwxr-x. 1 tru users 241849 Jul 22 11:46 /c7/shared/anaconda3/envs/single/libexec/singularity/bin/expand-suid
-rwxrwxr-x. 1 tru users 431036 Jul 22 11:46 /c7/shared/anaconda3/envs/single/libexec/singularity/bin/export-suid
-rwxrwxr-x. 1 tru users 431084 Jul 22 11:46 /c7/shared/anaconda3/envs/single/libexec/singularity/bin/import-suid
-rwxrwxr-x. 1 tru users 423514 Jul 22 11:46 /c7/shared/anaconda3/envs/single/libexec/singularity/bin/mount-suid

a regular rpm/source/... singularity 2.3.1 installation yields:

[tru@elitebook840g3 ~]$ rpm -qlv singularity |grep suid$
-rwsr-xr-x    1 root    root                   136640 Jun 26 19:14 /usr/libexec/singularity/bin/action-suid
-rwsr-xr-x    1 root    root                    78728 Jun 26 19:14 /usr/libexec/singularity/bin/create-suid
-rwsr-xr-x    1 root    root                    78736 Jun 26 19:14 /usr/libexec/singularity/bin/expand-suid
-rwsr-xr-x    1 root    root                   128360 Jun 26 19:14 /usr/libexec/singularity/bin/export-suid
-rwsr-xr-x    1 root    root                   128360 Jun 26 19:14 /usr/libexec/singularity/bin/import-suid
-rwsr-xr-x    1 root    root                   128352 Jun 26 19:14 /usr/libexec/singularity/bin/mount-suid
[tru@elitebook840g3 shm]$ type singularity
singularity is /usr/bin/singularity
[tru@elitebook840g3 shm]$ singularity pull --size 4096 docker://quay.io/biocontainers/samtools:1.5--0
Initializing Singularity image subsystem
Opening image file: samtools-1.5--0.img
Creating 4096MiB image
Binding image to loop
Creating file system within image
Image is done: samtools-1.5--0.img
Docker image path: quay.io/biocontainers/samtools:1.5--0
Cache folder set to /home/tru/.singularity/docker
Importing: base Singularity environment
Importing: /home/tru/.singularity/docker/sha256:a0d89ad06fe440e863e4170430afde104e1a6d366c0dd3111b785871f645b6a2.tar.gz
Importing: /home/tru/.singularity/docker/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4.tar.gz
Importing: /home/tru/.singularity/docker/sha256:aef3b3b2fa0d190f2a8ab3e43c7ce4e34ae8eb29d56a93c36c82740a30d4dac0.tar.gz
Importing: /home/tru/.singularity/docker/sha256:531ebc5af9ff52b42017cbbde280607e75570a05b66a60be8e12f9417b3fbad4.tar.gz
Importing: /home/tru/.singularity/docker/sha256:00f810677cffc160025dbf82081ebd25ca8f951257749120df70c23d04c10f1c.tar.gz
Importing: /home/tru/.singularity/docker/sha256:6c2ebb6634fc80ba5bfd31a6bed097d63883a78f1deb6c4fdb58aeb219e4ccba.tar.gz
Importing: /home/tru/.singularity/docker/sha256:d836c29a56fbb4798289a6d46d6726e0c52da024b834cb3ed14d80f8d5e4112a.tar.gz
Importing: /home/tru/.singularity/docker/sha256:a7f760de4b2725b11b38f8e0a36c1d8abd2e2128442e8dabbec6baa86e64173b.tar.gz
Importing: /home/tru/.singularity/docker/sha256:4c1fa756c345dec2e28659f9b7a6195bd1f68cab67e3e8db7edb782abcba4b57.tar.gz
Importing: /home/tru/.singularity/metadata/sha256:3988644f96d4a3069f35ad75fee0173c6fd9dba693dbb44cb0198cfd0d889f1d.tar.gz
Done. Container is at: samtools-1.5--0.img
[tru@elitebook840g3 shm]$ singularity exec ./samtools-1.5--0.img samtools --version
samtools 1.5
Using htslib 1.5
Copyright (C) 2017 Genome Research Ltd.
jerowe commented 7 years ago

I'm wondering if anyone on this thread has experience administering singularity for a large number of users. Can I have a central repository of singularity images available, with individual users able to take advantage of this without clobbering each other?

boegel commented 7 years ago

@jerowe Singularity images are just files, so in theory you just need to park them somewhere on a shared filesystem.

There is work being done on something fancier though, see https://singularityhub.github.io/singularity-registry/
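Running an image straight off a shared filesystem is just an exec on the file, e.g. something like this (the path is hypothetical, the image is the samtools one from above):

singularity exec /shared/containers/samtools-1.5--0.img samtools --version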

jerowe commented 7 years ago

In theory everything should work. Then I try it on a Lustre filesystem and wonder what is wrong with my life. ;-)

The registry looks cool. A goal for this year is to make singularity resources known and available.

dtrudg commented 7 years ago

@jerowe - We have some singularity images here on a Lustre FS without issue, as they are read-only at run-time for users. Software in the images tends to run faster than native installs, since Lustre is good at dealing with large single image files, which avoid the metadata overhead of many small files.

The sregistry stuff by @vsoch is progressing quickly. She's very interested to hear from people who would make use of a local registry https://github.com/singularityhub/sregistry/issues

NB - you probably want to take a look at the singularityhub/sregistry repo, not singularity-registry

gmkurtzer commented 7 years ago

@jerowe - Issues with Singularity on Lustre? Inconceivable! ... It should work perfectly, but if you are not experiencing awesomeness, please post an issue on our Singularity GitHub with some debug output, and we'd be glad to take a look! :)

jerowe commented 7 years ago

Thanks for the responses! I will take a look at the registry.

@gmkurtzer - I haven't had any issues, but I like to find out about things before I break them. ;-)

gmkurtzer commented 7 years ago

@jerowe - Ohhh, gotcha! Let us know if you do find any issues!

Reminds me of one of my favorite sayings: In theory, there is no difference between practice and theory, but in practice there is.

jerowe commented 7 years ago

@bgruening - did this go through? Can we get singularity bioconda images?

bgruening commented 7 years ago

Not yet. I need to implement this in bioconda-utils. galaxy-lib has support and is working. Look at https://github.com/BioContainers/multi-package-containers and https://depot.galaxyproject.org/singularity/

Just need more time and currently busy with the r-* migrations :(

bgruening commented 7 years ago

That said, if it is urgent, mulled-build can already create your Singularity images today.

jerowe commented 7 years ago

It's not urgent for me atm, but it will be in the next few months. Thanks for all the great work!

vsoch commented 7 years ago

In case it's useful, the Singularity Registry is done: https://singularityhub.github.io/sregistry. This means an individual (or institution) can push to (and others pull from) their own registry; for example, here is a user's registry deployed with biocontainers: https://sregistry.randomroad.net/collections/3/. The registries then get consolidated at https://singularityhub.github.io/containers/, and the containers are available directly from singularity (e.g., singularity pull shub://<registry>/<collection>/<container>:<tag>). It's fairly new, so any and all suggestions / feature requests, please ask away! If you want to use it with singularity you will need to use the development branch.

dtrudg commented 7 years ago

The randomroad.net sregistry mentioned above is sitting on my home server, behind a cable connection so it may not be fast/stable, but should be kept fairly up-to-date with @vsoch's work on sregistry if you want to take a look at it.

Also note that the containers there are not built with mulled-build but by a script creating a def file and bootstrapping From: the quay.io biocontainer docker images. It's an approach used to add in some directories since the cluster I use doesn't support overlay bind mounting with singularity.
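For reference, a minimal sketch of such a def file - the tool and the extra directories below are just examples of what gets injected, not the exact script output:

Bootstrap: docker
From: quay.io/biocontainers/samtools:1.5--0

%post
    # create the site-specific bind-mount targets missing from the base image
    mkdir -p /scratch /archive

%test
    samtools --version

The image is then built the usual 2.x way (singularity create followed by sudo singularity bootstrap <image> <def file>).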

bgruening commented 6 years ago

Small update from my side: https://depot.galaxyproject.org/singularity. All containers are automatically converted from the Docker ones and tested against the Bioconda tests. We will work on this further and then push for direct integration into the Bioconda build system.

charlesreid1 commented 6 years ago

bump

bgruening commented 6 years ago

How can I help @charlesreid1? The generation of Singularity containers is not part of the BioConda build system but I do run these every now and then. You can find them here: https://depot.galaxyproject.org/singularity

I will soon have a build bot that does this automatically every 24h.
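If you want to try them, fetching and running an image from the depot should look roughly like this - the exact file names may differ, so check the directory listing first:

wget 'https://depot.galaxyproject.org/singularity/samtools%3A1.5--0' -O samtools-1.5--0.img
singularity exec samtools-1.5--0.img samtools --version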

charlesreid1 commented 6 years ago

There's been no activity on this thread for six months, so I was wondering how things are going.

bgruening commented 5 years ago

@charlesreid1 things are going well. Have a look at https://depot.galaxyproject.org/singularity

We have 32,000 Singularity containers stored there and are currently shifting everything to a CVMFS repo. I will update this thread as soon as it's done.