BioContainers / containers

Bioinformatics containers
http://biocontainers.pro
Apache License 2.0

CADD-scripts container image including conda environments #528

Status: Closed (closed by fa2k 9 months ago)

fa2k commented 1 year ago

I would like to request a Docker image for cadd-scripts v1.6 (Combined Annotation Dependent Depletion).

Website: https://cadd.gs.washington.edu/

GitHub repo (with install instructions for the CADD tools): https://github.com/kircherlab/CADD-scripts

Description: cadd-scripts consists of some shell scripts and a Snakemake pipeline. When the pipeline is executed, it automatically downloads the conda environment it needs. There is already a Bioconda package, cadd-scripts, which works this way.

The conda package does not transfer well into container formats. When it is executed, and before the actual processing starts, it needs to download the Snakemake pipeline's conda env into \<cadd-scripts's conda-env>/share/cadd-scripts-1.6-1/envs. Container filesystems are normally read-only or temporary, so that download either fails or has to be repeated on every run. I propose that we instead pre-load the conda environment when building the Docker image, following a documented recipe in the CADD GitHub repo.
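To make the proposal concrete, here is a rough sketch of the command an image build step could run to create the pipeline's conda environments ahead of time. It relies on Snakemake's `--use-conda`, `--conda-create-envs-only` and `--conda-prefix` options. The Snakefile, config file and prefix paths are placeholders I made up for illustration, and the exact invocation should be taken from the recipe in the CADD-scripts repo rather than from this sketch.

```python
import shlex


def build_env_preload_command(snakefile, config_file, env_prefix, target=None, cores=1):
    """Return an argv list that asks Snakemake to create conda envs and then stop.

    All path arguments are supplied by the caller; nothing here is specific
    to the real CADD-scripts layout.
    """
    cmd = ["snakemake"]
    if target is not None:
        # Some workflows need an explicit target so Snakemake knows which
        # rules (and therefore which envs) are involved.
        cmd.append(target)
    cmd += [
        "--snakefile", snakefile,
        "--configfile", config_file,
        "--use-conda",               # run rules inside their conda envs
        "--conda-create-envs-only",  # create the envs, do not run any rule
        "--conda-prefix", env_prefix,  # where the envs are stored
        "--cores", str(cores),
    ]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths inside the image being built (assumptions).
    cmd = build_env_preload_command(
        snakefile="Snakefile",
        config_file="config/example_config.yml",
        env_prefix="envs",
    )
    print(shlex.join(cmd))
```

Running the printed command at image build time would leave the environments inside the image, so nothing needs to be written to the container filesystem when the pipeline runs later, provided the same `--conda-prefix` is used at run time.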

One caveat: the CADD annotation database is too large to bundle into the container image, so it still needs to be bind-mounted into the container at run time.
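As an illustration of the bind-mount, the sketch below assembles a `docker run` command that mounts a host directory holding the annotations into the container. The image name, both paths and the tool arguments are hypothetical placeholders, and the read-only (`:ro`) mount assumes the pipeline only reads from the annotation directory.

```python
import shlex


def build_docker_run(image, host_annotations, container_annotations, tool_args):
    """Return an argv list for `docker run` with the annotations bind-mounted."""
    # -v HOST:CONTAINER:ro is Docker's bind-mount syntax; "ro" makes it read-only.
    mount = f"{host_annotations}:{container_annotations}:ro"
    return ["docker", "run", "--rm", "-v", mount, image, *tool_args]


if __name__ == "__main__":
    cmd = build_docker_run(
        image="example/cadd-scripts:1.6",           # hypothetical image name
        host_annotations="/data/cadd/annotations",   # hypothetical host path
        container_annotations="/cadd/annotations",   # hypothetical container path
        tool_args=["--help"],                        # placeholder arguments
    )
    print(shlex.join(cmd))
```

The container-side path would have to match wherever the image expects to find the annotations, which depends on how the image is built.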

(edited to remove unnecessary text)