common-workflow-library / bio-cwl-tools

CWL CommandLineTool descriptions for biology/life-sciences related applications
https://github.com/common-workflow-library/bio-cwl-tools#readme
Apache License 2.0
76 stars, 37 forks

Must there be a public software container? #7

Open mr-c opened 5 years ago

mr-c commented 5 years ago

Do we require the Dockerfile be present here as well?

michael-kotliar commented 5 years ago

I think it's preferable to have the Dockerfile included in the repository. Alternatively, it should at least say where we can get the Dockerfile so that we can make changes and rebuild the image ourselves. For example, I want to use the RSEM tool with the STAR mapper. The Docker image (https://cloud.docker.com/u/biowardrobe2/repository/docker/biowardrobe2/rsem/tags) includes three mappers: STAR, Bowtie, and Bowtie2. I don't need all of them, so I might want to rebuild this image with only STAR installed.

michael-kotliar commented 5 years ago

The software version in the Dockerfile should be hardcoded. I would also add some metadata to the Dockerfile (see example below).

#################################################################
# Dockerfile
#
# Software:         FastQC
# Version:          v0.0.1
# Description:      Tool to spot potential problems in high-throughput sequencing datasets
# Website:          http://www.bioinformatics.babraham.ac.uk/projects/fastqc
# Provides:         FastQC 0.11.8
# Base Image:       ubuntu:18.04
# Build Cmd:        docker build --rm -t cwlhub/fastqc:v0.0.1 .
# Pull Cmd:         docker pull cwlhub/fastqc:v0.0.1
# Run Cmd:          docker run --rm -ti cwlhub/fastqc:v0.0.1 fastqc --version
#################################################################

The Version field is the version of the Docker image (not the version of the installed program). All installed programs, with their versions, should be listed in the Provides field.
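To make the header convention checkable, here is a minimal Python sketch that parses the "# Field: value" lines and flags a few problems. The helper names (parse_header, check_header), the set of required fields, and the rule that the Pull Cmd tag must equal Version are assumptions for illustration, not an agreed project policy.

```python
import re

# Matches header lines such as "# Base Image:       ubuntu:18.04".
# Field names may contain spaces; the first ':' ends the name, so values
# such as URLs or image tags may contain further colons.
FIELD_RE = re.compile(r"^#\s*([A-Za-z][A-Za-z ]*?):\s+(.*\S)\s*$")

# Hypothetical choice of fields every image header must carry.
REQUIRED_FIELDS = ("Software", "Version", "Provides", "Base Image")


def parse_header(dockerfile_text):
    """Return {field: value} from the leading comment block of a Dockerfile."""
    fields = {}
    for line in dockerfile_text.splitlines():
        if not line.startswith("#"):
            break  # the header ends at the first non-comment line
        match = FIELD_RE.match(line)
        if match:
            fields[match.group(1).strip()] = match.group(2)
    return fields


def check_header(fields):
    """Return a list of problems; an empty list means the header looks consistent."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in fields]
    version = fields.get("Version")
    pull_cmd = fields.get("Pull Cmd", "")
    # Version is the image version, so the pulled tag should end with ":<Version>".
    if version and not pull_cmd.endswith(":" + version):
        problems.append("Pull Cmd tag does not match Version")
    return problems


EXAMPLE = """\
#################################################################
# Software:         FastQC
# Version:          v0.0.1
# Provides:         FastQC 0.11.8
# Base Image:       ubuntu:18.04
# Pull Cmd:         docker pull cwlhub/fastqc:v0.0.1
#################################################################
FROM ubuntu:18.04
"""

if __name__ == "__main__":
    header = parse_header(EXAMPLE)
    print(header["Provides"])    # FastQC 0.11.8
    print(check_header(header))  # []
```

Because Version and Provides are separate fields, bumping FastQC would change Provides and require a new image Version, which the tag check then ties to the published image.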

Additionally, it would be nice to use well-known base images and install all required dependencies for each image separately, rather than inheriting from other tools' Docker images. Otherwise, updating something like R packages will cause a lot of "pain" :)
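One way to enforce "well-known base image, no inheritance from other tools' images" is a small lint over the FROM lines. This Python sketch assumes a hypothetical allowlist of official distribution images (the actual list would need to be agreed on). It skips --platform= flags and references to earlier build stages in multi-stage Dockerfiles.

```python
# Hypothetical allowlist of "well-known" base images; adjust as needed.
ALLOWED_BASES = {"ubuntu", "debian", "alpine", "centos"}


def repository(image):
    """Strip any digest (@sha256:...) and tag (:1.2) from an image reference."""
    name = image.split("@", 1)[0]
    # A ':' after the last '/' is a tag; an earlier one is a registry port.
    if name.rfind(":") > name.rfind("/"):
        name = name[: name.rfind(":")]
    return name


def base_images(dockerfile_text):
    """Return the external image named by each FROM instruction."""
    stages, images = set(), []
    for line in dockerfile_text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0].upper() != "FROM":
            continue
        args = [t for t in tokens[1:] if not t.startswith("--")]  # drop --platform=...
        if not args:
            continue
        image = args[0]
        if image not in stages:  # "FROM build" may refer to an earlier stage
            images.append(image)
        if len(args) >= 3 and args[1].upper() == "AS":
            stages.add(args[2])
    return images


def disallowed_bases(dockerfile_text):
    """Base images that are not on the allowlist, e.g. another tool's image."""
    return [img for img in base_images(dockerfile_text)
            if repository(img) not in ALLOWED_BASES]
```

A Dockerfile starting with FROM ubuntu:18.04 passes, while one built FROM another tool's image is reported, which is exactly the inheritance chain the comment warns about.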

michael-kotliar commented 5 years ago

Also, the Dockerfile and any custom scripts that need to be copied into the image should be placed in a separate folder. Docker uses the directory given to docker build (usually the current directory) as the build context, so the smaller it is, the "faster" the build.
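A rough way to see the effect is to measure how many bytes sit under a candidate build context. This Python sketch assumes the classic builder's behaviour of sending the whole context directory to the daemon and ignores .dockerignore, so its number is an upper bound. The docker/ and data/ folder names are hypothetical.

```python
import os
import tempfile


def context_size(path):
    """Total size in bytes of regular files under `path` (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def compare_contexts():
    """Throwaway repo: a small Dockerfile in docker/, 1 MB of test data in data/."""
    with tempfile.TemporaryDirectory() as repo:
        docker_dir = os.path.join(repo, "docker")
        data_dir = os.path.join(repo, "data")
        os.makedirs(docker_dir)
        os.makedirs(data_dir)
        with open(os.path.join(docker_dir, "Dockerfile"), "wb") as f:
            f.write(b"FROM ubuntu:18.04\n")  # 18 bytes
        with open(os.path.join(data_dir, "reads.fastq"), "wb") as f:
            f.write(b"\0" * 1_000_000)  # 1,000,000 bytes
        return context_size(repo), context_size(docker_dir)


if __name__ == "__main__":
    whole_repo, docker_only = compare_contexts()
    print(whole_repo, docker_only)  # 1000018 18
```

With the classic builder, running docker build docker/ would send only the 18-byte folder, whereas docker build -f docker/Dockerfile . from the repository root would also send the test data.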

tfmorris commented 5 years ago

BioContainers already has the Dockerfile, so it seems redundant to include an extra copy somewhere else. I like the idea of having the CWL and Dockerfile together, though. Perhaps include the CWL alongside the Dockerfile? e.g. https://github.com/BioContainers/containers/blob/master/bamtools/2.4.0/Dockerfile

michael-kotliar commented 5 years ago

Then I think it's better to keep a Dockerfile alongside the CWL file only when a custom container is really necessary.
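The rule proposed here (a Dockerfile next to the CWL file only for custom containers) could be checked against a tool's DockerRequirement. This Python sketch works on a CWL document already loaded into a dict (for example with a YAML parser, not shown) and accepts both the list and the map forms of requirements/hints. Treating any dockerPull under quay.io/biocontainers/ as "not custom" is an assumption, as are the function names.

```python
# Assumption: images published by BioContainers live under this prefix.
BIOCONTAINERS_PREFIX = "quay.io/biocontainers/"


def docker_requirement(tool):
    """Return the DockerRequirement of a loaded CWL tool, or None if there is none."""
    for section in ("requirements", "hints"):
        entries = tool.get(section) or []
        if isinstance(entries, dict):
            # Map form:  hints: {DockerRequirement: {dockerPull: ...}}
            if "DockerRequirement" in entries:
                return entries["DockerRequirement"]
        else:
            # List form: hints: [{class: DockerRequirement, dockerPull: ...}]
            for entry in entries:
                if entry.get("class") == "DockerRequirement":
                    return entry
    return None


def needs_local_dockerfile(tool):
    """Hypothetical policy: only a non-BioContainers image needs a Dockerfile next to the CWL."""
    requirement = docker_requirement(tool)
    if requirement is None:
        return False
    return not requirement.get("dockerPull", "").startswith(BIOCONTAINERS_PREFIX)
```

A repository-wide check could then walk all CWL files and report any tool for which needs_local_dockerfile is true but no Dockerfile sits in the same folder.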

medcelerate commented 4 years ago

Would it be wise to just push the container to the BioContainers repo if something custom is needed?