Closed by genomaxx 4 years ago
See: Container Wiki Information
Let's use the LABEL tags to inventory the software each container adds on top of what it inherits from the base image.
LABEL tags="tar, wget, gzip, bzip2, which, sudo, python3, aws-cli, samtools"
LABEL tags="parallel, sratoolkit"
Where sratoolkit is vdb-config, prefetch, fastq-dump and fasterq-dump. If possible the entire toolkit should be used.
LABEL tags="bowtie2"
This doesn't need any more software over the base image
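As a rough sketch of how one of these labels would sit in a child image's Dockerfile (the image name `example/base` is a placeholder, not one of the actual Serratus image names):

```dockerfile
# Hypothetical parent image; substitute the real base image.
FROM example/base:latest

# List only the software this image adds; the parent's software is
# documented by the parent's own label.
LABEL tags="parallel, sratoolkit"
```

Note that Docker labels are inherited from the parent image, and setting a key that already exists replaces the inherited value. Running `docker inspect --format '{{ index .Config.Labels "tags" }}' <image>` on a child image would therefore show only that image's own additions, which matches the intent above. An image that adds nothing and sets no label would still report the parent's `tags` value.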
Description of improvement:
A base container is built with everything needed to compile the software, such as gcc and development packages. These build-time dependencies and the intermediate build steps carry over into the final images and bloat them. It would be more efficient to move all compilation into a 'builder' container and copy only the final built software, plus any critical runtime software, into the production containers.
Currently each container is around 2 GB and is deployed 1000+ times per cluster. If each deployment pulls the full image, that is roughly 2 TB of image data per cluster. Reducing the size to 100 or 200 MB would cut this to roughly 100 to 200 GB, a significant effect.
This container optimization technique is called a "multi-stage build".
Apply Multi-stage Builds to the Serratus container hierarchy.
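A minimal sketch of what a multi-stage build could look like for one tool, using samtools as the example. The base image (`amazonlinux:2`), the package names, the samtools version, and the `/opt/samtools` install prefix are all assumptions for illustration, not the actual Serratus configuration:

```dockerfile
# Stage 1: builder. Holds gcc and development headers; never shipped.
# Base image and package names are assumptions.
FROM amazonlinux:2 AS builder
RUN yum -y install gcc make tar bzip2 wget \
        zlib-devel bzip2-devel xz-devel \
    && yum clean all

# Version is an assumption; pin whichever release the pipeline needs.
ARG SAMTOOLS_VERSION=1.10
WORKDIR /build
RUN wget https://github.com/samtools/samtools/releases/download/${SAMTOOLS_VERSION}/samtools-${SAMTOOLS_VERSION}.tar.bz2 \
    && tar -xjf samtools-${SAMTOOLS_VERSION}.tar.bz2 \
    && cd samtools-${SAMTOOLS_VERSION} \
    && ./configure --prefix=/opt/samtools --without-curses \
    && make \
    && make install

# Stage 2: production. Starts from a clean image and receives only the
# runtime libraries and the compiled output from the builder stage.
FROM amazonlinux:2
RUN yum -y install zlib bzip2-libs xz-libs \
    && yum clean all
COPY --from=builder /opt/samtools /opt/samtools
ENV PATH="/opt/samtools/bin:${PATH}"
LABEL tags="samtools"
```

Only the last stage ends up in the published image, so the compiler, headers, and source tree in the `builder` stage add nothing to the deployed size. Comparing `docker images` output before and after would show how close this gets to the 100 to 200 MB target.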