ababaian / serratus

Ultra-deep search for novel viruses
http://serratus.io
GNU General Public License v3.0

Docker Optimization #108

Closed: genomaxx closed this issue 4 years ago

genomaxx commented 4 years ago

Description of improvement:

The base container currently includes everything needed to build the software, such as gcc and development packages. The build steps and their leftover layers bloat the final container images. It would be more efficient to build all software in a separate 'builder' container and copy only the compiled output, plus any critical runtime software, into the production containers.

Currently each container is around 2 GB and is deployed 1000+ times per cluster. Reducing the size to 100-200 MB would have a significant effect.

This container optimization technique is called a "multi-stage build".

Apply Multi-stage Builds to the Serratus container hierarchy.
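As a rough illustration of the idea, here is a minimal multi-stage sketch that compiles samtools in a builder stage and copies only the binary into the final image. It is not the Serratus Dockerfile: the `debian:bookworm-slim` base image, the samtools version, the `/opt/samtools` prefix and the `/build` directory are all assumptions chosen for the example.

```dockerfile
# Hypothetical sketch. Base image, version and paths are assumptions,
# not taken from the Serratus repository.
ARG SAMTOOLS_VERSION=1.19

# Stage 1: builder. Compilers and development headers live only here.
FROM debian:bookworm-slim AS builder
# Re-declare the global ARG so it is visible inside this stage.
ARG SAMTOOLS_VERSION
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        build-essential ca-certificates wget bzip2 \
        zlib1g-dev libbz2-dev liblzma-dev \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /build
# --without-curses drops the ncurses dependency (used only by "samtools tview").
RUN wget -q "https://github.com/samtools/samtools/releases/download/${SAMTOOLS_VERSION}/samtools-${SAMTOOLS_VERSION}.tar.bz2" \
    && tar -xjf "samtools-${SAMTOOLS_VERSION}.tar.bz2" \
    && cd "samtools-${SAMTOOLS_VERSION}" \
    && ./configure --prefix=/opt/samtools --without-curses \
    && make -j"$(nproc)" \
    && make install

# Stage 2: production. Starts from a clean base and receives only the binary.
FROM debian:bookworm-slim
COPY --from=builder /opt/samtools/bin/samtools /usr/local/bin/samtools
LABEL tags="samtools"
```

Only the last stage ends up in the tagged image, so gcc, the headers and the source tree never reach the cluster. The production stage still needs whatever shared libraries the binary links against; in this sketch those are assumed to be present in the slim base image already.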

ababaian commented 4 years ago

See: Container Wiki Information

Let's use the LABEL tags to inventory what software each container will have available within itself on top of what is imported from the base image.

serratus-base

LABEL tags="tar, wget, gzip, bzip2, which, sudo, python3, aws-cli, samtools"

serratus-dl

LABEL tags="parallel, sratoolkit"

Here sratoolkit means vdb-config, prefetch, fastq-dump and fasterq-dump. If possible, the entire toolkit should be included.

serratus-align

LABEL tags="bowtie2"

serratus-merge

This doesn't need any software beyond what the base image provides.
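For reference, a child image in this scheme might look like the sketch below. The `serratus-base:latest` tag is an assumption about how the base image is named locally.

```dockerfile
# Hypothetical child image; the base image tag is an assumption.
FROM serratus-base:latest

# Labels are inherited from the parent image, but a LABEL with the same key
# replaces the inherited value. This image therefore reports only
# "parallel, sratoolkit" under the key "tags", not the base image's list.
LABEL tags="parallel, sratoolkit"
```

Because the child value overrides the parent's, each `tags` label lists only the software added at that level, which matches the inventory above. The full set for an image is its own list plus the list on serratus-base.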