StaPH-B / docker-builds

:package: :whale: Dockerfiles and documentation on tools for public health bioinformatics
GNU General Public License v3.0

Are multi-package containers accepted? #214

Closed rpetit3 closed 3 years ago

rpetit3 commented 3 years ago

Sometimes in workflows (Nextflow, Cromwell, etc.) I find it's nicer to just include multiple packages in a single container, to avoid having numerous steps and the overhead of pulling, starting, and stopping containers.

I'm curious if there are plans to start allowing multiple packages in a single container. I think it would have to be a case-by-case thing.

My reason for this issue is that I would like to make StaPH-B Docker images for Bactopia. This would require ~12 multi-package Docker containers (https://github.com/bactopia/bactopia/tree/master/containers).

I'm happy to start a PR with a Bactopia example.

erinyoung commented 3 years ago

I prefer to keep tools separate for when they update, although that's not really possible in every case.

For example, ivar needs samtools to run. So every time samtools updates, there are more samtools containers and more ivar containers.

The artic containers also have multiple tools in them.

k-florek commented 3 years ago

I just responded to a similar question on the StaPH-B toolkit. Generally, it is a best practice to keep Docker images small, lightweight, and specific to the task or process. Small individual containers are also more cloud friendly than one large image: cloud workflows and HPC systems often run jobs in an ephemeral way, where jobs are split across multiple instances that are removed or discarded after use. In that case the image is downloaded for each instance, so keeping images small, light, and specific to their individual tasks speeds up execution, since the process won't need to download a large image.

Also, it is very difficult to maintain an image with multiple tools or programs, since any time a version changes you have to work out how to track those changes. This is a big issue with the Pangolin image right now, since it now has a Pangolearn, Scorpio, and Usher version to worry about.

rpetit3 commented 3 years ago

Totally understand the position.

My use case is a bit of an outlier, considering Bactopia includes more than 70 bioinformatic tools (https://bactopia.github.io/acknowledgements/#software-included-in-bactopia). Haha so I hope you can understand opting to go the multiple programs in a container route versus 70+ containers (e.g. 70+ Nextflow processes).

Typically with Bactopia I save program version updates for minor version updates (1.4.0 -> 1.5.0). Although there are times a patch is needed because v0.0.2 of program X had a bug, but those are usually rare.

Given this use case, I wonder if there is an alternative, like a "StaPH-B" stamp of approval instead.

kapsakcj commented 3 years ago

I also prefer to keep containers small/lean where we can, but totally understand the use case here. I'm OK with multi-package containers as long as you don't mind maintaining them 😄

I think biocontainers has something called "mulled containers" which mashes together multiple tools found in bioconda into a single "mulled" container.

We could somehow tag them with bactopia or perhaps keep the dockerfiles within a directory called /bactopia and you could control the versioning however you'd like. One idea might be something like this:

bactopia/
├── annotate_genome
│   └── 0.0.1
│       ├── Dockerfile
│       └── environment.yml
└── antimicrobial_resistance
    └── 0.0.1
        ├── Dockerfile
        └── environment.yml

where the yml file could live alongside the Dockerfile and be incorporated with a COPY in your Dockerfiles (may need some tweaking).

We typically keep the structure of /tool-name/0.0.1/Dockerfile for a given tool, but it shouldn't matter if there's another directory in there. We just need to be mindful of this if we ever use a GitHub Actions workflow for testing, auto-image-building/pushing, etc.
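
To make the "be mindful of this" point concrete, here is a rough Python sketch of how an automated build or test step might tell the two layouts apart. It is only an illustration: `ImageSpec` and `parse_dockerfile_path` are hypothetical names, not part of this repo or of any existing workflow, and the sketch assumes repo-relative paths with exactly one optional grouping directory (such as bactopia/) above the usual tool-name/version/Dockerfile structure.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Optional


class ImageSpec(NamedTuple):
    """Hypothetical record describing one Dockerfile in the repo."""
    group: Optional[str]  # e.g. "bactopia"; None for the usual flat layout
    tool: str
    version: str


def parse_dockerfile_path(path: str) -> ImageSpec:
    """Map a repo-relative Dockerfile path to (group, tool, version).

    Accepts either tool/version/Dockerfile or group/tool/version/Dockerfile.
    Anything else raises ValueError so a CI step can fail loudly.
    """
    parts = PurePosixPath(path).parts
    if not parts or parts[-1] != "Dockerfile":
        raise ValueError(f"not a Dockerfile path: {path!r}")

    dirs = parts[:-1]
    if len(dirs) == 2:
        tool, version = dirs
        return ImageSpec(None, tool, version)
    if len(dirs) == 3:
        group, tool, version = dirs
        return ImageSpec(group, tool, version)
    raise ValueError(f"unexpected directory depth in {path!r}")


if __name__ == "__main__":
    for p in (
        "tool-name/0.0.1/Dockerfile",
        "bactopia/annotate_genome/0.0.1/Dockerfile",
    ):
        print(p, "->", parse_dockerfile_path(p))
```

A workflow that assumes a fixed depth would misread the grouped paths (treating bactopia as the tool and annotate_genome as the version), which is why handling the extra directory explicitly seems safer than counting path segments implicitly.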

rpetit3 commented 3 years ago

Thank you everyone for your input!

I'm going to start things on my end. I think once I submit a PR we can get an idea of what to expect.

rpetit3 commented 3 years ago

Yo! I'm going to close this for now. I'm a bit distracted with v2 of Bactopia, but once v2 is out I will revisit this!