Closed — rpetit3 closed this issue 3 years ago
I prefer to keep tools separate so they can be updated independently, although that's not really possible in every case.
For example, ivar needs samtools to run, so every time samtools updates there are new samtools containers and new ivar containers.
The artic containers also have multiple tools in them.
I just responded to a similar question on the StaPH-B toolkit. Generally, it is best practice to keep Docker images small, lightweight, and specific to a single task or process. Small, individual containers are also more cloud-friendly than one large image: cloud workflows and HPC systems often run jobs ephemerally, splitting them across multiple instances that are discarded after use. In that case the image is downloaded for each instance, so keeping images small and specific to their individual tasks speeds up execution, since each process won't need to pull a large image.
It is also very difficult to maintain an image with multiple tools or programs, since any time one of them changes you have to decide how to track the version bump. This is a big issue with the Pangolin image right now, since it has PangoLEARN, Scorpio, and UShER versions to worry about.
Totally understand the position.
My use case is a bit of an outlier, considering Bactopia includes more than 70 bioinformatic tools (https://bactopia.github.io/acknowledgements/#software-included-in-bactopia). Haha, so I hope you can understand opting for the multiple-programs-per-container route versus 70+ containers (e.g. 70+ Nextflow processes).
Typically with Bactopia I save program version bumps for minor releases (1.4.0 -> 1.5.0). There are times a patch is needed because v0.0.2 of program X had a bug, but those are usually rare.
Given this use case, I wonder if there is an alternative, like a "StaPH-B stamp of approval" instead.
I also prefer to keep containers small/lean where we can, but totally understand the use case here. I'm OK with multi-package containers as long as you don't mind maintaining them 😄
I think biocontainers has something called "mulled containers", which mash together multiple tools from bioconda into a single "mulled" image.
We could somehow tag them with bactopia
or perhaps keep the dockerfiles within a directory called /bactopia
and you could control the versioning however you'd like. One idea might be something like this:
```
bactopia/
├── annotate_genome
│   └── 0.0.1
│       ├── Dockerfile
│       └── environment.yml
└── antimicrobial_resistance
    └── 0.0.1
        ├── Dockerfile
        └── environment.yml
```
where the environment.yml file could live alongside the Dockerfile and be incorporated with a COPY in your Dockerfiles (may need some tweaking).
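As a rough sketch of that idea (the base image and paths here are assumptions, not StaPH-B conventions), a Dockerfile inside one of the versioned directories could copy in the adjacent environment.yml and build the Conda environment from it:

```dockerfile
# Hypothetical sketch — base image and paths are assumptions
FROM continuumio/miniconda3

# environment.yml sits next to this Dockerfile, e.g.
# bactopia/annotate_genome/0.0.1/environment.yml
COPY environment.yml /tmp/environment.yml

# Install the tools listed in the spec into the base environment
RUN conda env update -n base -f /tmp/environment.yml && \
    conda clean --all --yes
```

Keeping the package list in environment.yml rather than in the Dockerfile itself means the same spec can be reused for local Conda installs and for the container build.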
We typically keep the structure of /tool-name/0.0.1/Dockerfile
for a given tool, but it shouldn't matter if there's another directory level in there. We'd just need to be mindful of it if a GitHub Actions workflow is ever used for testing, auto-building/pushing images, etc.
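Purely as a sketch (the trigger, depths, and tagging scheme are assumptions), a GitHub Actions workflow could glob one level deeper so it still finds Dockerfiles nested under a bactopia/ directory:

```yaml
# Hypothetical sketch — builds any Dockerfile found at
# tool-name/0.0.1/Dockerfile or bactopia/tool-name/0.0.1/Dockerfile
name: build-images
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Build images
        run: |
          for df in $(find . -mindepth 3 -maxdepth 4 -name Dockerfile); do
            dir=$(dirname "$df")
            # Tag as tool-name:version from the two enclosing directories
            tag=$(basename "$(dirname "$dir")"):$(basename "$dir")
            docker build -t "$tag" "$dir"
          done
```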
Thank you everyone for your input!
I'm going to start things on my end, I think once I submit a PR we can get an idea on what to expect.
Yo! I'm going to close this for now. I'm a bit distracted with v2 of Bactopia, once v2 is out I will revisit this!
Sometimes in workflows (Nextflow, Cromwell, etc.) I find it's nicer to just include multiple packages in a single container, to avoid having numerous steps and the overhead of pulling/starting/stopping containers.
I'm curious if there are plans to start allowing multiple packages in a single container. I think it would have to be a case-by-case thing.
My reason for opening this issue is that I would like to make StaPH-B docker images for Bactopia. This would require ~12 multi-package docker containers (https://github.com/bactopia/bactopia/tree/master/containers).
I'm happy to start a PR with a Bactopia example.