BioContainers / containers

Bioinformatics containers
http://biocontainers.pro
Apache License 2.0

Add stardist 0.8.5 recipe to containers #551

Open FloWuenne opened 9 months ago

FloWuenne commented 9 months ago

Submitting a Container

If you're requesting a new container, please check the procedure described here.

Check BioContainers' Dockerfile specifications

Checklist

  1. Misc

    • [x] My tool doesn't exist in BioConda
    • [x] The image can be built
  2. Metadata

    • [x] LABEL base_image
    • [x] LABEL version
    • [x] LABEL software.version
    • [x] LABEL about.summary
    • [x] LABEL about.home
    • [x] LABEL about.license
    • [x] MAINTAINER
  3. Extra (optional)

    • [ ] I have written tests in test-cmds.txt
    • [ ] LABEL extra.identifier
    • [ ] LABEL about.documentation
    • [x] LABEL about.license_file
    • [ ] LABEL about.tags

    Check BioContainers' Dockerfile metadata
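
For reference, a minimal sketch of what the metadata labels from the checklist look like in a BioContainers Dockerfile (the base image, versions, and URLs below are illustrative placeholders, not the final recipe):

```dockerfile
# Illustrative values only; the real recipe pins its own base image and versions
FROM ubuntu:22.04

LABEL base_image="ubuntu:22.04" \
      version="1" \
      software="stardist" \
      software.version="0.8.5" \
      about.summary="Object detection with star-convex shapes" \
      about.home="https://github.com/stardist/stardist" \
      about.license="SPDX:BSD-3-Clause" \
      about.license_file="https://github.com/stardist/stardist/blob/master/LICENSE.txt" \
      extra.identifiers.biotools="stardist"

# Only if the tool is registered in bio.tools; otherwise drop the label above
MAINTAINER FloWuenne
```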

biocontainers-bot commented 9 months ago

No biotools label defined, please check if tool is not already defined in biotools (https://bio.tools) and add extra.identifiers.biotools label if it exists. If it is not defined, you can ignore this comment.

biocontainers-bot commented 9 months ago

No test-cmds.txt (test file) present, skipping tests

uschmidt83 commented 9 months ago

Hi, StarDist author here. I'm not really familiar with Bioconda+BioContainers, hence I'm wondering what the StarDist container is supposed to be used for in the Bioconda+BioContainers ecosystem.

The context for the Dockerfile in our repo (which the Dockerfile here is based on) was to offer an easy way to get a GPU-accelerated version of StarDist running, without the hassle of installing CUDA etc. Using StarDist is meant to be primarily done via interactive Jupyter notebooks (hence the TensorFlow jupyter-variant base image).

StarDist itself doesn't require GPU-acceleration, but can take advantage of it for improved performance (by enabling GPU-support for TensorFlow via CUDA, and also by installing gputools to take advantage of OpenCL).

FloWuenne commented 9 months ago

Thanks for that insight @uschmidt83 !

The main purpose of having a stardist container in BioContainers (or any public Docker registry, for that matter) from our side is that we want to use it in nf-core-based Nextflow pipelines. These pipelines require Docker containers for reproducible execution. Since stardist is part of conda-forge, a Bioconda package (which would automatically build a BioContainer) can't be created, so a BioContainers recipe was the next logical step from my side. The alternative would be CI on your repo that pushes to a publicly available Docker Hub repository.

To get started, I used your GitHub repo's Dockerfile and adjusted it to the BioContainers requirements. To be fair, I am not that familiar with GPU specifications in container settings and didn't realize that the NVIDIA driver requirements could cause issues for cross-platform and Singularity compatibility until @mboudet mentioned it.

Would you be able and willing to suggest changes to this recipe that remove the GPU acceleration and requirements? Again, the main driver of this is to have a Docker container hosted in the public domain to use in nf-core pipelines. If you'd prefer to have CI on your repo push to your Docker Hub and we use that, then we can also close this PR; whichever option you like better 😊!

Thanks for your help and insights!

uschmidt83 commented 9 months ago

The main purpose of having a stardist container in BioContainers (or any public Docker registry, for that matter) from our side is that we want to use it in nf-core-based Nextflow pipelines. These pipelines require Docker containers for reproducible execution.

If I understand this correctly, these Nextflow pipelines are not meant to be run interactively. Instead, each "step" of the pipeline consists of software that can either be installed via conda or run as a container.

If that assumption is true, wouldn't the StarDist container then need a defined entrypoint to run the software with defined inputs and outputs?

Since stardist is part of conda-forge, a Bioconda package (which would automatically build a BioContainer) can't be created, so a BioContainers recipe was the next logical step from my side.

I don't understand this part, and I also don't know the relationship between conda-forge and Bioconda.

To get started, I used your GitHub repo's Dockerfile and adjusted it to the BioContainers requirements. To be fair, I am not that familiar with GPU specifications in container settings and didn't realize that the NVIDIA driver requirements could cause issues for cross-platform and Singularity compatibility until @mboudet mentioned it.

Would you be able and willing to suggest changes to this recipe that remove the GPU acceleration and requirements?

Removing the GPU-related parts from the Dockerfile is actually easy. Again, I'm more concerned with how the Docker container is supposed to be used in the context of Nextflow.
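
For illustration, the change would essentially be swapping the base image and dropping the CUDA/OpenCL extras (the tags and pinned versions here are illustrative, not the upstream file):

```dockerfile
# GPU variant, roughly what the upstream Dockerfile uses (jupyter-variant TF image):
# FROM tensorflow/tensorflow:2.11.0-gpu-jupyter

# CPU-only variant for portable pipeline use:
FROM tensorflow/tensorflow:2.11.0

# gputools (OpenCL) is only needed for GPU acceleration, so it is omitted here
RUN pip install --no-cache-dir stardist==0.8.5
```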

Again, the main driver of this is to have a Docker container hosted in the public domain to use in nf-core pipelines. If you'd prefer to have CI on your repo push to your Docker Hub and we use that, then we can also close this PR; whichever option you like better 😊!

I don't have enough information to know which option would be better for us. It looks like the container definition needs to be updated each time we make a new StarDist release. How does that process work in BioContainers?

Best, Uwe

FloWuenne commented 9 months ago

Thank you for the details again @uschmidt83 ! Sorry if my answer was more confusing than anything. To address your questions:

1) Your first assumption about Nextflow is absolutely right. Nextflow aims to provide reproducible pipelines by running specific tasks within conda environments or Docker/Singularity (or other container engines). Usually, tools are run via their CLI from within the containers, so the container itself doesn't need an entrypoint. Within Nextflow pipelines, these CLI interfaces can also be provided by scripts shipped with the Nextflow code, rather than necessarily being part of the tool/container. That being said, having a CLI interface for inference using stardist wouldn't be a bad idea in my opinion. migueLib already has some code for doing exactly this. It could be an option to open a PR and add a CLI interface...
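
To make point 1 concrete, a Nextflow process just points at a container and calls the tool's CLI in its script block, so no entrypoint is needed. A rough sketch (the container tag, CLI name, and flags are hypothetical):

```nextflow
process STARDIST_INFERENCE {
    // Any publicly hosted image works here; this tag is hypothetical
    container 'biocontainers/stardist:0.8.5'

    input:
    path image

    output:
    path "labels.tif"

    script:
    """
    stardist-predict -i ${image} -o labels.tif
    """
}
```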

2) The whole part about Bioconda vs. conda-forge was basically just me saying that Bioconda builds Docker containers automatically on quay.io/biocontainers (and conda-forge doesn't). But since stardist is already on conda-forge, one can't create a Bioconda recipe (because the namespace is shared between the two channels).

3) Regarding updating the container, the easiest option from a maintenance standpoint would probably be CI on the stardist repo that automatically builds the container on each new release. I believe BioContainers would automatically make a PR to bump the version (via dependabot) every time there is a new PyPI release. Ultimately, again, I would leave this up to you, and I could help with the CI setup if required.
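
As an aside, the version-bump check such a bot performs is simple to sketch; this is a generic illustration, not actual BioContainers or dependabot tooling:

```python
def parse_version(v: str) -> tuple:
    """Split a dotted version string like '0.8.5' into comparable integers."""
    return tuple(int(part) for part in v.split("."))

def needs_bump(recipe_version: str, pypi_version: str) -> bool:
    """True if the latest PyPI release is newer than the recipe's pinned version."""
    return parse_version(pypi_version) > parse_version(recipe_version)

# A bot would fetch the latest release from the PyPI JSON API and open a
# version-bump PR whenever needs_bump(...) is True.
print(needs_bump("0.8.5", "0.9.1"))  # True: 0.9.1 is newer
print(needs_bump("0.8.5", "0.8.5"))  # False: already up to date
```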

Please let me know what you think of these ideas and whether any details are still unclear!

mboudet commented 9 months ago

I believe BioContainers would automatically make a PR to bump the version (via dependabot) every time there is a new PyPI release.

I don't believe there is a dependabot setup on the BioContainers repository; it seems all PRs are manual. (There is a bioconda-bot on the Bioconda repository, which makes PRs based on GitHub releases.)

So it might be easier to set up CI on the stardist side to push to Docker Hub / GitHub Container Registry (and update the conda package) after a release, if that's needed.

FloWuenne commented 9 months ago

Thanks @mboudet ! I had just seen a PR from dependabot and thought it would also check for version bumps of the main packages within containers...

In that case, CI directly on the stardist repo that builds a non-GPU container and pushes it to Docker Hub might be the easiest, @uschmidt83.

uschmidt83 commented 9 months ago

Thanks for the clarification, @FloWuenne and @mboudet.

That being said, having a CLI interface for inference using stardist wouldn't be a bad idea in my opinion. migueLib already has some code for doing exactly this. It could be an option to open a PR and add a CLI interface...

We do have a CLI for 2D and 3D model inference, but not for model training. Please take a look and let me know if you think this is sufficient.
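
From inside a container, such an inference CLI would be invoked along these lines (the script name and flags below are assumptions; check the console scripts the package actually installs):

```shell
# Hypothetical invocation; verify the actual entry-point name and options first
stardist-predict2d --input cells.tif --output labels.tif
```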

So it might be easier to set up CI on the stardist side to push to Docker Hub / GitHub Container Registry (and update the conda package) after a release, if that's needed.

I'm not sure about this; I have to look into it and understand the pros and cons.

FloWuenne commented 8 months ago

Thanks for the links for the inference CLIs @uschmidt83 !

The inference scripts look like exactly what would be needed to run inference from within a container. So I think the easiest would be to update the Dockerfile in your main stardist repository to one that doesn't use the GPU (if that's something you want to support; the existing Dockerfile could be kept as DOCKERFILE_GPU or similar).

Once the main Dockerfile in the repo is updated, here is an example of a GitHub CI workflow that builds the container and pushes it to Docker Hub on release and every night (you would have to set the Docker Hub name and secret in the repository). For stardist, one could of course set the CI to only trigger on release or on merge to main (depending on your development workflow!)
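
Such a workflow could look roughly like the sketch below (the action versions, image name, trigger schedule, and secret names are assumptions; adjust them to the repository):

```yaml
name: Build and push Docker image
on:
  release:
    types: [published]
  schedule:
    - cron: '0 2 * * *'   # nightly build

jobs:
  docker:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          # Secret names are placeholders; set them in the repo settings
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: stardist/stardist:latest
```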

Let us know what your preference is! We can also still go ahead and change this recipe here to work without the GPU and have it alongside your main repository. That way, you wouldn't have to change the Dockerfile or deal with setting up the CI... Ultimately, I would leave the decision to you as the package creator!

FloWuenne commented 5 months ago

@uschmidt83 Just a quick ping to ask whether you want to go ahead with the BioContainers recipe or rather with Docker CI in your repo. If you'd rather go with the CI action, I'll close this PR.