PreibischLab / RS-FISH

Tool for precise, interactive, fast and scalable FISH spot detection
GNU General Public License v2.0

Docker container for rs-fish command line #9

Closed: FloWuenne closed this issue 1 year ago

FloWuenne commented 2 years ago

Hi RS-FISH devs,

I wanted to create a Docker container for the command-line version of rs-fish that runs on .ome.tiff files and, unlike the containerized Spark version, doesn't require .n5 files. I made a couple of attempts using maru, but unfortunately I wasn't able to create a working container with it.

So I started from scratch with a Maven base image. One issue I ran into is that the rs-fish script sets the Java maximum heap to Xmx0g. For now, I worked around this by using sed to replace it with a non-zero value, which seems to fix the problem. Is there a reason this is set to zero by default?

I would also like to know whether there is already a Docker build for the non-Spark command-line version of the tool somewhere.

Otherwise, here is the Dockerfile recipe for anyone who wants to build one to use rs-fish:

FROM maven:3.8.4-jdk-8-slim
RUN apt-get -y update && apt-get -y install git
RUN git clone https://github.com/PreibischLab/RS-FISH
WORKDIR /RS-FISH
RUN ./install
RUN sed -i "s|Xmx0g|Xmx8g|g" /RS-FISH/rs-fish

If you build the image with Docker and tag it 'rs_fish' (docker build -t rs_fish .), the tool can then be run with: docker run rs_fish /RS-FISH/rs-fish

I also pushed the Docker image to Docker Hub to make it easier to run the tool directly:

docker pull wuennemannflorian/rs_fish:2.3.1
docker run wuennemannflorian/rs_fish:2.3.1 /RS-FISH/rs-fish

StephanPreibisch commented 2 years ago

Hi, the memory is computed relative to the available system memory. My best guess is that, during creation of the container, the memory is reported incorrectly (maybe by Docker?) to the install script.
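Stephan's explanation can be made concrete with a small sketch. It rests on an assumption: that the install script sets the heap to a fraction of total RAM using whole-gigabyte shell integer arithmetic. The function name heap_gb_from_kb and the 4/5 fraction are hypothetical and not taken from RS-FISH's install script; the point is only to show how rounding down can turn a small build VM into -Xmx0g.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, NOT the RS-FISH install script: assumes the heap is
# 4/5 of total RAM, computed in whole gigabytes with shell integer
# arithmetic, which always rounds down.

# heap_gb_from_kb TOTAL_KB -> prints the heap size in whole GB
heap_gb_from_kb() {
  local total_kb=$1
  local total_gb=$(( total_kb / 1024 / 1024 ))   # 3.8 GB rounds down to 3
  echo $(( total_gb / 5 * 4 ))                   # 3 / 5 rounds down to 0
}

# A build VM reporting about 4 GB of RAM ends up with a zero heap ...
echo "-Xmx$(heap_gb_from_kb 4000000)g"    # prints -Xmx0g
# ... while a machine with about 62 GB gets a usable value.
echo "-Xmx$(heap_gb_from_kb 65000000)g"   # prints -Xmx48g
```

Under this assumption, any build environment reporting less than 5 GB of RAM (a small Docker Desktop VM, for example) would produce -Xmx0g. On Linux, the input value can be read with awk '/^MemTotal:/ {print $2}' /proc/meminfo.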

I think @krokicki already built a container for RS-FISH. Konrad, did you run into that as well?

Thanks so much, Stephan

krokicki commented 2 years ago

The container I built was for the Spark version (https://github.com/JaneliaSciComp/multifish/tree/master/containers/rs_fish), but it's not really relevant here, because the memory for the Spark cluster is managed elsewhere.

Stephan, I think you're right that the memory computation has some issue inside the container, but I'm not sure why it didn't work for @FloWuenne. I tried it on both Linux and Mac, and it produces the correct value for me. But it's a value you probably don't want anyway, because it's specific to the system you're building on, and the container should work anywhere. I think the sed solution is very nice in its simplicity.
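Following that reasoning, the heap size could be chosen when the container starts rather than when the image is built. The sketch below is one way to do that on Linux and is not part of RS-FISH: it reads the cgroup v2 limit (/sys/fs/cgroup/memory.max), then the cgroup v1 limit, falls back to MemTotal from /proc/meminfo, and takes 75% of the result. The function names and the 75% figure are illustrative assumptions. JDK 8u191 and later can do something similar natively via -XX:MaxRAMPercentage.

```bash
#!/usr/bin/env bash
# Sketch (assumption, not RS-FISH code): choose -Xmx when the container
# starts, from the memory visible to it, instead of at image build time.

# heap_mb LIMIT_BYTES PERCENT -> prints PERCENT% of LIMIT_BYTES in MiB
heap_mb() {
  echo $(( $1 / 1048576 * $2 / 100 ))
}

# detect_limit_bytes -> prints the memory limit (bytes) seen by this process
detect_limit_bytes() {
  if [ ! -r /proc/meminfo ]; then
    echo "detect_limit_bytes: /proc/meminfo not readable (Linux only)" >&2
    return 1
  fi
  kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  total=$(( kb * 1024 ))
  limit=$total
  if [ -r /sys/fs/cgroup/memory.max ]; then                     # cgroup v2
    v=$(cat /sys/fs/cgroup/memory.max)
    if [ "$v" != "max" ]; then limit=$v; fi
  elif [ -r /sys/fs/cgroup/memory/memory.limit_in_bytes ]; then # cgroup v1
    limit=$(cat /sys/fs/cgroup/memory/memory.limit_in_bytes)
  fi
  # "No limit" shows up as "max" (v2) or a huge number (v1), so never
  # go above the physical RAM reported by the kernel.
  if [ "$limit" -gt "$total" ]; then limit=$total; fi
  echo "$limit"
}

if limit=$(detect_limit_bytes); then
  echo "-Xmx$(heap_mb "$limit" 75)m"
fi
```

A launcher could pass the printed flag to java in place of a hard-coded value, so the same image adapts to a laptop, a cluster node, or a docker run --memory limit.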

I don't think anyone else has built a Docker container for the command-line version, @FloWuenne. How about submitting a pull request that adds a section about it to the README? I'll follow your lead and document the Spark-based container the same way.

FloWuenne commented 2 years ago

Sounds like a great idea @krokicki !

I'd be happy to make a pull request adding README documentation for the Docker container. I might also rebuild the container with a much higher maximum heap, maybe Xmx64g, so that even larger images can be processed. What do you think?

I will likely do this next week!

FloWuenne commented 2 years ago

@krokicki and @StephanPreibisch

I am having an issue running the Docker container I created as a Singularity container on our cluster.

I converted the Docker image to Singularity with singularity pull from my Docker Hub repository: singularity pull rs_fish.2_3_1.sif docker://wuennemannflorian/rs_fish:2.3.1

When I try to run singularity exec, I keep getting the following error: Error: Could not find or load main class cmd.RadialSymmetry

Any idea what I would have to change, either in the Docker build or when running Singularity?

krokicki commented 2 years ago

@FloWuenne, if you look inside the container (singularity shell rs_fish.2_3_1.sif), here's what your containerized script looks like:

#!/bin/bash

JAR=$HOME/.m2/repository/net/preibisch/Radial_SymmetryLocalization/2.3.1-SNAPSHOT/Radial_SymmetryLocalization-2.3.1-SNAPSHOT.jar
java \
  -Xmx8g \
  -XX:+UseConcMarkSweepGC \
  -cp $JAR:/root/.m2/repository/mpicbg/mpicbg/1.4.1/mpicbg-1.4.1.jar:...

The problem here is that all of the jars were installed in /root. This works in Docker because the Docker container happens to run as root, but Singularity doesn't allow you to run as root (this is a feature, not a bug :wink:).
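One way to confirm this diagnosis is to check, as the user the container actually runs as, whether each classpath entry can be read: java reports "Could not find or load main class" when the class is not found in any readable classpath entry. The helper below is a hypothetical diagnostic, not part of RS-FISH; unreadable_entries is a name introduced here for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic (not part of RS-FISH): print every entry of a
# colon-separated classpath that the current user cannot read.
unreadable_entries() {
  printf '%s\n' "$1" | tr ':' '\n' | while IFS= read -r entry; do
    # Skip empty entries, e.g. from a trailing colon.
    if [ -n "$entry" ] && [ ! -r "$entry" ]; then
      printf '%s\n' "$entry"
    fi
  done
}

# Example with one readable temp file and one jar path from the script above.
# Run as a regular user (as under Singularity), the /root path is listed,
# because /root is normally not accessible to other users.
tmp_jar=$(mktemp)
unreadable_entries "$tmp_jar:/root/.m2/repository/mpicbg/mpicbg/1.4.1/mpicbg-1.4.1.jar"
rm -f "$tmp_jar"
```

Inside singularity shell rs_fish.2_3_1.sif, passing it the -cp value from the generated rs-fish script should list the /root/.m2 jars if this is indeed the cause.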

The quickest way around this would probably be to set up a user inside the container, run the install as that user, and then run the Singularity container as that user. This makes the container more difficult to use, though, and it's not something I would personally recommend.

I believe the "best practice" here is to never do anything that uses any user's home directory inside a container. To accomplish that you might need to modify the install script so that it installs the jars into a repository in a non-user location (e.g. /RS-FISH/repository), and then point the executables there.
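Maven's local repository location is configurable, either through the standard maven.repo.local property (for example MAVEN_OPTS="-Dmaven.repo.local=/RS-FISH/repository") or through a <localRepository> element in settings.xml. As a sketch of the second option, the hypothetical helper below writes such a file; the directory /RS-FISH/repository follows the example above, and the generated launcher scripts would still have to reference the same location.

```bash
#!/usr/bin/env bash
# Sketch (assumption, not RS-FISH's install script): write a Maven
# settings.xml whose <localRepository> lies outside any home directory.

# write_maven_settings REPO_DIR OUT_FILE
write_maven_settings() {
  repo_dir=$1
  out_file=$2
  mkdir -p "$(dirname "$out_file")"
  cat > "$out_file" <<EOF
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0
                              https://maven.apache.org/xsd/settings-1.0.0.xsd">
  <localRepository>${repo_dir}</localRepository>
</settings>
EOF
}

# In an image the target would be "$MAVEN_HOME/conf/settings.xml" (Maven's
# global settings file); a temp directory keeps this example harmless.
tmp_dir=$(mktemp -d)
write_maven_settings /RS-FISH/repository "$tmp_dir/conf/settings.xml"
cat "$tmp_dir/conf/settings.xml"
rm -rf "$tmp_dir"
```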

ArtemSokolov commented 2 years ago

Dear all,

I just wanted to follow up on this. I was able to get the container working for both Docker and Singularity with the following Dockerfile:

FROM maven:3.8.4-jdk-8-slim

RUN apt-get -y update && apt-get -y install git

RUN echo '<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0" \n\
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" \n\
  xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 \n\
                      https://maven.apache.org/xsd/settings-1.0.0.xsd"> \n\
  <localRepository>/RS-FISH/.m2/repository</localRepository> \n\
</settings>' > $MAVEN_HOME/conf/settings.xml

RUN git clone https://github.com/PreibischLab/RS-FISH
WORKDIR /RS-FISH

RUN sed -i 's|\\\$|\$|g' install
RUN HOME=/RS-FISH ./install

The trick to getting this to work was two-fold:

  1. Having Maven install packages to /RS-FISH/.m2 instead of /root/.m2. This is handled by writing a settings.xml into $MAVEN_HOME/conf.
  2. Un-escaping \$ in install. As @krokicki alluded to, the problem for Singularity was having $HOME in the executable scripts produced by install. Once \$ is un-escaped, $HOME is evaluated during installation instead (where it is set to /RS-FISH), so the resulting executables contain absolute paths that are the same in every execution scenario.
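The effect of the sed step in the Dockerfile above can be reproduced in isolation. This sketch assumes that install writes the launcher with lines of the form echo "...\$HOME...", which is consistent with the $HOME seen in the generated script earlier in this thread but is not copied from install itself; the sample line is hypothetical.

```bash
#!/usr/bin/env bash
# unescape: the Dockerfile's sed step. In the pattern, \\ matches a literal
# backslash and \$ a literal dollar sign, so each "\$" becomes "$".
unescape() {
  sed 's|\\\$|\$|g'
}

# A line of the kind install might contain (hypothetical example).
before='echo "JAR=\$HOME/.m2/example.jar"'
after=$(printf '%s\n' "$before" | unescape)
printf '%s\n' "$after"     # echo "JAR=$HOME/.m2/example.jar"

# Evaluate both versions with HOME=/RS-FISH, as the Dockerfile does:
( HOME=/RS-FISH; eval "$before" )   # JAR=$HOME/.m2/example.jar (literal, expanded at run time)
( HOME=/RS-FISH; eval "$after" )    # JAR=/RS-FISH/.m2/example.jar (fixed at install time)
```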

It's a slightly "hacky" approach. I think a better solution would be to modify install at the source, so that it doesn't introduce a dependency on $HOME to the executables, which would then substantially simplify the Dockerfile. Interestingly, I also did not have the memory issues mentioned by @FloWuenne in the original post, so I suspect that install may not be correctly detecting the amount of RAM available in his execution environment. As @krokicki said, containers should work anywhere, so I worry about having a fixed Xmx* value hardcoded inside executables, instead of allowing users to control it via the standard JAVA_TOOL_OPTIONS route.
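Letting users control the heap through JAVA_TOOL_OPTIONS only works if the launcher does not hard-code -Xmx: HotSpot reads JAVA_TOOL_OPTIONS before the command-line arguments, so an explicit -Xmx on the java command line takes precedence. Below is a minimal sketch of a launcher fragment that adds a default heap only when the user has not chosen one; the 8g default, the heap_flag name, and JAR_LIST are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch (assumption): add a default -Xmx only when the user has not already
# sized the heap through JAVA_TOOL_OPTIONS. The 8g default is a placeholder.
DEFAULT_HEAP=8g

heap_flag() {
  case " ${JAVA_TOOL_OPTIONS:-} " in
    *" -Xmx"* | *" -XX:MaxRAMPercentage="*)
      # The user already chose a heap size: add nothing.
      : ;;
    *)
      printf '%s\n' "-Xmx$DEFAULT_HEAP" ;;
  esac
}

# A launcher would then call something like
#   java $(heap_flag) -cp "$JAR_LIST" cmd.RadialSymmetry "$@"
# where JAR_LIST stands in for the classpath the install script writes.

heap_flag                                  # prints -Xmx8g if JAVA_TOOL_OPTIONS is unset
JAVA_TOOL_OPTIONS="-Xmx32g" heap_flag      # prints nothing
```

With a launcher along these lines, docker run -e JAVA_TOOL_OPTIONS=-Xmx32g ... would raise the limit without rebuilding the image.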

Using the above Dockerfile, you can build container images with:

docker build -t rs-fish:test .
singularity build rs-fish-test.sif docker-daemon://rs-fish:test

and test them with:

docker run --rm rs-fish:test /RS-FISH/rs-fish
singularity exec -C rs-fish-test.sif /RS-FISH/rs-fish

I hope the above Dockerfile is helpful. Our group is quite interested in adding RS-FISH to our image processing pipeline MCMICRO (https://mcmicro.org/), so I would be happy to contribute to getting the Docker container off the ground.

FloWuenne commented 1 year ago

Sorry for the extremely long delay in adding the Dockerfile, @krokicki @StephanPreibisch!

I submitted a PR (#17) containing the above Dockerfile from @ArtemSokolov. I also added documentation to the README on how to build the Docker image from the repository. Finally, I added a GitHub Action that automatically builds the Docker image and pushes it to Docker Hub. You only need to add your Docker Hub credentials to GitHub so this repository can use them. This makes it easy for other pipelines to always use the newest version of RS-FISH through the Docker Hub image.

Please test the Docker build and fix anything that isn't working on your end! I hope this makes it easier to use RS-FISH on different systems and environments.