stebo85 opened this issue 8 months ago
Hi @stebo85 , I work at Sydney Microscopy & Microanalysis, at the University of Sydney. I suspect this was requested by me via Ryan Sullivan. Can I be the external volunteer to implement this? I would start with implementing RELION.
I read that we need to submit an issue to request access to the interactive tool? Or is it better to use the manual method? https://www.neurodesk.org/developers/new_tools/interactive_build/
Thanks
Dear @vennand,
It would be wonderful to have your help on this!!!
I just added you to the access group, so you can login to https://labtokyo.neurodesk.org/ - Try out the interactive build process and let us know if this works for you. Once you are more familiar with how we build containers, the manual process might be more efficient.
We are still working on improving how people can contribute new containers to our repository - so it would be wonderful to hear where things don't make sense yet and how the process could be improved!
Thank you so much Steffen
Hi @stebo85,
Sorry for taking so long to reply. I tried the interactive build process, but it didn't let me choose a GPU node, so I went with the manual process.
I'm a bit stuck at the point of building relion, because I need to choose a CUDA toolkit version, and I also need to specify the GPU compute capabilities (assuming it's NVIDIA).
Is there a way to get those dynamically from the container before building? This would depend on the hardware of the target system.
Thanks, André
Dear @vennand - GPUs are tricky. The way we have been successful so far is to package the latest CUDA toolkit version in the container that works with the software. At runtime, the GPU driver gets mounted into the container, but so far, we saw that various CUDA versions work well on different Nvidia GPU driver versions. It would need a different container for AMD/Intel GPUs I guess, we haven't tested that yet.
Here is an example container where we install CUDA through conda: https://github.com/NeuroDesk/neurocontainers/blob/2f72d058e2a9cb49234792f340fe9a53145d8ac6/recipes/deepretinotopy/build.sh#L21
This container works on various Nvidia GPUs we had access to
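For reference, a minimal sketch of probing the GPU dynamically instead of hard-coding a value (this is not part of the recipes above, and it assumes a driver recent enough to support nvidia-smi's compute_cap query field):

```shell
# Fall back to an empty value when no NVIDIA driver (and hence no
# nvidia-smi) is present, e.g. on a CPU-only build server.
if command -v nvidia-smi >/dev/null 2>&1; then
    COMPUTE_CAPABILITY=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n1)
else
    COMPUTE_CAPABILITY=""
fi
echo "COMPUTE_CAPABILITY='${COMPUTE_CAPABILITY}'"
```

Because containers are built once and reused, this kind of probe only helps at runtime, not at build time on a GPU-less build server.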
Hi @stebo85, I've created this container for relion: https://github.com/vennand/neurocontainers/tree/master/recipes/relion
The CUDA toolkit seems to work, though I haven't tested it with a dataset yet. However, it doesn't seem like the "toolVersion" was replaced correctly in the README.md; I'm not sure if I've done something wrong. When exiting the build process, it also tries to push the image automatically to docker.io/vnmd/relion_4.0.1 , which I don't have permission for. To test the container properly I need to make sure the GUI actually launches, so it would probably be better if this push step weren't automatic.
I was wondering, is there a way to call other containers when building a container? Relion uses ctffind, motioncor2 and topaz, but they are all third-party programs that can be used independently. Is it best to build all of them within the relion container, or can I create containers for them and use those from the relion container? Topaz would likely cause issues with relion in the same environment, since it needs its own conda environment.
Dear @vennand,
Great to hear you are making progress :)
The toolVersion should be replaced in the README once it's building in our repository. You can just send a pull request; this will then build the container and provide you with a command for testing it, so you can check whether the GUI works. We are working on an interactive graphical build system that will allow you to do all of this more conveniently, but right now that's the best we can do.
Yes, you can call other containers from within containers! For this to work you need to install singularity and lmod in the container, and then you can use the module system to call any other container. Here is an example where we do this: https://github.com/NeuroDesk/neurocontainers/blob/master/recipes/code/build.sh
from this example you need:
--install lmod \
--env GOPATH='$HOME'/go \
--env PATH='$PATH':/usr/local/go/bin:${GOPATH}/bin \
--run="wget https://dl.google.com/go/go$GO_VERSION.$OS-$ARCH.tar.gz \
&& tar -C /usr/local -xzvf go$GO_VERSION.$OS-$ARCH.tar.gz \
&& rm go$GO_VERSION.$OS-$ARCH.tar.gz \
&& mkdir -p $GOPATH/src/github.com/sylabs \
&& cd $GOPATH/src/github.com/sylabs \
&& wget https://github.com/sylabs/singularity/releases/download/v${SINGULARITY_VERSION}/singularity-ce-${SINGULARITY_VERSION}.tar.gz \
&& tar -xzvf singularity-ce-${SINGULARITY_VERSION}.tar.gz \
&& cd singularity-ce-${SINGULARITY_VERSION} \
&& ./mconfig --without-suid --prefix=/usr/local/singularity \
&& make -C builddir \
&& make -C builddir install \
&& cd .. \
&& rm -rf singularity-ce-${SINGULARITY_VERSION} \
&& rm -rf /usr/local/go $GOPATH \
&& ln -s /usr/local/singularity/bin/singularity /bin/" \
--copy module.sh /usr/share/ \
the content of module.sh is:
trap "" 1 2 3
case "$0" in
-bash|bash|*/bash) . /usr/share/lmod/6.6/init/bash ;;
-ksh|ksh|*/ksh) . /usr/share/lmod/6.6/init/ksh ;;
-zsh|zsh|*/zsh) . /usr/share/lmod/6.6/init/zsh ;;
-sh|sh|*/sh) . /usr/share/lmod/6.6/init/sh ;;
*) . /usr/share/lmod/6.6/init/sh ;; # default for scripts
esac
trap - 1 2 3
You might have to adjust the lmod version depending on the base operating system version of your container.
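If you'd rather not hard-code the 6.6 in those init paths, one option (assuming the same /usr/share/lmod/&lt;version&gt;/init layout as in the script above) is to discover the installed version:

```shell
# Pick up whatever lmod version is installed instead of hard-coding 6.6.
# Prints "not found" when lmod isn't installed (e.g. outside the container).
LMOD_INIT_DIR=$(ls -d /usr/share/lmod/*/init 2>/dev/null | head -n1)
echo "lmod init dir: ${LMOD_INIT_DIR:-not found}"
```

module.sh could then source "${LMOD_INIT_DIR}/bash" (or ksh/zsh/sh) instead of the versioned path.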
I would go down the route of multiple containers if: 1) the software has conflicting dependencies, or 2) the software is useful for multiple other containers.
Keen to see this working :) Thank you Steffen
Hi @stebo85,
I'm not sure I understand how to use this. So let's say I want to create a container for Topaz, so I can use it in relion. In the Topaz container, I need to install lmod and singularity? Can I use the module.sh file as is?
After that, how do I call Topaz from the relion container? For relion, I would need to set an environment variable that points to the Topaz executable. Would this work then?
--env RELION_TOPAZ_EXECUTABLE=/usr/local/topaz/latest/topaz
Thanks, André
Dear @vennand
you don't need to install singularity and lmod in the Topaz container - only in the relion container (so the container that calls other containers). You should be able to use the module.sh file as is, but check that the lmod version fits with your base image version - otherwise adjust the version number.
The RELION_TOPAZ_EXECUTABLE variable is a bit tricky and you will need to try what works. A few options: 1) The easiest would be if you can run module load topaz/version, after which topaz will simply be on the path - maybe you can leave the variable empty and relion will first try to find topaz on the path? 2) If 1) doesn't work, you could create a wrapper script and store it, for example, under /usr/local/topaz/latest/topaz. This wrapper script would then contain something like:
module load Topaz/x.x.x
topaz "$@"
Then you can set the variable to this and it might work: --env RELION_TOPAZ_EXECUTABLE=/usr/local/topaz/latest/topaz
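To make the wrapper idea concrete, here's a sketch of installing it at build time. PREFIX, the bare "topaz" module name, and sourcing /usr/share/module.sh are assumptions to adapt to your recipe (PREFIX defaults to the current directory here so the sketch runs outside the container; in the recipe it would be /usr/local):

```shell
# Install a wrapper that loads the topaz module before running topaz.
PREFIX="${PREFIX:-$PWD}"
mkdir -p "$PREFIX/topaz/latest"
cat > "$PREFIX/topaz/latest/topaz" <<'EOF'
#!/bin/bash
. /usr/share/module.sh   # initialise lmod in this shell (path assumed)
module load topaz        # module name/version: adjust to your container
exec topaz "$@"
EOF
chmod +x "$PREFIX/topaz/latest/topaz"
```

RELION_TOPAZ_EXECUTABLE would then point at the installed wrapper path.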
Let me know how you go with this :)
Thank you Steffen
Hi @stebo85,
Thanks for the instructions! I think it's going well so far.
Quick question: would you recommend installing other software under /usr/local/ or /opt/?
I'll create a separate container for Topaz, but it seems like it's not necessary for motioncor2 and ctffind. So at the moment, I'm installing under /usr/local/ctffind/4.1.14 and /usr/local/motioncor2/1.6.4
Thanks, André
It doesn't matter where you install other software in the container :) I prefer /opt - but as long as the binaries are on the PATH variable, it will work.
Hi @stebo85,
I've made pull requests for relion and topaz. I'm assuming the next step is to test with the GUI, ideally on a machine with a GPU?
Also, side note, is it possible to create containers for software with licenses? Could be a dongle server or network license?
Thanks, André
OK, great, I'll merge that.
Yes, once the container is built you get a command for testing the container.
Dongles are difficult, but network licenses would work.
Topaz worked and you can test: https://github.com/NeuroDesk/neurocontainers/issues/613
Relion failed with:
[ 9/20] RUN git clone 'https://github.com/3dem/relion.git' --branch=4.0.1 && cd relion && mkdir build && cd build && cmake -DCUDA_ARCH= -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-11.8 -DCMAKE_INSTALL_PREFIX=/opt/relion-4.0.1/ -DFORCE_OWN_FLTK=ON .. && make && make install:
Does the machine have a GPU driver installed? It needs nvidia-smi, otherwise the environment variable COMPUTE_CAPABILITY will come back empty.
Should I add a condition so that if COMPUTE_CAPABILITY is empty, it compiles for CPU only (without -DCUDA_ARCH=${COMPUTE_CAPABILITY})? Though a relion without GPU support won't be very appealing to users...
Dear @vennand - no, the build server doesn't have a GPU - do you really need a GPU to build it? Can't you just set this variable explicitly?
I need to specify the architecture of the GPU the container will run on, so I can't set it explicitly. For example, I'm testing on a Tesla P40, which has a compute capability of 6.1, while a newer GPU like the RTX 4090 has 8.9. It determines which GPU binaries are generated at compile time; otherwise relion will throw errors when we try to use it.
Anyway, I'll try to find a work-around with if/else statements inline.
It would need three specific containers, one for each. You could encode this through the toolVersion. Can you build containers with compute capability 6.1, compute capability 8.9, and CPU only?
I added this instead, seems to work with my testing. Could that cause issues in the future with how NeuroDesk is coded?
&& if [[ -z '${COMPUTE_CAPABILITY}' ]]; then
     # RELION: if there is no NVIDIA driver installed, compile without GPU
     cmake -DCMAKE_INSTALL_PREFIX=/opt/${toolName}-${toolVersion}/ -DFORCE_OWN_FLTK=ON ..
   else
     # RELION: otherwise, compile with the GPU architecture and CUDA version
     cmake -DCUDA_ARCH=${COMPUTE_CAPABILITY} -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-11.8 -DCMAKE_INSTALL_PREFIX=/opt/${toolName}-${toolVersion}/ -DFORCE_OWN_FLTK=ON ..
   fi
It tests whether COMPUTE_CAPABILITY has length 0. If it does, it compiles for CPU only; otherwise it uses the correct GPU architecture.
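As a quick sanity check of that test in isolation (standalone demo, not the recipe itself, so the variable is quoted normally here):

```shell
# [[ -z ... ]] is true when the string is empty, so an unset or empty
# COMPUTE_CAPABILITY selects the CPU-only branch.
COMPUTE_CAPABILITY=""
if [[ -z "${COMPUTE_CAPABILITY}" ]]; then
    BUILD_KIND="cpu-only"
else
    BUILD_KIND="gpu"
fi
echo "$BUILD_KIND"   # prints: cpu-only
```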
That works, but will only build the CPU version. Why not build GPU versions as well?
I think I don't understand how containers work, then. I assumed they would be built on each machine when installed. Is it that once they are built by the test server, they are used as is? I'd have to test on a machine without a GPU, but I suspect relion's compile process checks for an NVIDIA driver and may compile for CPU only even if I specify an architecture.
I can make a container for each compute capability, but that would make 16 containers (from 3.5 to 9.0), plus CPU only
Dear @vennand,
Containers are built once and then used by the target system as is. They are not rebuilt on the target system. So, correct: you would need to build 16 different containers if you want to support every target architecture in your case. As a more realistic option, why not build a CPU-only container, a container with the latest compute capability, and a container with an older compute capability - then see what deployment systems you have in practice and how these containers run?
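One way to sketch that per-variant build (the variant names and the idea of encoding them in toolVersion are suggestions, not existing NeuroDesk conventions; the CUDA 11.8 path is taken from the relion recipe above):

```shell
# Map a container variant to the cmake flags it should build with.
variant="cpu"          # e.g. one of: cpu, cc6.1, cc8.9
case "$variant" in
    cpu) GPU_FLAGS="" ;;                                         # CPU-only build
    cc*) GPU_FLAGS="-DCUDA_ARCH=${variant#cc} -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-11.8" ;;
    *)   echo "unknown variant: $variant" >&2; exit 1 ;;
esac
echo "cmake $GPU_FLAGS -DFORCE_OWN_FLTK=ON .."
```

Each variant would then become its own container, e.g. relion_4.0.1-cpu and relion_4.0.1-cc6.1.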
Thank you Steffen
Do you have an example of containers using different toolVersions? Do I have to create new recipe directories?
We don't have a tool yet that needs that. I think it would be easiest for now to push different versions of the build.sh file to the repository and see if this actually works. Then we can see how to streamline this.