DeiC-HPC / cotainr

cotainr - a user space Apptainer/Singularity container builder.
European Union Public License 1.2

Question: Cotainr and LUMI container wrapper #37

Closed ThomasA closed 1 year ago

ThomasA commented 1 year ago

How do LUMI container wrapper and Cotainr relate, or compare to each other?

Chroxvi commented 1 year ago

This is a very good question since, at first sight, they appear very similar: both offer an easy way to create a conda/pip environment in a Singularity/Apptainer container for use on LUMI. However, the way they achieve this is quite different.

Cotainr is basically a fairly thin wrapper around singularity build. When invoked with the --conda-env argument, it creates a Singularity/Apptainer container from the base image you provide, installs miniconda in the container, creates your specified conda environment, and sets a few environment variables that activate your conda environment when you run singularity exec, singularity run, etc. The result is a .sif file like any other Singularity/Apptainer container built using singularity build. You run your container just like any other container using singularity/apptainer exec/run/shell, with all the usual pros and cons of running containers. The main reason to use cotainr is to have an easy way to build such containers in user space, i.e. without root or fakeroot, neither of which is available on LUMI, although one of them is required for building a container from a Singularity definition file using singularity build. Currently, the main use case for cotainr is conda/pip environments, but we expect to support other use cases later on, e.g. R environments. We consider the current cotainr code production ready.
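As a rough illustration of the workflow described above, here is a minimal shell sketch. The file names, the package list, and the Ubuntu base image are placeholders chosen for illustration, and the flag spelling should be checked against `cotainr build --help` for the installed version. The guard makes the build step a no-op on machines where cotainr is not on the PATH.

```shell
# Write a minimal conda environment file.
# The file name, environment name, and packages are illustrative placeholders.
cat > my_conda_env.yml <<'EOF'
name: my_env
channels:
  - conda-forge
dependencies:
  - python=3.11
  - numpy
EOF

# Build and run the container only where cotainr is available.
if command -v cotainr >/dev/null 2>&1; then
    # Build a .sif file containing the conda environment on top of the base image.
    cotainr build my_container.sif \
        --base-image=docker://ubuntu:22.04 \
        --conda-env=my_conda_env.yml
    # The result is used like any other Singularity/Apptainer container.
    singularity exec my_container.sif python3 -c "import numpy; print(numpy.__version__)"
else
    echo "cotainr not found; skipping container build"
fi
```

The conda environment is activated through environment variables set in the container, so no explicit `conda activate` appears in the `singularity exec` call.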

As for the LUMI container wrapper (full disclosure: I am not a developer of the LUMI container wrapper, so this may be a slightly opinionated description of it), assuming you already have some installed software on LUMI, it works by creating a squashfs image that contains your installed software. It then creates a set of wrapper scripts for all the binaries in your installed software. Instead of calling your installed binary directly, each wrapper script runs a Singularity container (an OpenSUSE Leap image on LUMI), mounts the squashfs image, and calls the binary in the squashfs image. That way, you as a user don't even notice that you are actually running a container; you just call your binary as usual. In addition to wrapping an existing installation, the container wrapper also provides some frontends for directly wrapping pip/conda environments defined in conda_env.yml or requirements.txt files. Depending on how much of the LUMI software stack you add to the squashfs image, you might have to "re-wrap" your installation when the LUMI software stack is updated. You also have to take note of some of the limitations of this approach as described here: https://github.com/CSCfi/hpc-container-wrapper#limitations. From a performance point of view, you have to realize that every time you call a binary in your wrapped installation, you actually start a new container instance. Thus, if you run many instances of the same binary in parallel or do some sort of shell piping between commands, things may get a bit unstable. The current version of the container wrapper is a set of bash scripts with some Python scripts mixed in. It is considered experimental, but I believe that a production version, rewritten in Go, is in the works.
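To make the wrapper-script idea concrete, here is a conceptual shell sketch that generates one such wrapper. This is not the actual implementation of the LUMI container wrapper; every path below is a hypothetical placeholder, and the sketch only assumes the standard Singularity/Apptainer `--bind` syntax for mounting an image file (`image-src=/`).

```shell
# Conceptual sketch only; all paths are hypothetical placeholders.
wrapper_dir="./wrapped/bin"
mkdir -p "$wrapper_dir"

# Generate a wrapper named "python" that the user calls like a normal binary.
cat > "$wrapper_dir/python" <<'EOF'
#!/bin/bash
# Start a container, mount the squashfs image holding the installation,
# and run the real binary inside it with the caller's arguments.
exec singularity exec \
    --bind /path/to/installation.sqfs:/opt/installation:image-src=/ \
    /path/to/base_container.sif \
    /opt/installation/bin/python "$@"
EOF
chmod +x "$wrapper_dir/python"

echo "Wrote wrapper: $wrapper_dir/python"
```

With `./wrapped/bin` first on the PATH, each call to `python` would start a new container instance, which is the per-invocation cost mentioned above.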

So when to use cotainr and when to use the LUMI container wrapper? If you want to use containers and want an easy way to build such containers on LUMI for our supported use cases, i.e. currently conda/pip environments, you should use cotainr. If you don't want to learn how to use containers and are just looking for a way to avoid putting too much stress on the Lustre file system, the LUMI container wrapper may be an option for you. (I believe the main reason that the LUMI container wrapper was created in the first place was to wrap the large number of small files in a typical conda/pip installation into a single image file, which plays much more nicely with a Lustre file system.)
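To get a feel for how many files an existing conda/pip installation puts on the file system, a rough sketch like the following can help. The helper name `count_files` is made up for this example, and the count assumes file names without embedded newlines.

```shell
# Count regular files below a directory (defaults to the current directory).
count_files() {
    find "${1:-.}" -type f | wc -l | tr -d ' '
}

# Example: report the file count of the active conda environment, if there is one.
if [ -n "${CONDA_PREFIX:-}" ]; then
    echo "Files in $CONDA_PREFIX: $(count_files "$CONDA_PREFIX")"
fi
```

Packed into a .sif or squashfs image, all of these files appear to Lustre as a single file.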

@Nortamo, as the main developer of the LUMI container wrapper, do you have anything you would like to add to this discussion? @rloewe, did I miss any important points in this discussion?

Nortamo commented 1 year ago

You covered it pretty well. The LUMI container wrapper only exists to reduce the number of files, the load on Lustre, and startup times, and then it just tries to negate some of the downsides of the container. So if the file system is not an issue or you don't need to interact with the host system, there is really no point in using it. If you do need to interact with the host applications and programs (building pip packages against the Cray module stack, workflow managers, etc.), then the container wrapper can be useful.

ThomasA commented 1 year ago

Thanks for the clarification!