ACCESS-NRI / payu-condaenv

A conda (mamba) python environment for running payu

Using containers for payu conda environment #14

Open jo-basevi opened 9 months ago

jo-basevi commented 9 months ago

Currently, payu is deployed by using conda-pack to compress a micromamba environment, copying that file down to gadi, and unpacking it in the vk83 project. The conda environment is then activated using modulefiles, as described by @aidanheerdegen here.
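
Roughly, the deployment flow looks something like this (a sketch only; environment names, versions and paths are illustrative):

# build and pack the environment (e.g. in CI)
micromamba create -y -p ./payu-env -f environment.yml
conda pack -p ./payu-env -o payu-1.0.29.tar.gz

# copy to gadi and unpack into the vk83 apps area
scp payu-1.0.29.tar.gz gadi:/scratch/tm70/
mkdir -p /g/data/vk83/apps/payu/1.0.29
tar -xzf /scratch/tm70/payu-1.0.29.tar.gz -C /g/data/vk83/apps/payu/1.0.29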

The unpacked environment currently takes up a fair amount of space; for example, for the 1.0.29 tag:

625M    /g/data/vk83/apps/payu/1.0.29 # unpacked environment
197M    /scratch/tm70/**/payu/payu-1.0.29.tar.gz # conda-pack'ed environment

As conda-pack creates a standalone environment, it includes a subset of conda so that the environment can be re-created when unpacked.

As conda environments contain a lot of small files, which hurts performance on Gadi's Lustre filesystem, an improvement would be to use containers, as discussed in issue #4. Also note this potential issue #2 with using containers with payu.

ScottWales commented 9 months ago

If you're using conda-pack already, you can get it to output squashfs format, which can easily be added to a base container: https://apptainer.org/docs/user/latest/persistent_overlays.html#overlay-embedded-in-sif
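
For example, given a squashfs of the environment, something like this at runtime (names are placeholders; embedding the overlay into the .sif itself is covered in the linked docs):

# mount the packed environment as a read-only overlay on a base container
apptainer exec --overlay payu-env.squashfs base.sif ls /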

jo-basevi commented 9 months ago

Thanks @ScottWales - that is great to know! Using singularity containers is all very new to me.

jo-basevi commented 4 months ago

Ok, so I have had some time to learn about using singularity containers. It took a little while to get conda-pack's squashfs output added to the base container, as Apptainer didn't like the zstd or xz compression of conda-pack's squashfs file, so I had to pass --compress-level 0 to conda-pack. Unfortunately, that means the squashfs files are the same size as the environment.
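
For reference, the invocation that ended up working was along these lines (prefix and output name are illustrative):

# Apptainer didn't accept the zstd/xz-compressed squashfs, so build it uncompressed
conda pack -p ./payu-env --format squashfs --compress-level 0 -o payu-env.squashfs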

I was creating the containers using Apptainer (through a lima VM on my mac), and was running into issues executing files inside the container from a squashfs file generated locally. By contrast, creating the conda-pack squashfs on gadi, adding the persistent overlay to a container and then executing commands inside the container worked fine. I think the errors likely came down to differences in architecture. I'm yet to test whether running conda-pack on github runners and then shipping that squashfs to gadi would work.

After scrolling through the HIVE forum for singularity issues, @ScottWales I saw you mentioned this repo as an example for packaging environments in containers: https://github.com/ScottWales/singularity-conda-template/blob/master/build.sh. That has been really useful, and I've used it as a reference for setting up modulefiles and wrapper scripts for commands that run the container. It also uses conda-pack with --compress-level 0. I noticed that it includes a patch for conda-pack that adds -all-root to the mksquashfs command (https://github.com/ScottWales/singularity-conda-template/blob/master/base/conda-pack-all-root.patch), so that files are owned by root - was this done to prevent some issues?
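
For context, the wrapper pattern I've copied is roughly this (a simplified sketch, not the actual scripts; container and overlay paths are placeholders):

#!/bin/bash
# installed once and symlinked as payu, payu-run, python, etc.;
# runs the command of the same name inside the container environment
exec singularity exec --overlay /path/to/payu-env.squashfs /path/to/base.sif \
    "$(basename "$0")" "$@"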

I am running into an issue that's probably within payu itself. When payu run generates a qsub command, it points to the location of python inside the container (sys.executable). For example, inside the container I had the environment under the /payu prefix, so it calls /payu/bin/python and fails with ERROR: Could not execv /payu/bin/python! ret=-1 errno=2:

$ payu run
Use of these keys is deprecated: collate_walltime, collate_mem.
Instead use collate dictionary and subkey without 'collate_' prefix
Loading input manifest: manifests/input.yaml
Loading restart manifest: manifests/restart.yaml
Loading exe manifest: manifests/exe.yaml
payu: warning: MODULESHOME does not exist; disabling environment modules.
payu: warning: No Environment Modules found; skipping load call.
qsub -q express -P tm70 -l walltime=0:30:00 -l ncpus=4 -l mem=8GB -N double_gyre -l wd -j n -v PAYU_PATH=/payu/bin,MODULESHOME=/opt/Modules/v4.3.0,MODULES_CMD=/opt/Modules/v4.3.0/libexec/modulecmd.tcl,MODULEPATH=/scratch/tm70/jb4202/test-payu-dev/modules:/etc/scl/modulefiles:/opt/Modules/modulefiles:/opt/Modules/v4.3.0/modulefiles:/apps/Modules/modulefiles -l storage=gdata/tm70+scratch/tm70 -- /payu/bin/python /payu/bin/payu-run
114852690.gadi-pbs

Comparing with the conda_concept/analysis3-unstable module in /g/data/hh5/public/modules:

$ payu run
Use of these keys is deprecated: collate_walltime, collate_mem.
Instead use collate dictionary and subkey without 'collate_' prefix
Loading input manifest: manifests/input.yaml
Loading restart manifest: manifests/restart.yaml
Loading exe manifest: manifests/exe.yaml
payu: Found modules in /opt/Modules/v4.3.0
qsub -q express -P tm70 -l walltime=0:30:00 -l ncpus=4 -l mem=8GB -N double_gyre -l wd -j n -v PAYU_PATH=/g/data/hh5/public/apps/cms_conda/envs/analysis3-24.01/bin,MODULESHOME=/opt/Modules/v4.3.0,MODULES_CMD=/opt/Modules/v4.3.0/libexec/modulecmd.tcl,MODULEPATH=/g/data/hh5/public/modules:/scratch/tm70/jb4202/test-payu-dev/modules:/etc/scl/modulefiles:/opt/Modules/modulefiles:/opt/Modules/v4.3.0/modulefiles:/apps/Modules/modulefiles -l storage=gdata/hh5+gdata/tm70+scratch/tm70 -- /g/data/hh5/public/./apps/cms_conda/envs/analysis3-24.01/bin/python3.10 /g/data/hh5/public/apps/cms_conda/envs/analysis3-24.01/bin/payu-run

It still fails with ERROR: Could not execv /g/data/hh5/public/./apps/cms_conda/envs/analysis3-24.01/bin/python3.10! ret=-1 errno=2, but the PAYU_PATH looks more hopeful, and there were no warnings when loading modulefiles.

So maybe if the python path pointed to a wrapper script that loads the container and runs payu-run inside it, it would work. Gadi is currently down for maintenance, so I am not able to test whether modifying the qsub command would work.
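
What I have in mind is a host-side stand-in for python, something like (a sketch only; container and overlay paths are placeholders):

#!/bin/bash
# host-side "python": forwards its arguments (e.g. /payu/bin/payu-run) to the
# python inside the container environment
exec singularity exec --overlay /path/to/payu-env.squashfs /path/to/base.sif \
    /payu/bin/python "$@"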

Another as-yet-untested issue: payu-run would run the mpi command, and I am still unsure whether that would even work inside the container as-is.

I've also noticed that just running singularity exec/shell on gadi takes a little while, and the time is variable. One time it didn't run at all, but testing the same command an hour or two later, it took only a few seconds. This could be due to gadi's variability as well.

ScottWales commented 4 months ago

The --all-root flag was to fix permission issues when other users try to use the container. I've since moved to building the squashfs manually outside of conda-pack; see https://git.nci.org.au/bom/ngm/conda-container for the latest version of that repo. conda-pack's automatic path substitution was also incorrectly setting paths in some of the config files we were using, so I ended up dropping it.
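
For reference, the manual route is just a direct mksquashfs call over the environment prefix, something like (paths illustrative):

# squash the conda environment directly; -all-root avoids per-user permission issues
mksquashfs /path/to/envs/payu-env payu-env.squashfs -all-root -noappend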

I do what you're suggesting - we have the python command symlinked to https://git.nci.org.au/bom/ngm/conda-container/-/blob/master/install/imagerun, which runs the given command in the container. MPI should work; you should do the same symlinking for the orted command, which is what launches the MPI processes on remote nodes. The command doesn't need to be included in the container, just run in the container environment. See my repo for an example.
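
In other words, something along these lines (layout is just an example):

# in the installed bin directory: both host-side commands are the same wrapper
ln -s imagerun bin/python
ln -s imagerun bin/orted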

jo-basevi commented 4 months ago

Ok, I will probably switch to building the squashfs manually then. I remember running into some path issues with the current conda-pack'ed environments, so I had to run conda-unpack inside the activated environments on gadi to clean up prefixes. Though if it's incorrectly setting paths in the first place, that's not great.
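
The cleanup step I mean is just activating the unpacked environment and running conda-unpack, something like:

source /g/data/vk83/apps/payu/1.0.29/bin/activate
conda-unpack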

Ok thanks, that is good to know about MPI and orted - I'll add a symlink for that command.