coecms / cms-conda-singularity

A repository for the squashfs'd hh5 conda installations.
Apache License 2.0

Use containerised environment for python package development #34

Open rbeucher opened 1 year ago

rbeucher commented 1 year ago

Hi @dsroberts,

I was wondering how we could support Python package development in the containerized environment. Running pip install -e does seem to install the package in the squash filesystem, but I am not sure it will actually work. Also, my understanding is that anyone can do that and potentially mess up the squashfs. Am I right?

Romain

dsroberts commented 1 year ago

Hi @rbeucher

The squashfs should be mounted read-only, so there should be no chance for anyone (even the owner) to add things to it through normal use. Developing within the squashfs is a bit tricky: you pretty much need to follow the procedure in build.sh to get to the point where the squashfs is 'unsquashed' in a separate directory, the base env has been copied to some other file system, and the container has been launched with the singularity shell command from that script. Then, when you're done, re-create the squashfs and use it in place of the default squashfs for the corresponding environment. There is still a kink to figure out here, as I think I need to build in an option that overrides the launcher script's use of its default squashfs if it isn't already in the environment. The worst case is building the singularity exec line yourself, based on the one in the launcher script.

dsroberts commented 1 year ago

Hi @rbeucher I thought about this some more, and after messing this procedure up a few times myself I decided the creation of a development environment should really be scripted. There is now a dev_prep.sh script in the scripts directory. Launch a PBS job with at least 100GB of jobfs requested and source that script. It'll unpack the squashfs corresponding to the current unstable environment to $PBS_JOBFS and copy across the base conda env. It'll then use singularity to bind mount the base env over the top of /g/data (for development purposes), thus preserving all of the paths, and link the unpacked squashfs into the envs subdirectory.

The script defines two functions, launch and finalise. launch gives you an interactive shell within the container, so you can e.g. interactively pip install things or edit files. finalise repackages the base environment and the squashfs after you've made all of your changes. You'll need to manually move the squashfs from $PBS_JOBFS if you want it to persist beyond the end of the job.

Once you've done that, you can use the newly created squashfs in place of the existing one. After loading the module, set the environment variables: export CONTAINER_OVERLAY_PATH=/path/to/modified/analysis3-unstable.sqsh and export CONTAINER_OVERLAY_PATH_OVERRIDE=1. Then when you launch any command provided by the unstable conda env, your modified squashfs will be used in place of the one in the global installation directory.
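For illustration, the final step described above (pointing the launcher at a modified squashfs) could be wrapped in a small shell helper. This is only a sketch: the function name use_modified_squashfs is hypothetical and not part of the repository, and the only behaviour taken from the comment is the pair of environment variables. It assumes the module has already been loaded and that the squashfs has already been moved out of $PBS_JOBFS.

```shell
#!/usr/bin/env bash
# Sketch only: use_modified_squashfs is a hypothetical helper name,
# not something provided by cms-conda-singularity.

use_modified_squashfs() {
    # $1: path to the repackaged squashfs, already moved out of $PBS_JOBFS
    local sqsh="$1"

    # Refuse to set the override if the file is missing, so later commands
    # do not fail in a confusing way.
    if [ ! -f "$sqsh" ]; then
        echo "squashfs not found: $sqsh" >&2
        return 1
    fi

    # The two variables named in the comment above.
    export CONTAINER_OVERLAY_PATH="$sqsh"
    export CONTAINER_OVERLAY_PATH_OVERRIDE=1
}
```

After sourcing this in the same shell session where the module was loaded, calling use_modified_squashfs /path/to/modified/analysis3-unstable.sqsh sets both variables, and commands from the unstable env launched from that shell should then pick up the modified squashfs as described above.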

ETA: The last few commits fix some issues around updating the existing scripts during the build process, so you may want to merge as much of that stuff as you can into the MED conda environments.

rbeucher commented 1 year ago

That sounds really good! Thanks @dsroberts. I'm gonna check that.