Closed wkharold closed 3 months ago
Cool, thanks!
Since we have a Dockerfile
already, could the Apptainer file be the same as that one, but in Apptainer syntax? I would prefer to avoid having two different images. See https://github.com/MilesCranmer/PySR/blob/master/Dockerfile
Also, would be nice to have some continuous integration tests like we have for the docker file.
Since we have a Dockerfile already, could the Apptainer file be the same as that one, but in Apptainer syntax?
I had some trouble with that approach initially so I went with conda
. Happily, what I learned doing the conda
version allowed me to switch back to pip3
and make it more like the Dockerfile. I'll work on adding an integration test that mimics the Docker one.
Thanks, could it also pull from the python-bullseye image rather than rockylinux image? Since the base image is hosted by docker I don’t see a compelling reason for the base images to be different
Switched to bullseye for compatibility with the Dockerfile. I was using Rocky because that's what SchedMD uses as the base for the Slurm images built for Google's Cluster Toolkit (formerly HPC Toolkit). One my goals is to be able to bind mount the necessary Slurm commands and libs into a container so that you can use the Julia ClusterManagers interface to do distributed SR from within the container. Maybe your already know how to do that with Docker?
Totals | |
---|---|
Change from base Build 10344709906: | 0.0% |
Covered Lines: | 1135 |
Relevant Lines: | 1211 |
Switched to bullseye for compatibility with the Dockerfile. I was using Rocky because that's what SchedMD uses as the base for the Slurm images built for Google's Cluster Toolkit (formerly HPC Toolkit). One my goals is to be able to bind mount the necessary Slurm commands and libs into a container so that you can use the Julia ClusterManagers interface to do distributed SR from within the container. Maybe your already know how to do that with Docker?
I see, thanks. This is a good use-case. Since I have no particular need to use bullseye, so maybe we could switch the defaults of both the Dockerfile and the Apptainer to use Rocky? You can change the default here: https://github.com/MilesCranmer/PySR/blob/3aee19e38ceb3e0e1617d357a831400e01204658/Dockerfile#L6
One my goals is to be able to bind mount the necessary Slurm commands and libs into a container so that you can use the Julia ClusterManagers interface to do distributed SR from within the container.
I do Slurm stuff with PySR, but haven't done it from a container before. In principle it should be fine, I just haven't tried yet.
One other thing that is needed is a function to install all packages, even the Julia extensions like for ClusterManagers.jl. I just made a PR https://github.com/MilesCranmer/PySR/pull/688 to add this function. So if your apptainer file calls pysr.load_all_packages()
, all the Julia packages ever needed by PySR (including ClusterManagers.jl) will be installed into the read-only environment.
Ok I merged https://github.com/MilesCranmer/PySR/pull/688 which will help for your stuff, do you want to merge those changes into your branch?
I see, thanks. This is a good use-case. Since I have no particular need to use bullseye, so maybe we could switch the defaults of both the Dockerfile and the Apptainer to use Rocky? You can change the default here:
It doesn't look like there are Rocky based containers for Julia on Docker Hub. I think bullseye is OK for multicore work. Transitioning to Rocky would require significant changes and testing on Slurm to ensure that it worked properly.
Oh sorry I thought you meant you wanted to switch to Rocky. I'm happy with Bullseye too though. As long as both of the container types can have overall similar images.
This is my first foray into Github CI testing. Would you prefer a modified CI_docker.yml file or a new CI_apptainer.yml file. My naive though is that the latter would be simpler but I could easily be missing something.
Scanning the test failures it looks like most/all of them are the result of the tests being run in a read-only directory. To avoid a version module not found
error I cd into /pysr (which is in the container's read-only file system) before doing anything involving pysr
. Is there a python environment variable I could set to avoid having to do that?
Is there a way you can mount the directory it uses for testing to a writeable directory?
I tried this on my institutional cluster to see if I could use it locally, but it failed with:
> apptainer build pysr.sif Apptainer.def
INFO: User not listed in /etc/subuid, trying root-mapped namespace
INFO: fakeroot command not found
INFO: Installing some packages may fail
FATAL: Unable to build from Apptainer.def: while parsing definition: Apptainer.def: invalid section(s) specified: arguments
Is the arguments
section new on 1.3.0` maybe? My institute looks to have 1.1.9. Maybe we could have compatibility for 1.1.0 (or even 1.0.0) to have broader support?
Is the arguments section new on 1.3.0` maybe? My institute looks to have 1.1.9. Maybe we could have compatibility for 1.1.0 (or even 1.0.0) to have broader support?
It got introduced in 1.2 as part of the --build-arg / --build-arg-file
feature. I don't have a problem with hardcoding the Julia/Python/distro bits to make it work with 1.x.
Lots of improvements have gone in since 1.1. You should encourage your institute sys admins to upgrade.
I thought adding PYTHONPATH=/psyr had fixed the
ModuleNotFoundError: No module named 'pysr.version'
error. I no longer see it in my local builds or tests.
Lots of improvements have gone in since 1.1. You should encourage your institute sys admins to upgrade.
Good point. I asked them and they are planning on it in the few month timescale. I guess there will always be some clusters using older versions, so we might as well make it compatible (within reason). Maybe 1.1 is a good lower limit for now (until someone makes a feature request for even earlier)
The error about version.py
missing indicates that it might not be getting installed to the directory we expect. Or maybe when we import pysr, it is importing from the local directory, rather than site-packages
. Basically, version.py
only gets created when one runs python setup.py install
. (And it only gets put into the site-packages
folder of PySR, rather than into the local directory)
I looked in the container and version.py
ends up in /pysr/pysr
. I added that to PYTHONPATH
but the test still fails. I've copied the command from the log and pasted it into my shell and it works. There must be something different about the github environment.
The GitHub actions logs show:
Installing collected packages: fastjsonschema, tornado, rpds-py, pyzmq, psutil, pluggy, platformdirs, nest-asyncio, iniconfig, debugpy, coverage, comm, attrs, referencing, pytest, jupyter-core, jupyter-client, jsonschema-specifications, jsonschema, ipykernel, nbformat, nbval
WARNING: The script debugpy is installed in '/home/runner/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The scripts coverage, coverage-3.12 and coverage3 are installed in '/home/runner/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The scripts py.test and pytest are installed in '/home/runner/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The scripts jupyter, jupyter-migrate and jupyter-troubleshoot are installed in '/home/runner/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The scripts jupyter-kernel, jupyter-kernelspec and jupyter-run are installed in '/home/runner/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The script jsonschema is installed in '/home/runner/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The script jupyter-trust is installed in '/home/runner/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
which is odd, I thought it would install to paths within the image? Or are there other paths that need to be set before the pip installs?
e.g., maybe it just copies the definition of HOME
from the host environment?
Apptainer mounts the home directory of its initiator by default so those messages make sense but I didn't think about adding /home/runner/.local/bin
to PATH. I'll add that and see if it changes things.
Oh I see. But shouldn’t it be installed inside the container rather than the user’s home directory (which thus wouldn’t be saved in the container)?
Maybe the Python pip install
is trying to install to the home dir rather than an env?
Looks like we can find this with python -m site
: https://stackoverflow.com/a/46071447
I looked a little closer at my local environment and realized that I don't run the tests in the PySR directory. When I did that I reproduced the failure we're seeing, ugh. So I added steps to create a new directory specifically for testing in the CI action. I also experimented with the Apptainer %test
section of the definition file and it seemed to simplify things so added that and changed CI_apptainer.yml
to use it.
fix: 686
This pull request adds an Apptainer definition file to PySR. Researchers can use it to build an Apptainer container that packages PySR and its dependencies (via a conda environment).