ACCESS-NRI / reproducibility

Framework and tools for reproducibility testing of models
GNU General Public License v3.0

Isolate test environment on NCI #27

Closed: jo-basevi closed this issue 4 months ago

jo-basevi commented 5 months ago

Currently, the repro tests load the payu environment from vk83 and install the test requirements of ACCESS-NRI/model-config-tests into it using pip install: https://github.com/ACCESS-NRI/reproducibility/blob/5c3ca6f1fa9c85e5fe6f34e77bcc844a16e13c10/.github/workflows/checks.yml#L77

This modifies the payu environment. Even if the requirements were installed to a user directory using pip install --user, there's a flag in the payu environment that disables looking for packages installed in ~/.local: https://github.com/ACCESS-NRI/payu-condaenv/blob/c26c5bb986a60f3feb7f88b7bf6c2d79ccf37b7f/modules/.common#L50-L51 Either way, this might not work long term, with multiple concurrent repro tests running from the same user account and needing different versions of model-config-tests.

There is a related issue for packaging up model-config-tests: https://github.com/ACCESS-NRI/model-config-tests/issues/3

Could the repro tests run similarly to the GitHub runners, with a setup/teardown (see the QA pytests: https://github.com/ACCESS-NRI/access-om2-configs/blob/a57e7fd10c351edef7427ddf42c71ce53c24df27/.github/workflows/pr-1-ci.yml#L114-L143)? Using a venv or conda (if using conda/micromamba, we would need to use an existing installation from somewhere), the steps would be:

  1. Create an environment
  2. Run pytests
  3. Teardown environment
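A minimal sketch of those three steps as a bash function, assuming a plain venv rather than conda/micromamba, that the python3 on PATH is the interpreter the tests should use (for example after module load python3/3.11.7), and that the test requirements are listed in a pip requirements file. The function name and its arguments are hypothetical. The body runs in a subshell so that an EXIT trap can tear the environment down even when pip or pytest fails.

```shell
#!/usr/bin/env bash
# Hypothetical helper: setup -> run pytest -> teardown in a throwaway venv.
# Assumes python3 on PATH is the interpreter to test with
# (e.g. after `module load python3/3.11.7` on Gadi).

# The body is wrapped in ( ... ), so it runs in a subshell: the EXIT trap
# below fires when the subshell exits, whether pytest passed, failed, or an
# earlier step errored.
run_isolated_tests() (
    requirements=$1   # pip requirements file with the test dependencies
    shift             # remaining arguments are passed straight to pytest

    # 1. Create an environment in a fresh temporary directory
    env_dir=$(mktemp -d "${TMPDIR:-/tmp}/repro-venv.XXXXXX") || exit
    trap 'rm -rf -- "$env_dir"' EXIT   # 3. Teardown, on every exit path
    python3 -m venv "$env_dir" || exit
    "$env_dir/bin/python" -m pip install --quiet -r "$requirements" || exit

    # 2. Run pytests with the venv's interpreter; its exit status is returned
    "$env_dir/bin/python" -m pytest "$@"
)
```

For example, run_isolated_tests test-requirements.txt -x (file name hypothetical) would build the venv, run pytest with -x, and delete the venv afterwards. Because each run gets its own temporary directory, concurrent runs from the same account do not share an environment.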

I think the eventual model-config-tests package should require a specific version of payu, which would then be installed in the test environment. That way, the test environment is entirely separate from the payu environment in vk83. model-config-tests might also come to depend on functions or features that only exist in certain versions of payu. For example, adding the access-om3 model to model-config-tests uses the payu package: https://github.com/ACCESS-NRI/model-config-tests/pull/19#discussion_r1600820817
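If model-config-tests pins a payu version, a test environment could check that the pin is actually satisfied before running anything. A minimal sketch, assuming an exact version pin; the helper name is hypothetical, and it relies only on the standard-library importlib.metadata (Python 3.8+), so it behaves the same in a venv or a conda environment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if PACKAGE is installed at exactly VERSION
# for the python3 currently on PATH.
# Usage: require_version PACKAGE VERSION
require_version() {
    python3 - "$1" "$2" <<'EOF'
import sys
from importlib.metadata import PackageNotFoundError, version

name, wanted = sys.argv[1], sys.argv[2]
try:
    found = version(name)  # reads the installed distribution's metadata
except PackageNotFoundError:
    sys.exit(f"{name} is not installed")
if found != wanted:
    sys.exit(f"{name} {found} is installed, but {wanted} is required")
EOF
}
```

For example, require_version payu 1.1.3 at the start of a test run would stop early, with a message on stderr, if the environment held a different payu.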

@CodeGat @aidanheerdegen I wrote this quickly, so hopefully it makes some sense. I'd appreciate any further thoughts or concerns.

jo-basevi commented 4 months ago

I just ran a couple of tests on Gadi using virtual environments and micromamba environments to compare their sizes. The virtual environment was about a third of the size of the micromamba environment and contained about a third as many files (141M vs 391M, and 4095 vs 12668 files). I ran a small MOM6 experiment using payu from the virtual environment and it seemed to work fine.

Creating a virtual environment:

$ module load python3/3.11.7
$ python3 -m venv payu_venv
$ source payu_venv/bin/activate
$ pip install payu==1.1.3
$ deactivate

$ du -sh payu_venv
141M    payu_venv

$ find payu_venv | wc -l
4095

Creating a micromamba environment:

$ export MAMBA_ROOT_PREFIX=/g/data/tm70/analytics
$ eval "$($MAMBA_ROOT_PREFIX/bin/micromamba shell hook -s posix)"
$ micromamba create --prefix ~/compare_virtual_env/payu_micromamba
$ micromamba activate ~/compare_virtual_env/payu_micromamba
$ micromamba install -c accessnri -c conda-forge -c coecms payu==1.1.3
$ micromamba deactivate

$ du -sh payu_micromamba
391M    payu_micromamba

$ find payu_micromamba | wc -l
12668

When using pip install in a virtual environment, any globally or user-installed packages are ignored by default. The micromamba environment, by contrast, also looks at user-installed packages when using pip, so we might need to set PYTHONNOUSERSITE or similar.
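One way to check whether a given interpreter would pick up user-installed packages is to ask its site module. A small sketch; the helper name is hypothetical, while site.ENABLE_USER_SITE, the PYTHONNOUSERSITE variable and the -s flag are standard CPython. ENABLE_USER_SITE is False inside a venv created without --system-site-packages, and whenever PYTHONNOUSERSITE is set.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print whether an interpreter (default: python3) adds the
# user site-packages directory (under ~/.local) to sys.path.
# Prints True, False, or None (None: disabled for security reasons because
# the real and effective user/group ids differ).
user_site_enabled() {
    "${1:-python3}" -c 'import site; print(site.ENABLE_USER_SITE)'
}

# In a conda/micromamba environment, opt out of user site-packages for every
# Python process started from this shell (pip included):
#   export PYTHONNOUSERSITE=1
# or for a single command, with the -s flag:
#   python -s -m pip list
```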

I also tested creating a venv and pip installing a package into it (jsonschema, since I knew payu does not have it as a dependency). I loaded the conda-packed payu environment using modules (module use /g/data/vk83/modules and module load payu/1.1.3), then activated the virtual environment. I could access payu at /g/data/vk83/apps/payu/1.1.3/bin/payu, as well as pip and jsonschema from the virtual environment. So if we decided not to include payu as a dependency of model-config-tests, it might be possible to run a virtual environment alongside the released payu module.
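To confirm which copy of each tool is picked up when a venv is layered on top of the payu module, one could check where each command resolves. A hedged sketch; the helper is hypothetical and only uses command -v and the VIRTUAL_ENV variable that a venv's bin/activate script sets.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each command name, report whether it resolves
# inside the active virtual environment ($VIRTUAL_ENV, set by bin/activate)
# or somewhere else on PATH (e.g. a module's bin directory).
report_origin() {
    local cmd resolved
    for cmd in "$@"; do
        if ! resolved=$(command -v "$cmd"); then
            printf '%s\tmissing\n' "$cmd"
        elif [[ -n ${VIRTUAL_ENV:-} && $resolved == "$VIRTUAL_ENV"/* ]]; then
            printf '%s\tvenv\t%s\n' "$cmd" "$resolved"
        else
            printf '%s\texternal\t%s\n' "$cmd" "$resolved"
        fi
    done
}
```

With the setup described above (module load payu/1.1.3, then activating the venv), report_origin payu pip python would be expected to list payu as external and pip and python as coming from the venv.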

aidanheerdegen commented 4 months ago

This is really thorough, and interesting, thanks.

The venv is much smaller because it doesn't contain a whole Python distribution.

So it sounds feasible to not have payu as a dependency and just pip install the required version, right?

I know this wasn't the purpose of your tests above, but we could have a base conda environment that we use as a basis for payu virtual environments and then pip install payu into them.

Now I'm wondering if we could run a venv through module load?

Sorry, that's off topic, but I couldn't think of another place to put it.

jo-basevi commented 4 months ago

I was wrong earlier: a venv is isolated from everything except the Python installation itself. If you were to nest a venv inside a conda environment, the venv would use the conda environment's Python, but any packages used inside the venv would need to be installed in the venv. So if model-config-tests uses import payu, payu would need to be installed inside the venv, which is why I think payu does need to be a dependency of model-config-tests.
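The split described here, where the venv borrows the interpreter but not the packages, can be seen from inside a venv: sys.base_prefix points at the installation the venv was created from, while site.getsitepackages() lists only directories under the venv. A small sketch with a hypothetical helper name, assuming the venv was created without --system-site-packages (venv's default).

```shell
#!/usr/bin/env bash
# Hypothetical helper: show where a venv's interpreter and packages come from.
# Usage: describe_venv PATH_TO_VENV
describe_venv() {
    "$1/bin/python" - <<'EOF'
import site
import sys

print("base_prefix:", sys.base_prefix)  # installation the venv was created from (e.g. a conda env)
print("prefix:", sys.prefix)            # the venv itself
for path in site.getsitepackages():     # site-packages directories added to sys.path
    print("site-packages:", path)
EOF
}
```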

I think it would be possible to load a venv through module load, since it would work similarly to the conda modulefiles, which inspect what environment variables have changed. A key part would be loading a specific Python version: I've noticed that running pip install payu with python3.12 runs into runtime errors, I think because the package was built using python3.11; it works fine when python3.11 is used.
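A rough sketch of the two pieces mentioned here, assuming bash: checking an interpreter's major.minor version before building a venv, and listing the environment variables that sourcing a venv's bin/activate adds or changes, which is roughly what a modulefile wrapping the venv would need to reproduce. Both helper names are hypothetical, and the 3.11 check in the comment reflects only the observation above about payu and python3.12, not a documented constraint.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the major.minor version of an interpreter.
python_minor() {
    "${1:-python3}" -c 'import sys; print("%d.%d" % sys.version_info[:2])'
}

# Hypothetical helper: show which environment variables
# `source VENV/bin/activate` changes. "new:" lines are values after
# activation; "old:" lines are values that were replaced or unset.
# Assumes no variable value contains a newline.
venv_env_changes() {
    diff <(env | sort) <(. "$1/bin/activate" >/dev/null && env | sort) |
        sed -n -e 's/^> /new: /p' -e 's/^< /old: /p'
}

# Example guard before creating a test venv:
#   [ "$(python_minor)" = 3.11 ] || { echo "expected python3.11" >&2; exit 1; }
```

Running venv_env_changes on a venv created after module load python3/3.11.7 would list variables such as VIRTUAL_ENV and PATH, which a venv modulefile would then set directly.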