RolnickLab / ClimateSetExtension

Data Processing Pipelines for ClimateSet - can be used to build your own climate model datasets for machine learning

Create an environment #4

Closed liellnima closed 5 months ago

liellnima commented 5 months ago

Create environment for climateset extension.

Resources: [!] Ask Seb about his notes on setting up the conda environment. (Slack + add his comments here in the chat if possible)

Old requirements file: https://github.com/RolnickLab/causalpaca/blob/input4mips_backup_state_julia/requirements_data.txt

Old readme, contains some (unformatted) notes on installation, especially with the xesmf package: https://github.com/RolnickLab/causalpaca/blob/input4mips_backup_state_julia/README.md

The following command line tools must be installed (either as command line tools or in conda/poetry):

Questions: Should we really use the old requirements file? Might be best to just start from scratch?

Hard dependencies:

Requirements:

This entails that the installation works with conda and poetry (cluster).

Final thoughts: handling the hard dependencies is the challenge. Adding other packages later on is usually no problem; they are not as finicky.

f-PLT commented 5 months ago

I think we should start fresh for the requirements and add them as they are required, so we don't add dependencies that might not be required, and/or create version conflicts. Add/manage hard dependencies first and then handle the others as needed.

To this, I'd add some documentation per cluster, like which modules to load and which environment to create/activate.

This issue should be dealt with first, as issues #2 and #3 will benefit from it (at least for the hard dependencies).

liellnima commented 5 months ago

Francis, which dependencies are supposed to go into the environment.yml and which ones into the pyproject.toml? I am realizing I don't fully understand the line between the two yet, other than "core dependencies" and "not-so-core dependencies". I just want to understand that so we can move the dependencies to the right spot, if that makes sense?

f-PLT commented 5 months ago

Everything should go in pyproject.toml when possible. Conda dependencies should be reserved for packages that Poetry can't install (like CDO), or where a Conda install is just a lot simpler (like GDAL).

In this project, if something can't be installed by Poetry (and is therefore in our environment.yml file), it needs to be available as a module on Compute Canada. What we want to avoid is the same library being installed differently between environments.

That's also the strength of Poetry: it will manage dependency conflicts for us in most cases.
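
One way to keep the two files from drifting is a small check that no package is declared in both. The snippet below is only a sketch: the helper names (`conda_package_name`, `find_overlap`) and the sample dependency lists are made up for illustration, and it assumes the `dependencies` list from environment.yml and the Poetry dependencies table from pyproject.toml have already been loaded into plain Python objects.

```python
import re


def conda_package_name(spec: str) -> str:
    """Reduce a conda spec such as 'conda-forge::cdo>=2.0' to its bare package name."""
    name = spec.split("::")[-1]  # drop an optional channel prefix
    name = re.split(r"[=<>!~ ]", name, maxsplit=1)[0]  # drop version constraints
    return name.strip().lower().replace("_", "-")


def find_overlap(conda_specs, poetry_deps):
    """Return the sorted package names declared in both dependency sources.

    conda_specs: the 'dependencies' list of an environment.yml (non-string
                 entries such as a nested 'pip' mapping are ignored).
    poetry_deps: the Poetry dependencies table of a pyproject.toml.
    'python' is skipped because both files normally pin the interpreter.
    """
    conda_names = {conda_package_name(s) for s in conda_specs if isinstance(s, str)}
    poetry_names = {
        name.lower().replace("_", "-")
        for name in poetry_deps
        if name.lower() != "python"
    }
    return sorted(conda_names & poetry_names)


# Hypothetical contents, for illustration only (versions are not the project's).
conda_specs = ["python=3.10", "conda-forge::cdo", "gdal", "xesmf", {"pip": ["some-pip-only-package"]}]
poetry_deps = {"python": "^3.10", "xesmf": "*", "esgf-pyclient": "*", "xarray": "*"}

print(find_overlap(conda_specs, poetry_deps))  # prints ['xesmf']
```

With these made-up lists, `xesmf` shows up in both places, so it would be a candidate to remove from environment.yml.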

f-PLT commented 5 months ago

I'll therefore move xesmf, esgf-pyclient and xarray from environment.yml to pyproject.toml.