IAMconsortium / pyam

Analysis & visualization of energy & climate scenarios
https://pyam-iamc.readthedocs.io/
Apache License 2.0
220 stars 115 forks source link

Migrate to poetry for inspection #827

Closed glatterf42 closed 3 months ago

glatterf42 commented 4 months ago

Closes #823 and #831.

Please confirm that this PR has done the following:

Description of PR

This PR migrates the contents of ruff.toml, setup.cfg and setup.py to pyproject.toml to use poetry for package management. Please check that no information was lost and everything still works as usual.

glatterf42 commented 4 months ago

Some notes about pyproject.toml:

glatterf42 commented 4 months ago

At the moment, I did not migrate the legacy-pytest workflow to poetry since I'm not sure what the best way is to install these specific versions. We could define them for the oldest supported python version and then run the legacy-pytest workflow on precisely that python version as suggested here.

danielhuppmann commented 4 months ago

When installing ixmp4 into my pyam-dev-environmemt, I was able to simply do pip install -e . in the ixmp4 directory - seems that pip can handle the pyproject toml directly (without needing to install poetry).

danielhuppmann commented 4 months ago

Also, the nightly tests should not use the poetry-lock packages but should install the latest available versions of all packages. This test is helpful as a warning when e.g pandas releases a new version that is incompatible with the existing codebase. So this tests replicates the behavior when a user installs pyam from pypi with the latest available dependencies.

glatterf42 commented 4 months ago

The nightly tests now update the dependencies before installing them, this should be the desired behaviour, if I understand correctly.

Also, I similarly think you can run pip install -e here, but you don't have to. Using poetry instead takes care of venvs and compatible dependency versions, etc.

glatterf42 commented 4 months ago

I'm wondering why the docs-workflow seems to require sphinx-build in the CI, when it doesn't error out because of that locally. There's still an error locally, but only after some commands have been processed, whereas it's immediately failing here.

glatterf42 commented 4 months ago

Notes for tomorrow: pytest-legacy errors out because it's running pip install -e. [...], which would require the various groups to be defined as extras, but since we have them as optional groups at the moment, they are not found and thus things like pytest are not installed.

The other tests presumably fail due to a similar error: we install the optional groups in poetry's venv, but then try to run pytest and sphinx outside of it. We probably need to activate the venv's first like we do for ixmp4 or run something like poetry pytest ....

      - name: Run tests
        run: |
          source .venv/bin/activate
          pytest --cov-report xml:.coverage.xml --cov-report term --cov=ixmp4 -rsx --benchmark-skip
codecov[bot] commented 4 months ago

Codecov Report

Attention: Patch coverage is 85.71429% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 95.1%. Comparing base (ddbb88e) to head (c7821de).

Files Patch % Lines
pyam/__init__.py 33.3% 2 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #827 +/- ## ======================================= + Coverage 95.0% 95.1% +0.1% ======================================= Files 64 63 -1 Lines 6134 6130 -4 ======================================= + Hits 5828 5831 +3 + Misses 306 299 -7 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

glatterf42 commented 3 months ago

@danielhuppmann @phackstock All tests are passing now, so please review if you like this setup. Please also consider the following notes:

Caching on Windows runners

For the pytest workflow, we might want to cache poetry's venv on all runners, including the Windows runners. If that doesn't work as expected, this is a known issue and a solution has been suggested here: https://github.com/marketplace/actions/install-poetry-action#caching-on-windows-runners

Activating poetry's venv in a CI workflow

To avoid having to specify different paths to activation scripts on different runners in the CI, the install poetry action we use offers an environment variable that we could use to activate the venv in an OS-agnostic way (see the second point here). However, for all current workflows, this is never necessary since we can also use poetry run to run command line tools inside the venv without activating it explicitly first, so e.g. poetry run pytest --mpl works perfectly fine.

Default version constraints for poetry

One of poetry's main downsides is that it doesn't support PEP 621 yet, which (among other things) appears in the default form that dependency requirements take:

(pyam-iamc-py3.12) fridolin@fridolin-Latitude-5520 ~/pyam (migrate/to-poetry)> (.venv) poetry add black --group dev
Using version ^24.2.0 for black

By default, poetry will try to use the latest version and add a caret specification to it, which allows limited updates. In particular, caret requirements will prevent poetry from picking up new major version releases. Since you want the nightly workflow to test latest versions, I'm calling poetry update before poetry install and I specified all requirements as >=, but be careful when adding new requirements to explicitly use the >= if you don't want the default ^. Please also note that the PR checks don't include the nightly workflow (which might be worth adapting just for one run once everything else is fine).

If you decide to merge this PR, we should of course still adapt the install instruction docs as well as the release notes.

glatterf42 commented 3 months ago

Okay, let me write up what I've learned about caching for GitHub Actions now :) There are several ways to do it. The setup-python action already has caching ability implemented and also showcases some examples. This works for various programming languages and for various package managers for python, one of them explicitly being poetry. However, this action says it uses toolkit/cache under the hood, which seems to be the same as the official cache action, so I figure we can use the original (as is also showcased by the install-poetry action). When called, the complete cache action tries to restore a cache from the key provided to the path provided. It then provides an output option that we can use to decide if further steps should be run. After the final step of the workflow is run, the cache action tries to save a cache to the key provided. By now, the action also offers both parts of this functionality as separate actions, so we can fine-tune when cache should be restored and saved. They even provide a whole document on caching strategies on how to utilize these options best.

This got me thinking: before 65a82af, we restored the cache in the pytest-legacy workflow from the poetry.lock file on main and then downgraded and installed some dependency versions. This had the possibility of still using some of the cached packages, but we could make better use of the cache. So I switched the order of adding packages to the lock file and restoring a cache. The idea is now this: after installing poetry, we use poetry add --lock ... to add the pinned dependency versions to poetry.lock. This file will therefore change during the CI run, but this change will not be saved or committed to git or some such! We then call the cache action with a key that references the hash of the lock file. This ensures the cache will be updated if and only if our pytest-legacy-poetry.lock changes. After running the tests as usual, the cache action will finally save a cache based on our pytest-legacy-poetry.lock, which should be exactly what we want :)

glatterf42 commented 3 months ago

If the above solution convinces you for pytest-legacy, we could do the same for the nightly workflow since poetry update also accepts --lock :)

glatterf42 commented 3 months ago

I installed this (using pip install -e .) in a clean conda environment and it worked as expected (including dynamically changing the version when checking out a different commit).

Were you able to install the optional groups as well? Because to clarify, I'm not sure you currently can install them via pip. We currently define the optional dependencies pyam requires as optional dependency groups, but in theory, we could also list them as extras. I think the current way is clearer, though.

danielhuppmann commented 3 months ago

Were you able to install the optional groups as well?

You are right, this is ignored. Given that we only have a handful of packages per group, I think the extras-approach would still be clear enough, right?

glatterf42 commented 3 months ago

I'll convert the optional groups to extras, then.

danielhuppmann commented 3 months ago

Thank you!

glatterf42 commented 3 months ago

No problem! Some notes for handling extras:

glatterf42 commented 3 months ago

Just noticing myself: poetry install (without -E) after poetry install -E will remove packages installed via -E.

glatterf42 commented 3 months ago

The tests for python 3.12 will fail again for this reason: pandas-datareader is still using distutils, which was deprecated and removed with python 3.12. Setuptools still contains it, that's why it was found earlier when we still included setuptools as a dependency, but I'm not sure we should do that. We could instead install it from somewhere else, e.g. here, but I'm also not sure this is the best course of action. Looking at pandas-datareader, the last version was published in 2021, even though their main branch seems somewhat active with a commit five months ago. Is this package integral to pyam's functionality or could we replace it with something more modern?

danielhuppmann commented 3 months ago

Looks good, pip install -e.[...] now install the extras as expected. Note that -e for pip means "editable", meaning that source code will not be move to site-packages. Makes it easier to develop code because you don't have to run pip install every time...

Th pandas-datareader is used to access the World Bank data sources, but it could probably be replaced by https://pypi.org/project/wbdata/. Simply remove the test and the installation in the GitHub Actions and start an issue so that I don't forget to migrate to the new package (or a good alternative).

glatterf42 commented 3 months ago

A small correction first: poetry add wbdata --extras optional_io_formats --optional will add wbdata = {version = "^1.0.0", optional = true, extras = ["optional_io_formats"]} to pyproject.toml, which is not quite what we want. It looks like there's currently no CLI option to immediately add a dependency to one of pyam's extras, so this needs to be done manually.

glatterf42 commented 3 months ago

Please note: my mypy is not happy with the recent changes to https://github.com/IAMconsortium/pyam/blob/ddbb88e1838dc09885c5cb393b59c8968aca674d/pyam/__init__.py#L38 It claims that logging has no attribute configure_logging.

danielhuppmann commented 3 months ago

Just remove the worldbank-feature entirely from the tests for now, and start an issue as a reminder to re-insert later.

glatterf42 commented 3 months ago

Just remove the worldbank-feature entirely from the tests for now, and start an issue as a reminder to re-insert later.

But ~I'm pretty sure~ it's fixed here.