IAMconsortium / pyam

Analysis & visualization of energy & climate scenarios
https://pyam-iamc.readthedocs.io/
Apache License 2.0
236 stars 120 forks source link

implementation for fast categorize #819

Closed gidden closed 8 months ago

gidden commented 9 months ago

This is only a prototype for now, but I hope shows how we could implement a fast categorization in line with previous work on df.validate()

Please confirm that this PR has done the following:

Description of PR

This PR uses the machinery from validate() now in categorize() resulting in huge speedups

codecov[bot] commented 8 months ago

Codecov Report

Attention: Patch coverage is 85.71429% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 95.0%. Comparing base (ddbb88e) to head (fed374a). Report is 3 commits behind head on main.

Files Patch % Lines
pyam/core.py 85.7% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #819 +/- ## ===================================== Coverage 95.0% 95.0% ===================================== Files 64 63 -1 Lines 6134 6144 +10 ===================================== + Hits 5828 5841 +13 + Misses 306 303 -3 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

gidden commented 8 months ago

Not clear to my why ruff CI is failing

glatterf42 commented 8 months ago

The error message seems instructive to me:

pyam/core.py:1:1: I001 [*] Import block is un-sorted or un-formatted
pyam/core.py:69:29: F401 [*] `pyam.validation._apply_criteria` imported but unused
Found 2 errors.
[*] 2 fixable with the `--fix` option.

We have recently introduced ruff in this repo, which can handle several nice things at once. For example, it sorts your import blocks and cleans up unused imports. All these fixes have been applied to the main branch, so if you rebase this PR on top of that, you'll automatically get them. If you don't want to do that for whatever reason, run ruff check . --fix in your local environment (or use --diff instead of --fix first to see what will happen) and commit the changes again :)

glatterf42 commented 8 months ago

If you've done that and the action is still failing, please compare your local version with the one used for the check. Per default, GHA will use the latest pypi release, so 0.3.2.

gidden commented 8 months ago

Correct, all we do here is update the internals of categorize() to utilize another top-level API (validate()). I'd argue this is even more robust than depending on other internal-only functions. Maybe @danielhuppmann can quickly give a thumbs up?

danielhuppmann commented 8 months ago

Will review tonight

danielhuppmann commented 8 months ago

This PR does not actually implement the new signature of validate() using direct filters and upper/lower bound arguments, hence the deprecation warning (inherited from validate()) without being able to use the new signature would be probably very confusing to users. So I fully implemented this in a new branch building on the prototype by @gidden..

Closing in favor of #837

gidden commented 8 months ago

As an FYI, I was utilizing this prototype including the filters and upper/lower bound arguments. Code snippets that I was using are here:

in a config.yaml

  categorize:
    name: foo
    label: Medium Risk
    color: xkcd:baby blue
  criteria:
    upper_bound: 1000000
    lower_bound: 800000
    year: 2100
    region: World

functioning code

        for key, data in self.definitions.items():
            if "categorize" not in data:
                continue
            logger.info(f"Preparing categorization for {key}")

            args = data["categorize"]
            self.db.categorize(
                args["name"],
                args["label"],
                variable=data["variable"],
                **data["criteria"],
                **{k: v for k, v in args.items() if k not in ["name", "label"]},
            )
danielhuppmann commented 8 months ago

Ok, right, the method works with the new signature, but it was not described in the docstring and there were no tests added in this PR - this is what I implemented in the new branch and PR.