IAMconsortium / pyam

Analysis & visualization of energy & climate scenarios
https://pyam-iamc.readthedocs.io/
Apache License 2.0
234 stars 120 forks source link

Add an option to skip existing intermediate variables when aggregating recursivly #532

Closed pjuergens closed 3 years ago

pjuergens commented 3 years ago

Please confirm that this PR has done the following:

Description of PR

Add an option to skip existing intermediate variables when aggregating recursivly as discussed in #525

codecov[bot] commented 3 years ago

Codecov Report

Merging #532 (25edae0) into main (8d8aa6b) will increase coverage by 0.0%. The diff coverage is 100.0%.

Impacted file tree graph

@@          Coverage Diff          @@
##            main    #532   +/-   ##
=====================================
  Coverage   93.5%   93.5%           
=====================================
  Files         47      48    +1     
  Lines       5228    5257   +29     
=====================================
+ Hits        4891    4920   +29     
  Misses       337     337           
Impacted Files Coverage Δ
pyam/_aggregate.py 99.0% <100.0%> (+<0.1%) :arrow_up:
pyam/_compare.py 100.0% <100.0%> (ø)
pyam/core.py 92.6% <100.0%> (-0.1%) :arrow_down:
tests/conftest.py 100.0% <100.0%> (ø)
tests/test_feature_aggregate.py 98.8% <100.0%> (+<0.1%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 8d8aa6b...25edae0. Read the comment docs.

pjuergens commented 3 years ago

I don't understand why stickler-ci is failing as it doesn't tell me the line where the PR differs from the black code style

danielhuppmann commented 3 years ago

You can directly commit my suggestion to your branch directly, this should appease Stickler. Going forward, I recommend that you install a Black linter utility on your machine and follow this workflow:

After a while, you'll not need to run the linter anymore because you'll be coding black naturally...

pjuergens commented 3 years ago

thanks, that did the job :)

danielhuppmann commented 3 years ago

Looking at the code implementation in a bit more detail, two thoughts:

  1. having thought about this feature again, I think it makes more sense to not add an extra kwarg skip_intermediate but instead do the following:

    • if an intermediate variable exists, check that the values are in line with the aggregate of its components (rather than simply skipping the computation) to ensure internal consistency along the variable hierarchy
    • if it doesn't exist, compute the aggregate of the components and append the data
  2. I'm a bit concerned with the proposed implementation about unexpected behavior if a user has an IamDataFrame with multiple scenarios, where one scenario (scen_a) has some intermediate variables and another (scen_b) has only the lower-level components. With the proposed implementation, scen_b would not be complete.

    The (in my opinion expected) behavior could be tested by adding the following line

    df2.aggregate("Secondary Energy|Electricity|Wind", append=True)

    in the function test_aggregate_recursive() before appending df2 to df.

    This way, the existing test could be modified and you wouldn't need to add an almost-duplicate test.

pjuergens commented 3 years ago

I changed the implementation from comparing variables beforehand to comparing indices after aggregating, which should now be able to deal with different variable sets in different scenarios, regions or even years.

Concerning your first thought: I'd prefer to not check internal consistency while skipping intermediate variables but rather let the user check it afterwards. When explicitly using this feature it should be clear that the intermediate variables are not aggregated by pyam and should be checked. Running check_internal_consistency then gives the user more information about which variables differ and if it's due to rounding errors. In my opinion more useful for potential debugging.

pjuergens commented 3 years ago

It seems like stickler uses slightly different definitions of black than the built-in black-linter in Spyder.

danielhuppmann commented 3 years ago

Thanks for continuing to work on this, sorry for the issues with difference versions of black...

Re your comment:

When explicitly using this feature it should be clear that the intermediate variables are not aggregated by pyam and should be checked.

There is a tradeoff here:

And either way, first running recursive-aggregation and the checking consistency is not an efficient approach, because it requires computing a lot of data twice.

So let me modify my earlier suggestion:

Later extensions could then also do something like recursive={rtol=0.2} where the contents of the dictionary are passed to np.isclose() in the validation to avoid errors if data is almost identical but not quite.

pjuergens commented 3 years ago

I implemented my approach for the aggregation check. I will change the interface with dropping the skip_intermediate keyword after the weekend.