IAMconsortium / pyam

Analysis & visualization of energy & climate scenarios
https://pyam-iamc.readthedocs.io/
Apache License 2.0

Regression in require behaviour #793

Open znicholls opened 8 months ago

znicholls commented 8 months ago

require_data is not a drop-in replacement for require_variable. This leads to a regression in behaviour with no easy fix for users.

See script below for demonstration.

Script

```python
import numpy as np
import pandas as pd
import pyam

test = pd.DataFrame(
    np.ones((8, 3)),
    columns=[2010, 2015, 2020],
    index=pd.MultiIndex.from_tuples(
        [
            (
                "scenario_a",
                "model_a",
                "Emissions|CO2|Waste",
                "World",
                "GtC / yr",
            ),
            (
                "scenario_a",
                "model_a",
                "Emissions|CO2|Other",
                "World",
                "GtC / yr",
            ),
            (
                "scenario_b",
                "model_a",
                "Emissions|CO2|Waste",
                "World",
                "GtC / yr",
            ),
            (
                "scenario_b",
                "model_a",
                "Emissions|CO2|Industrial",
                "World",
                "GtC / yr",
            ),
            (
                "scenario_a",
                "model_b",
                "Emissions|CO2|Other",
                "World",
                "GtC / yr",
            ),
            (
                "scenario_a",
                "model_b",
                "Emissions|CO2|Industrial",
                "World",
                "GtC / yr",
            ),
            (
                "scenario_b",
                "model_b",
                "Emissions|CO2|AFOLU",
                "World",
                "GtC / yr",
            ),
            (
                "scenario_b",
                "model_b",
                "Emissions|CO2|Industrial",
                "World",
                "GtC / yr",
            ),
        ],
        names=[
            "scenario",
            "model",
            "variable",
            "region",
            "unit",
        ],
    ),
)

test = pyam.IamDataFrame(test)

if pyam.__version__.startswith("2"):
    matches_requirements = test.require_data(
        variable=["Emissions|CO2|Other", "Emissions|CO2|Waste"], exclude_on_fail=True
    )
    print("3 scenarios fail (the ones that don't have BOTH requirement)")
    print(test.exclude)
    assert test.exclude.sum() == 1
else:
    matches_requirements = test.require_variable(
        variable=["Emissions|CO2|Other", "Emissions|CO2|Waste"], exclude_on_fail=True
    )
    print("Only 1 scenario fails (the one that doesn't have EITHER requirement)")
    print(test.meta)
    assert test.meta["exclude"].sum() == 1
```
Behaviour with pyam-iamc 2.0 and require_data

```bash
% pip list | grep pyam-iamc && python scratch.py
pyam-iamc 2.0.0
3 scenarios fail (the ones that don't have BOTH requirement)
model    scenario
model_a  scenario_a    False
         scenario_b     True
model_b  scenario_a     True
         scenario_b     True
dtype: bool
Traceback (most recent call last):
  File ".../scratch.py", line 85, in <module>
    assert test.exclude.sum() == 1
           ^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
```
Behaviour with pyam-iamc 1.9 and require_variable

```bash
% pip list | grep pyam-iamc && python scratch.py
pyam-iamc 1.9.0
Only 1 scenario fails (the one that doesn't have EITHER requirement)
                    exclude
model   scenario
model_a scenario_a    False
        scenario_b    False
model_b scenario_a    False
        scenario_b     True
```

I think the basic difference is that require_variable did an OR requirement (any match was marked as a match). require_data is an AND requirement (all requirements had to match in order to be marked as match).
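To make the OR/AND distinction concrete, here is a minimal plain-Python sketch that does not use pyam at all. The names `reported`, `required`, `fails_or` and `fails_and` are made up for illustration; the data mirrors the model/scenario/variable combinations from the script above.

```python
# Variables reported by each (model, scenario), mirroring the script's test data.
reported = {
    ("model_a", "scenario_a"): {"Emissions|CO2|Waste", "Emissions|CO2|Other"},
    ("model_a", "scenario_b"): {"Emissions|CO2|Waste", "Emissions|CO2|Industrial"},
    ("model_b", "scenario_a"): {"Emissions|CO2|Other", "Emissions|CO2|Industrial"},
    ("model_b", "scenario_b"): {"Emissions|CO2|AFOLU", "Emissions|CO2|Industrial"},
}
required = ["Emissions|CO2|Other", "Emissions|CO2|Waste"]

# OR semantics (what require_variable appeared to do in 1.9):
# a scenario fails only if NONE of the required variables is present.
fails_or = {
    key
    for key, variables in reported.items()
    if not any(v in variables for v in required)
}

# AND semantics (what require_data appears to do in 2.0):
# a scenario fails if ANY of the required variables is missing.
fails_and = {
    key
    for key, variables in reported.items()
    if not all(v in variables for v in required)
}

print(sorted(fails_or))
print(sorted(fails_and))
```

Under OR, only model_b/scenario_b fails; under AND, every scenario except model_a/scenario_a fails, which matches the two outputs above.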

danielhuppmann commented 8 months ago

I assume that I did think through these changes a while back, but can't recollect my thoughts right now...

But from a first-principles point of view, I do think that checking all items in a list is more intuitive than any for a "requirement".

Question to me is what your use case is? Are you trying to ensure that at least one of "Waste" or "Other" is present?

znicholls commented 8 months ago

> But from a first-principles point of view, I do think that checking all items in a list is more intuitive than any for a "requirement".

Me too

> Question to me is what your use case is? Are you trying to ensure that at least one of "Waste" or "Other" is present?

Trying to get this PR to behave: https://github.com/iiasa/climate-assessment/pull/47. @jkikstra wrote the code that uses this and, I assume, was trying to check for one or both of them being there, but I don't actually know (see this comment for the function that calls it: https://github.com/iiasa/climate-assessment/pull/47#issuecomment-1777718277).

znicholls commented 8 months ago

cc @phackstock

danielhuppmann commented 8 months ago

Still not sure what the actual use case is from that comment, but I guess we could add a kwarg how={"all", "any"}, default 'all', inspired by pandas.DataFrame.dropna().

As for implementation, I guess that if any data is present after applying the filters (i.e. the kwargs of require_data()), then the "any" requirement is satisfied for those filters.
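A rough sketch of how such a `how` switch could work, written in plain Python rather than against pyam internals. The function `require` and its `(model, scenario, variable)` row format are hypothetical and only illustrate the "filter, then check what is left" idea; this is not pyam's API or implementation.

```python
def require(rows, variables, how="all"):
    """Return the (model, scenario) keys that FAIL the requirement.

    Hypothetical helper, not part of pyam.
    rows: iterable of (model, scenario, variable) tuples.
    how="all": every requested variable must be present (AND).
    how="any": at least one requested variable must be present (OR).
    """
    if how not in ("all", "any"):
        raise ValueError(f"how must be 'all' or 'any', got {how!r}")

    rows = list(rows)
    wanted = set(variables)
    all_keys = {(model, scenario) for model, scenario, _ in rows}

    # Apply the filter: keep only the rows matching a requested variable.
    found = {}
    for model, scenario, variable in rows:
        if variable in wanted:
            found.setdefault((model, scenario), set()).add(variable)

    if how == "any":
        # Any data left after filtering satisfies the requirement.
        passed = set(found)
    else:
        # Every requested variable must be among the filtered rows.
        passed = {key for key, present in found.items() if present == wanted}

    return all_keys - passed


rows = [
    ("model_a", "scenario_a", "Emissions|CO2|Waste"),
    ("model_a", "scenario_a", "Emissions|CO2|Other"),
    ("model_a", "scenario_b", "Emissions|CO2|Waste"),
    ("model_a", "scenario_b", "Emissions|CO2|Industrial"),
    ("model_b", "scenario_a", "Emissions|CO2|Other"),
    ("model_b", "scenario_a", "Emissions|CO2|Industrial"),
    ("model_b", "scenario_b", "Emissions|CO2|AFOLU"),
    ("model_b", "scenario_b", "Emissions|CO2|Industrial"),
]
required = ["Emissions|CO2|Other", "Emissions|CO2|Waste"]

print(sorted(require(rows, required, how="any")))
print(sorted(require(rows, required, how="all")))
```

With `how="any"` only model_b/scenario_b fails, as with require_variable in 1.9; with the default `how="all"` three scenarios fail, as with require_data in 2.0.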

znicholls commented 8 months ago

Here's the line where it's used: https://github.com/iiasa/climate-assessment/blob/485f3d24fc646ad8d77c65ac5e787a27dc79db04/src/climate_assessment/checks.py#L788

Up to @jkikstra and @phackstock whether it's easier to add the feature back into pyam or just hack a workaround into climate-assessment

phackstock commented 8 months ago

No strong feelings from my side either way. I would say in the interest of time it's better to build a workaround in climate-assessment.