Open znicholls opened 8 months ago
I assume that I did think through these changes a while back, but can't recollect my thoughts right now...
But from a first-principles point of view, I do think that checking all items in a list is more intuitive than any for a "requirement".
Question to me is what your use case is? Are you trying to ensure that at least one of "Waste" or "Other" is present?
But from a first-principles point of view, I do think that checking all items in a list is more intuitive than any for a "requirement".
Me too
Question to me is what your use case is? Are you trying to ensure that at least one of "Waste" or "Other" is present?
Trying to get this to behave https://github.com/iiasa/climate-assessment/pull/47 @jkikstra wrote the code that uses this and I assume was trying to do a check for one or both of them being there, but I don't actually know (see this comment for the function which calls it https://github.com/iiasa/climate-assessment/pull/47#issuecomment-1777718277)
cc @phackstock
Still not sure what the actual use case is from that comment, but I guess we could add a kwarg how={"all", "any"}
, default 'all', inspired by pandas.dropna().
As for implementation, I guess if any data is present after applying the filters (=kwargs of require_data()
) , then the "any"-requirement is satisfied for the filters.
Here's the line where it's used: https://github.com/iiasa/climate-assessment/blob/485f3d24fc646ad8d77c65ac5e787a27dc79db04/src/climate_assessment/checks.py#L788
Up to @jkikstra and @phackstock whether it's easier to add the feature back into pyam or just hack a workaround into climate-assessment
No strong feelings from my side either way. I would say in the interest of time it's better to build a workaround in climate-assessment.
require_data
is not a drop in replacement forrequire_variable
. This leads to a regression in behaviour with no easy fix for users.See script below for demonstration.
Script
```python import numpy as np import pandas as pd import pyam test = pd.DataFrame( np.ones((8, 3)), columns=[2010, 2015, 2020], index=pd.MultiIndex.from_tuples( [ ( "scenario_a", "model_a", "Emissions|CO2|Waste", "World", "GtC / yr", ), ( "scenario_a", "model_a", "Emissions|CO2|Other", "World", "GtC / yr", ), ( "scenario_b", "model_a", "Emissions|CO2|Waste", "World", "GtC / yr", ), ( "scenario_b", "model_a", "Emissions|CO2|Industrial", "World", "GtC / yr", ), ( "scenario_a", "model_b", "Emissions|CO2|Other", "World", "GtC / yr", ), ( "scenario_a", "model_b", "Emissions|CO2|Industrial", "World", "GtC / yr", ), ( "scenario_b", "model_b", "Emissions|CO2|AFOLU", "World", "GtC / yr", ), ( "scenario_b", "model_b", "Emissions|CO2|Industrial", "World", "GtC / yr", ), ], names=[ "scenario", "model", "variable", "region", "unit", ], ), ) test = pyam.IamDataFrame(test) if pyam.__version__.startswith("2"): matches_requirements = test.require_data( variable=["Emissions|CO2|Other", "Emissions|CO2|Waste"], exclude_on_fail=True ) print("3 scenarios fail (the ones that don't have BOTH requirement)") print(test.exclude) assert test.exclude.sum() == 1 else: matches_requirements = test.require_variable( variable=["Emissions|CO2|Other", "Emissions|CO2|Waste"], exclude_on_fail=True ) print("Only 1 scenario fails (the one that doesn't have EITHER requirement)") print(test.meta) assert test.meta["exclude"].sum() == 1 ```Behaviour with pyam-iamc 2.0 and require_data
```bash % pip list | grep pyam-iamc && python scratch.py pyam-iamc 2.0.0 3 scenarios fail (the ones that don't have BOTH requirement) model scenario model_a scenario_a False scenario_b True model_b scenario_a True scenario_b True dtype: bool Traceback (most recent call last): File ".../scratch.py", line 85, inBehaviour with pyam-iamc 1.9 and require_variable
```bash % pip list | grep pyam-iamc && python scratch.py pyam-iamc 1.9.0 Only 1 scenario fails (the one that doesn't have EITHER requirement) exclude model scenario model_a scenario_a False scenario_b False model_b scenario_a False scenario_b True ```I think the basic difference is that
require_variable
did an OR requirement (any match was marked as a match).require_data
is an AND requirement (all requirements had to match in order to be marked as match).