Closed pjuergens closed 3 years ago
Merging #532 (25edae0) into main (8d8aa6b) will increase coverage by
0.0%
. The diff coverage is100.0%
.
@@ Coverage Diff @@
## main #532 +/- ##
=====================================
Coverage 93.5% 93.5%
=====================================
Files 47 48 +1
Lines 5228 5257 +29
=====================================
+ Hits 4891 4920 +29
Misses 337 337
Impacted Files | Coverage Δ | |
---|---|---|
pyam/_aggregate.py | 99.0% <100.0%> (+<0.1%) |
:arrow_up: |
pyam/_compare.py | 100.0% <100.0%> (ø) |
|
pyam/core.py | 92.6% <100.0%> (-0.1%) |
:arrow_down: |
tests/conftest.py | 100.0% <100.0%> (ø) |
|
tests/test_feature_aggregate.py | 98.8% <100.0%> (+<0.1%) |
:arrow_up: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 8d8aa6b...25edae0. Read the comment docs.
I don't understand why stickler-ci is failing as it doesn't tell me the line where the PR differs from the black code style
You can directly commit my suggestion to your branch directly, this should appease Stickler. Going forward, I recommend that you install a Black linter utility on your machine and follow this workflow:
After a while, you'll not need to run the linter anymore because you'll be coding black naturally...
thanks, that did the job :)
Looking at the code implementation in a bit more detail, two thoughts:
having thought about this feature again, I think it makes more sense to not add an extra kwarg skip_intermediate
but instead do the following:
I'm a bit concerned with the proposed implementation about unexpected behavior if a user has an IamDataFrame with multiple scenarios, where one scenario (scen_a
) has some intermediate variables and another (scen_b
) has only the lower-level components. With the proposed implementation, scen_b
would not be complete.
The (in my opinion expected) behavior could be tested by adding the following line
df2.aggregate("Secondary Energy|Electricity|Wind", append=True)
in the function test_aggregate_recursive()
before appending df2
to df
.
This way, the existing test could be modified and you wouldn't need to add an almost-duplicate test.
I changed the implementation from comparing variables beforehand to comparing indices after aggregating, which should now be able to deal with different variable sets in different scenarios, regions or even years.
Concerning your first thought: I'd prefer to not check internal consistency while skipping intermediate variables but rather let the user check it afterwards. When explicitly using this feature it should be clear that the intermediate variables are not aggregated by pyam and should be checked. Running check_internal_consistency
then gives the user more information about which variables differ and if it's due to rounding errors. In my opinion more useful for potential debugging.
It seems like stickler uses slightly different definitions of black than the built-in black-linter in Spyder.
Thanks for continuing to work on this, sorry for the issues with difference versions of black...
Re your comment:
When explicitly using this feature it should be clear that the intermediate variables are not aggregated by pyam and should be checked.
There is a tradeoff here:
And either way, first running recursive-aggregation and the checking consistency is not an efficient approach, because it requires computing a lot of data twice.
So let me modify my earlier suggestion:
skip_intermediate
keyword argument - skipping should be the default, but with validationrecursive
to be a boolean (as it is now) or a string "skip-validate" - and skip the validation in that case.
So your expert-level function could be called with
df.aggregate(<variable>, recursive="skip-validate")
but novice users would get an error if the hierarchy is not consistent.
In core.py
, you only have to change if recursive is True:
to if recursive:
, then the recursive-aggregation will be selected both if it's True or a dictionary.
Later extensions could then also do something like recursive={rtol=0.2}
where the contents of the dictionary are passed to np.isclose()
in the validation to avoid errors if data is almost identical but not quite.
I implemented my approach for the aggregation check. I will change the interface with dropping the skip_intermediate
keyword after the weekend.
Please confirm that this PR has done the following:
Description of PR
Add an option to skip existing intermediate variables when aggregating recursivly as discussed in #525