danielhuppmann commented 3 years ago

Please confirm that this PR has done the following:

[x] Tests Added
[x] Documentation Added
~Name of contributors Added to AUTHORS.rst~
[x] Description in RELEASE_NOTES.md Added

Description of PR

This PR changes the implementation of the binary ops to support units (via iam-units and pint).

Default behaviour:

Cast values and units to a pint.Quantity using the iam-units registry and perform operations with pint unit handling (ignore_units=False) Warning: By casting units to pints and then performing operations, the return-format may look different. In some of the tests, EJ/yr is transformed to exajoule / year (in full format) or EJ / a (in compact format, implemented here). This is over-ridden for some methods (addition, multiplication, division) if all units are identical.
Passing a pint quantity as argument will work, e.g, df.add("Primary Energy", Quantity(2, "EJ/yr"), "new").
pint will raise appropriate errors if it doesn't know how to handle the units.
In line with common practice in the IAM community, a dimensionless quantity is marked as unit "" (empty string) rather than the word dimensionless (this works in a round-trip to-from-to pint).

Alternative over-ride:

A user can set ignore_units=True, in which case the operations will work just on the values and set unit to "unknown" (per suggestion by @znicholls in https://github.com/IAMconsortium/pyam/issues/536#issuecomment-847569955) or to the value of ignore_units (if it's a string).

Known issues:

pandas fillna does not work with non-numeric values (in particular not with pint.Quantity).

Side note:

I simplified the test for the apply() function.

To-dos in later PRs (it's already quite a beast):

Make the fillna feature work with pint.
Add ignore_units to the apply() function.
Support custom pint registry instances.
Pass unit formatting options to pint as kwargs.

closes #537 closes #535

codecov[bot] commented 3 years ago

Codecov Report

Merging #541 (52101b0) into main (a4c6c71) will increase coverage by 0.0%. The diff coverage is 98.2%.

@@          Coverage Diff          @@
##            main    #541   +/-   ##
=====================================
  Coverage   93.5%   93.5%           
=====================================
  Files         47      47           
  Lines       5167    5228   +61     
=====================================
+ Hits        4832    4891   +59     
- Misses       335     337    +2

Impacted Files	Coverage Δ
pyam/_ops.py	`95.0% <95.7%> (-1.0%)`	:arrow_down:
pyam/core.py	`92.7% <100.0%> (+<0.1%)`	:arrow_up:
pyam/units.py	`92.9% <100.0%> (ø)`
tests/test_ops.py	`100.0% <100.0%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update a4c6c71...52101b0. Read the comment docs.

danielhuppmann commented 3 years ago

pinging @znicholls @gidden @pjuergens @coroa @khaeru - anyone have time for a review or comments on the strategy?

pjuergens commented 3 years ago

So it's just a comment and not a full review. I hope some of the others have time for an in-depth-review, cause I'd really like to use this feature :)

danielhuppmann commented 3 years ago

Thanks for the suggestion, @pjuergens - implemented it and updated the unit tests, because shorter units are generally preferable, I think. Also updated the description of this PR to reflect this change.

However, note that the underlying issue remains, because a user will generally not receive the same notation for a unit - in our case, EJ / yr is now returned as EJ / a. I don't see that as a problem, just something to be aware of...

danielhuppmann commented 3 years ago

Thanks for the comment @znicholls! I had a look at pint-pandas, but as you write, this assumes that an entire column in a dataframe has the same unit - this is not something that pyam satisfies, not even on a per-variable level. A user could currently have a scenario ensemble where "Primary Energy" is in EJ for some scenarios and "GWh" in others. Something to discuss for a pyam v2.0 release...

About speed and casting, I'd follow the strategy I was taught by @gidden - first make it work, then make it fast...

znicholls commented 3 years ago

About speed and casting, I'd follow the strategy I was taught by @gidden - first make it work, then make it fast...

It's a great strategy!

IAMconsortium / pyam

Support units in binary operations #541

Please confirm that this PR has done the following:

Description of PR

Codecov Report