IAMconsortium / pyam

Analysis & visualization of energy & climate scenarios
https://pyam-iamc.readthedocs.io/
Apache License 2.0

Add fast-path to format data #731

Closed coroa closed 1 year ago

coroa commented 1 year ago

Co-authored-by: Matthew Gidden <matthew.gidden@gmail.com>


Description of PR

Add a fast-path to format_data for initialization with a multi-index based Series or DataFrame that has all the required columns.
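As a rough illustration of what "has all the required columns" could mean for multi-index based input, here is a sketch in plain pandas. It is not the code from this PR: the helper name `can_use_fast_path` is hypothetical, and the list of required dimensions is an assumption made for the example.

```python
import pandas as pd

# Assumed set of required dimensions (IAMC-style index plus a time dimension);
# the actual list checked by format_data may differ.
REQUIRED_DIMS = ["model", "scenario", "region", "variable", "unit", "year"]


def can_use_fast_path(data):
    """Hypothetical check: is `data` already indexed by all required dimensions?"""
    if not isinstance(data, (pd.Series, pd.DataFrame)):
        return False
    if not isinstance(data.index, pd.MultiIndex):
        return False
    # Every required dimension must already be a named index level.
    return set(REQUIRED_DIMS).issubset(data.index.names)


index = pd.MultiIndex.from_tuples(
    [
        ("model_a", "scen_a", "World", "Primary Energy", "EJ/yr", 2005),
        ("model_a", "scen_a", "World", "Primary Energy", "EJ/yr", 2010),
    ],
    names=REQUIRED_DIMS,
)
indexed = pd.Series([1.0, 6.0], index=index, name="value")

print(can_use_fast_path(indexed))                # True
print(can_use_fast_path(indexed.reset_index()))  # False: flat columns, RangeIndex
```

Input that fails such a check would simply go through the regular, fully validated formatting route.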

~I set the base branch for this PR to PR #730 to highlight the small additional changes necessary.~

codecov[bot] commented 1 year ago

Codecov Report

Merging #731 (ab6b32e) into main (e07d3b9) will decrease coverage by 0.1%. The diff coverage is 97.4%.

@@           Coverage Diff           @@
##            main    #731     +/-   ##
=======================================
- Coverage   95.0%   95.0%   -0.1%     
=======================================
  Files         59      59             
  Lines       6020    6037     +17     
=======================================
+ Hits        5725    5741     +16     
- Misses       295     296      +1     

| Impacted Files | Coverage Δ |
|---|---|
| pyam/utils.py | 92.7% <97.4%> (+<0.1%) :arrow_up: |


gidden commented 1 year ago

Hey @coroa - I hope I didn't clobber your commits here by merging your previous PR first. Could you rebase this one on main and then I can provide a review? Thanks!

coroa commented 1 year ago

> Looks good to me! If I understand it correctly, the way to access the fast-path is to set the index before initialization, right?

Indeed. (Or even better, keep it as it is, i.e. never reset it :))
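A minimal pandas-only sketch of what "set the index before initialization" and "never reset it" amount to. The dimension names and sample values are assumptions for illustration, and no pyam call is shown.

```python
import pandas as pd

# Assumed IAMC-style dimensions; the names are illustrative.
DIMS = ["model", "scenario", "region", "variable", "unit", "year"]

# Long-format data as it often arrives: plain columns, default RangeIndex.
long_df = pd.DataFrame(
    {
        "model": ["model_a", "model_a"],
        "scenario": ["scen_a", "scen_a"],
        "region": ["World", "World"],
        "variable": ["Primary Energy", "Primary Energy"],
        "unit": ["EJ/yr", "EJ/yr"],
        "year": [2005, 2010],
        "value": [1.0, 6.0],
    }
)

# Setting the index up front yields a Series keyed by all dimensions.
indexed = long_df.set_index(DIMS)["value"]

# As long as the index is never reset, the named levels travel with the data
# through ordinary pandas operations.
scaled = indexed * 2
print(list(scaled.index.names) == DIMS)
```

Data in the shape of `indexed` or `scaled` is what the fast-path described in this PR is meant to accept; `long_df` would go through the regular route.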

coroa commented 1 year ago

Rebased to new main. Good to merge from my side.

danielhuppmann commented 1 year ago

Sorry, my earlier comment was badly phrased... What I meant was the following:

In #726, @gidden added an option fast=False to the IamDataFrame initialization to explicitly instruct pyam to use the fast-path (skipping some validations). Now this is implicit, which means the fast-path will automatically be applied by any method using _finalize(append=False) (see here), including aggregation and algebraic operations. However, it is not possible to use the fast-path when initializing from a file (because pandas reads a dataframe).

I think that this is perfectly fine behavior - just wanted to highlight this (or stand corrected if I'm on the wrong track).

Fine to merge (and maybe add a "force-fast-path"-arg later). Thanks!

coroa commented 1 year ago

> Sorry, my earlier comment was badly phrased... What I meant was the following:
>
> In #726, @gidden added an option fast=False to the IamDataFrame initialization to explicitly instruct pyam to use the fast-path (skipping some validations). Now this is implicit, which means the fast-path will automatically be applied by any method using _finalize(append=False) (see here), including aggregation and algebraic operations. However, it is not possible to use the fast-path when initializing from a file (because pandas reads a dataframe).
>
> I think that this is perfectly fine behavior - just wanted to highlight this (or stand corrected if I'm on the wrong track).
>
> Fine to merge (and maybe add a "force-fast-path"-arg later). Thanks!

You are spot on. The fast-path does not improve file read-in speed (as-is); it only speeds up passing data within pyam and pandas where the index is preserved, as with the __finalize__ calls you are highlighting.
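The difference between the two cases can be seen in plain pandas. The sketch below uses assumed dimension names and made-up values and does not call pyam, so it only illustrates the index behaviour, not pyam's internals.

```python
import io

import pandas as pd

# Assumed IAMC-style dimensions; the names are illustrative.
DIMS = ["model", "scenario", "region", "variable", "unit", "year"]

csv_text = (
    "model,scenario,region,variable,unit,year,value\n"
    "model_a,scen_a,reg_a,Primary Energy,EJ/yr,2005,1.0\n"
    "model_a,scen_a,reg_b,Primary Energy,EJ/yr,2005,2.0\n"
)

# Reading from a file-like object gives flat columns and a default RangeIndex,
# so there is no multi-index for a fast-path to pick up.
from_file = pd.read_csv(io.StringIO(csv_text))
print(type(from_file.index).__name__)  # RangeIndex

# An in-memory operation on indexed data keeps the named index levels:
# here, a sum over regions.
indexed = from_file.set_index(DIMS)["value"]
keep = ["model", "scenario", "variable", "unit", "year"]
total = indexed.groupby(level=keep).sum()
print(list(total.index.names))
```

The aggregated `total` is still indexed by named levels, which is the kind of data that stays cheap to pass along; the freshly read `from_file` is not.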