IAMconsortium / pyam

Analysis & visualization of energy & climate scenarios
https://pyam-iamc.readthedocs.io/
Apache License 2.0
221 stars 115 forks source link

Improve performance of `format_data()` #729

Closed danielhuppmann closed 1 year ago

danielhuppmann commented 1 year ago

Please confirm that this PR has done the following:

Description of PR

This PR is an alternative approach to #726 & #727 only focusing on performance improvements in format_data() by switching from pd.melt() to stack().

Performance improvements

Current implementation

      ITEM_PATH      ITEM_VARIANT  TOTAL_TIME  CPU_USAGE    MEM_USAGE
0  test_profile  test_init[data0]    0.673276   0.203255    -1.433594
1  test_profile  test_init[data1]    9.140325   0.931676  1389.832031
2  test_profile  test_init[data2]   29.698939   0.932599  3407.539062

New implementation

      ITEM_PATH      ITEM_VARIANT  TOTAL_TIME  CPU_USAGE    MEM_USAGE
0  test_profile  test_init[data0]    0.786510   0.167053    -0.167969
1  test_profile  test_init[data1]    4.425207   0.850163   631.160156
2  test_profile  test_init[data2]   12.178936   0.942462  1482.765625
codecov[bot] commented 1 year ago

Codecov Report

Merging #729 (4b6207c) into main (8c56dc3) will increase coverage by 0.0%. The diff coverage is 100.0%.

:exclamation: Current head 4b6207c differs from pull request most recent head 3fb9bd6. Consider uploading reports for the commit 3fb9bd6 to get more accurate results

@@          Coverage Diff          @@
##            main    #729   +/-   ##
=====================================
  Coverage   95.0%   95.0%           
=====================================
  Files         59      59           
  Lines       6004    6015   +11     
=====================================
+ Hits        5707    5718   +11     
  Misses       297     297           
Impacted Files Coverage Δ
pyam/utils.py 91.9% <100.0%> (+0.2%) :arrow_up:
tests/test_core.py 100.0% <100.0%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

gidden commented 1 year ago

Thanks @danielhuppmann - I'll let @coroa finish off the review. In general I am happy especially if we minimize the read-in times of large AR6-like files as much as possible.

danielhuppmann commented 1 year ago

Thanks @coroa - fully agree with your qualms, happy to take this further and implement more of your suggestions! (Or even happier to review/merge this PR and you implement your suggestions on top of my changes). I was just worried that the current path-dependency of the existing PRs made it quite hard to review - and might make format_data() even more complicated and hard to maintain going forward.

coroa commented 1 year ago

I'm fine with merging this PR. (That's why i clicked Approve :))

The announced PRs are #730 and #731 .