IAMconsortium / pyam

Analysis & visualization of energy & climate scenarios
https://pyam-iamc.readthedocs.io/
Apache License 2.0
222 stars 116 forks source link

Add xlsxwriter to improve to_excel performance #701

Closed phackstock closed 1 year ago

phackstock commented 1 year ago

Please confirm that this PR has done the following:

Description of PR

Added xlsxwriter to the dependencies of pyam. pandas uses xlsxwriter over openpyxl if it's found. This means that if xlsxwriter is found on the system it is used without any changes required. According to benchmarks (https://exchangetuts.com/python-fastest-way-to-write-pandas-dataframe-to-excel-on-multiple-sheets-1640154784194443), xlsxwriter is significantly faster than openpyxl. Should I set up some benchmarks of our own to test it for pyam?

codecov[bot] commented 1 year ago

Codecov Report

Merging #701 (a8c77af) into main (759120f) will increase coverage by 0.0%. The diff coverage is 60.0%.

@@          Coverage Diff          @@
##            main    #701   +/-   ##
=====================================
  Coverage   94.8%   94.9%           
=====================================
  Files         58      58           
  Lines       5856    5853    -3     
=====================================
- Hits        5557    5555    -2     
+ Misses       299     298    -1     
Impacted Files Coverage Δ
pyam/core.py 94.7% <33.3%> (-0.2%) :arrow_down:
pyam/utils.py 91.7% <100.0%> (+0.5%) :arrow_up:
tests/test_io.py 100.0% <100.0%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

danielhuppmann commented 1 year ago

I just pip-installed pyam in a clean environment and xlsxwriter was installed automatically already. Not sure if it's necessary to add it as an explicit dependency?

phackstock commented 1 year ago

Interesting, it's not a dependency of pandas (https://github.com/pandas-dev/pandas/blob/main/setup.cfg#L33). Maybe it was installed as a dependency by some other library. I just tried the same, install pyam from a clean virtual environment and it did not install it for me. Maybe it's platform dependent.

danielhuppmann commented 1 year ago

You're right, looks like I was confused between two environments. Running some profiling now.

danielhuppmann commented 1 year ago

There is another usage of pd.ExcelWriter where the engine is not explicitly specified. Please make the two usages consistent.

https://github.com/IAMconsortium/pyam/blob/759120f01453f2738df6d4126186bb0a21cab54b/pyam/core.py#L2377

phackstock commented 1 year ago

Updated the usage of xlsxwriter in pd.ExcelWriter. There's one more thing that I found:

https://github.com/IAMconsortium/pyam/blob/main/pyam/utils.py#L135

is there a test for utils.write_sheet?

danielhuppmann commented 1 year ago

is there a test for utils.write_sheet?

It's tested implicitly via https://github.com/IAMconsortium/pyam/blob/759120f01453f2738df6d4126186bb0a21cab54b/tests/test_io.py#L56

The line you found was a hacky attempt by me to make xlsx files look "nice" by having useful column widths.

phackstock commented 1 year ago

Ok, should I check if using xlsxwriter triggered the error?

danielhuppmann commented 1 year ago

Let me give this a quick try, I think you have more urgent things on your to-do list...

phackstock commented 1 year ago

True, just thought this might be a quick one to get off the list ...