NREL / gdx-pandas

Python interface to read and write GAMS GDX files using pandas.DataFrames as the intermediate data format.
BSD 3-Clause "New" or "Revised" License

SPEED: Speed up convert_np_to_gdx_svs #90

Closed: jebob closed this 1 year ago

jebob commented 3 years ago

This function accounted for about a third of the write time and is now fast. In my speed test, its runtime dropped from 11.071 seconds to under 0.1 seconds, reducing the overall runtime from 33.4 to 22.0 seconds.

As convert_np_to_gdx_svs no longer calls is_np_eps, add an additional pytest for this function.
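One way such a test could look is sketched below: it checks that a per-value EPS check and a whole-array EPS check agree on EPS itself and on the other special values. The names `is_eps_scalar`, `eps_mask` and `CASES`, and the assumption that EPS is represented in numpy by values approximately equal to `np.finfo(float).tiny`, are illustrative; they are not the gdx-pandas API. The test follows pytest's `test_` naming convention, so pytest collects it without importing pytest.

```python
import numpy as np

# Assumed numpy-side representation of GAMS EPS (illustrative only).
NP_EPS = np.finfo(float).tiny


def is_eps_scalar(val: float) -> bool:
    """Hypothetical per-value check, as an element-wise (applymap) path would call it."""
    return bool(abs(val - NP_EPS) < NP_EPS)


def eps_mask(values: np.ndarray) -> np.ndarray:
    """Hypothetical vectorized check applied to a whole array at once."""
    return np.abs(np.asarray(values, dtype=float) - NP_EPS) < NP_EPS


# (value, expected) pairs: EPS itself, ordinary numbers and the other special values.
CASES = [
    (NP_EPS, True),
    (0.0, False),
    (1.0, False),
    (-NP_EPS, False),
    (np.inf, False),
    (-np.inf, False),
    (np.nan, False),
]


def test_scalar_and_vectorized_eps_checks_agree():
    values = np.array([v for v, _ in CASES])
    expected = [e for _, e in CASES]
    assert [is_eps_scalar(v) for v, _ in CASES] == expected
    assert eps_mask(values).tolist() == expected
```

Keeping the scalar check around as a reference makes the test a cheap guard if the EPS definition changes later: both checks must move together.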

This will have to be revisited if we change the definition of EPS in v2.0.

Speed test details

Code:

```python
import gdxpds
import profilehooks
import pandas

@profilehooks.profile
def slowthing(size):
    data = pandas.DataFrame({"i": list(range(size)), "j": list(range(size)), "value": list(range(size))})
    gdxpds.to_gdx({"data": data}, "test.gdx")

slowthing(2000000)  # 2,000,000 rows
```

Results before:

```
*** PROFILER RESULTS ***
slowthing (E:/Projects/gdx-pandas playground/issue63.py:6)
function called 1 times

         52028554 function calls (52025563 primitive calls) in 33.382 seconds

   Ordered by: cumulative time, internal time, call count
   List reduced from 887 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.175    0.175   33.382   33.382 issue63.py:6(slowthing)
        1    0.000    0.000   31.840   31.840 write_gdx.py:143(to_gdx)
        1    0.000    0.000   31.840   31.840 write_gdx.py:94(save_gdx)
        1    0.000    0.000   31.718   31.718 gdx.py:266(write)
        2    7.218    3.609   31.536   15.768 gdx.py:833(write)
        1    0.005    0.005   11.071   11.071 special.py:114(convert_np_to_gdx_svs)
        1    0.001    0.001   11.030   11.030 frame.py:6016(applymap)
        1    0.000    0.000   11.029   11.029 frame.py:5837(apply)
        1    0.000    0.000   11.028   11.028 apply.py:311(get_result)
        1    0.000    0.000   11.028   11.028 apply.py:105(get_result)
        1    0.000    0.000   11.028   11.028 apply.py:219(apply_standard)
        2    0.039    0.019   11.020    5.510 frame.py:6067(infer)
        6    1.746    0.291   10.867    1.811 {pandas._libs.lib.map_infer}
  4000000    0.862    0.000    9.121    0.000 special.py:134(convert_approx_eps)
  4000000    8.258    0.000    8.258    0.000 special.py:84(is_np_eps)
        1    0.000    0.000    5.936    5.936 apply.py:253(apply_series_generator)
        1    0.001    0.001    5.086    5.086 {pandas._libs.reduction.reduce}
2005003/2003966    0.842    0.000    2.480    0.000 {built-in method builtins.isinstance}
  2000000    0.523    0.000    2.449    0.000 gdxcc.py:458(gdxDataWriteStr)
  4000023    1.692    0.000    2.409    0.000 gdx.py:663(num_dims)
  2000009    1.270    0.000    2.251    0.000 gdx.py:585(value_cols)
  2000000    1.926    0.000    1.926    0.000 {built-in method _gdxcc.gdxDataWriteStr}
  4000000    1.000    0.000    1.651    0.000 gdxcc.py:127(__setitem__)
  2000081    0.901    0.000    1.637    0.000 abc.py:178(__instancecheck__)
       13    0.000    0.000    1.385    0.107 frame.py:334(__init__)
        5    0.004    0.001    1.384    0.277 frame.py:426(_init_dict)
        5    0.000    0.000    1.369    0.274 frame.py:7349(_arrays_to_mgr)
       27    0.000    0.000    1.353    0.050 series.py:4019(_sanitize_array)
        5    0.000    0.000    1.353    0.271 frame.py:7644(_homogenize)
        3    0.083    0.028    1.352    0.451 cast.py:44(maybe_convert_platform)
  2000000    1.202    0.000    1.202    0.000 gdx.py:874()
        6    1.102    0.184    1.102    0.184 {pandas._libs.lib.maybe_convert_objects}
  2000012    0.530    0.000    0.735    0.000 enum.py:579(__hash__)
  2001012    0.731    0.000    0.731    0.000 _weakrefset.py:70(__contains__)
  4000000    0.651    0.000    0.651    0.000 {built-in method _gdxcc.doubleArray___setitem__}
  4000030    0.421    0.000    0.421    0.000 gdx.py:637(dims)
4001237/4001008    0.297    0.000    0.297    0.000 {built-in method builtins.len}
  2000008    0.262    0.000    0.262    0.000 gdx.py:617(file)
  2000010    0.248    0.000    0.248    0.000 gdx.py:194(H)
  2000026    0.246    0.000    0.246    0.000 gdx.py:521(data_type)
```

Results after:

```
*** PROFILER RESULTS ***
slowthing (E:/Projects/gdx-pandas playground/issue63.py:6)
function called 1 times

         44031856 function calls (44028713 primitive calls) in 21.979 seconds

   Ordered by: cumulative time, internal time, call count
   List reduced from 1073 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.176    0.176   21.979   21.979 issue63.py:6(slowthing)
        1    0.000    0.000   20.406   20.406 write_gdx.py:143(to_gdx)
        1    0.000    0.000   20.406   20.406 write_gdx.py:94(save_gdx)
        1    0.000    0.000   20.283   20.283 gdx.py:266(write)
        2    7.065    3.532   20.135   10.068 gdx.py:833(write)
2005099/2004061    0.845    0.000    2.403    0.000 {built-in method builtins.isinstance}
  4000023    1.677    0.000    2.399    0.000 gdx.py:663(num_dims)
  2000000    0.492    0.000    2.362    0.000 gdxcc.py:458(gdxDataWriteStr)
  2000009    1.313    0.000    2.279    0.000 gdx.py:585(value_cols)
  2000000    1.870    0.000    1.870    0.000 {built-in method _gdxcc.gdxDataWriteStr}
  4000000    0.936    0.000    1.560    0.000 gdxcc.py:127(__setitem__)
  2000078    0.885    0.000    1.558    0.000 abc.py:178(__instancecheck__)
       26    0.000    0.000    1.410    0.054 frame.py:334(__init__)
        4    0.004    0.001    1.409    0.352 frame.py:426(_init_dict)
        4    0.000    0.000    1.396    0.349 frame.py:7349(_arrays_to_mgr)
       23    0.000    0.000    1.384    0.060 series.py:4019(_sanitize_array)
        4    0.000    0.000    1.383    0.346 frame.py:7644(_homogenize)
        3    0.085    0.028    1.382    0.461 cast.py:44(maybe_convert_platform)
  2000000    1.170    0.000    1.170    0.000 gdx.py:874()
        6    1.147    0.191    1.147    0.191 {pandas._libs.lib.maybe_convert_objects}
  2000012    0.535    0.000    0.744    0.000 enum.py:579(__hash__)
  2001015    0.668    0.000    0.668    0.000 _weakrefset.py:70(__contains__)
  4000000    0.624    0.000    0.624    0.000 {built-in method _gdxcc.doubleArray___setitem__}
  4000030    0.420    0.000    0.420    0.000 gdx.py:637(dims)
4001454/4001193    0.302    0.000    0.303    0.000 {built-in method builtins.len}
  2000010    0.279    0.000    0.279    0.000 gdx.py:194(H)
  2000008    0.247    0.000    0.247    0.000 gdx.py:617(file)
  2000026    0.222    0.000    0.222    0.000 gdx.py:521(data_type)
2000045/2000039    0.209    0.000    0.209    0.000 {built-in method builtins.hash}
        1    0.000    0.000    0.154    0.154 gdxcc.py:438(gdxDataWriteDone)
        1    0.154    0.154    0.154    0.154 {built-in method _gdxcc.gdxDataWriteDone}
       18    0.135    0.007    0.152    0.008 cast.py:1207(construct_1d_object_array_from_listlike)
        1    0.000    0.000    0.147    0.147 gdxcc.py:370(gdxClose)
        1    0.147    0.147    0.147    0.147 {built-in method _gdxcc.gdxClose}
        1    0.000    0.000    0.129    0.129 frame.py:778(itertuples)
       11    0.000    0.000    0.129    0.012 base.py:912(__iter__)
       11    0.000    0.000    0.129    0.012 base.py:893(tolist)
       11    0.128    0.012    0.128    0.012 {method 'tolist' of 'numpy.ndarray' objects}
        2    0.000    0.000    0.123    0.062 write_gdx.py:86(gdx)
        1    0.000    0.000    0.098    0.098 gdx.py:142(__init__)
```

Tests

Closes #63

jebob commented 1 year ago

Alas, I no longer have a working GAMS install and can't test or fix this. The core technique, vectorising both the replace and the EPS check, should still work if you want to fix or reimplement it.
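For anyone picking this up, here is a minimal sketch of vectorising both the replacement and the EPS check with numpy masks. It is not the code from this PR: the function name `to_gdx_svs_vectorized`, the placeholder codes `SV_NA`, `SV_PINF`, `SV_MINF` and `SV_EPS`, the mapping of NaN to NA, and the use of `np.finfo(float).tiny` as the numpy-side EPS sentinel are all assumptions made for illustration. Real code should take the special values from the GDX library rather than hard-coding them.

```python
import numpy as np
import pandas as pd

# Placeholder GDX special-value codes (assumed; obtain the real ones from the GDX API).
SV_NA, SV_PINF, SV_MINF, SV_EPS = 2.0e300, 3.0e300, 4.0e300, 5.0e300
# Assumed numpy-side representation of GAMS EPS.
NP_EPS = np.finfo(float).tiny


def to_gdx_svs_vectorized(df: pd.DataFrame, num_dims: int) -> pd.DataFrame:
    """Hypothetical helper: map numpy special values in the value columns
    (every column after the first num_dims) to GDX codes using whole-array
    masks instead of a per-cell applymap. The input frame is not modified."""
    out = df.copy()
    value_cols = list(out.columns[num_dims:])
    vals = out[value_cols].to_numpy(dtype=float)

    # Build every mask from the original values, so assignment order is irrelevant.
    eps = np.abs(vals - NP_EPS) < NP_EPS  # approximate EPS match in one vectorized step
    nan = np.isnan(vals)
    pinf = np.isposinf(vals)
    minf = np.isneginf(vals)

    result = vals.copy()
    result[eps] = SV_EPS
    result[nan] = SV_NA
    result[pinf] = SV_PINF
    result[minf] = SV_MINF
    out[value_cols] = result
    return out


# Usage: one dimension column "i" followed by a single value column.
frame = pd.DataFrame({"i": ["a", "b", "c", "d", "e"],
                      "value": [1.5, np.nan, np.inf, -np.inf, NP_EPS]})
print(to_gdx_svs_vectorized(frame, num_dims=1))
```

The whole value block is handled by a handful of numpy operations rather than one Python call per cell; in the "before" profile above, that per-cell path shows up as 4,000,000 calls each to `convert_approx_eps` and `is_np_eps`.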

elainethale commented 1 year ago

Reimplemented in #96