When adding a symbol to the Gdx object, inserting at an arbitrary position while preserving the OrderedDict order appears to be O(n), whereas appending to the end is faster. This makes loading a large gdx much faster.
Switch to a generator plus list comprehension when reading values, instead of doing lots of appending.
No longer convert data to "True" while reading a set; this is handled when assigning to self.dataframe in _fixup_set_value.
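
A minimal sketch of the first point, assuming a name-keyed container backed by an `OrderedDict`. `SymbolList` and its dict-shaped symbols are hypothetical and are not the gdxpds classes. The sketch relies on one documented behaviour: the default `MutableSequence.append` calls `self.insert(len(self), value)`, so every append pays the rebuild cost unless `append` is overridden.

```python
from collections import OrderedDict
from collections.abc import MutableSequence


class SymbolList(MutableSequence):
    """Hypothetical ordered container of symbols, keyed by symbol name."""

    def __init__(self):
        self._symbols = OrderedDict()  # name -> symbol

    def __len__(self):
        return len(self._symbols)

    def __getitem__(self, index):
        return list(self._symbols.values())[index]

    def __setitem__(self, index, symbol):
        # Replace the symbol at a position by rebuilding the mapping.
        items = list(self._symbols.items())
        items[index] = (symbol["name"], symbol)
        self._symbols = OrderedDict(items)

    def __delitem__(self, index):
        name = list(self._symbols.keys())[index]
        del self._symbols[name]

    def insert(self, index, symbol):
        # Arbitrary position: copy all items and rebuild, O(n) per call.
        items = list(self._symbols.items())
        items.insert(index, (symbol["name"], symbol))
        self._symbols = OrderedDict(items)

    def append(self, symbol):
        # End position: a plain keyed assignment, no rebuild.
        # Without this override, the inherited append would call
        # self.insert(len(self), symbol) and rebuild every time.
        if symbol["name"] in self._symbols:
            raise ValueError("duplicate symbol name: " + symbol["name"])
        self._symbols[symbol["name"]] = symbol


symbols = SymbolList()
for name in ("a", "b", "c"):
    symbols.append({"name": name})   # fast path
symbols.insert(1, {"name": "x"})     # slow path, still supported
print([s["name"] for s in symbols])
```

Loading n symbols through the rebuild path costs O(n²) in total, which is why the difference only becomes visible on files with many symbols.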
General speedups:
Tests:
Test design
Extract gdx from [input_gdx.zip](https://github.com/NREL/gdx-pandas/files/5593039/input_gdx.zip)

```python
import gdxpds
import profilehooks


@profilehooks.profile
def read_big():
    x = gdxpds.to_dataframes("big.gdx")  # 1 symbol, 3000x3000 = 9 million elements


@profilehooks.profile
def roundtrip_many():
    x = gdxpds.to_dataframes("many.gdx")  # 1024 symbols, 10x10 = 100 elements each
    gdxpds.to_gdx(x, "many_out.gdx")


read_big()
roundtrip_many()
```

Test time before 98.4 seconds
```
*** PROFILER RESULTS ***
roundtrip_many (E:/Projects/gdx-pandas playground/speed.py:10)
function called 1 times

         41469697 function calls (40927215 primitive calls) in 54.715 seconds

   Ordered by: cumulative time, internal time, call count
   List reduced from 924 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   54.715   54.715 speed.py:10(roundtrip_many)
        1    0.000    0.000   27.737   27.737 write_gdx.py:143(to_gdx)
        1    0.000    0.000   27.737   27.737 write_gdx.py:94(save_gdx)
        1    0.000    0.000   26.978   26.978 read_gdx.py:105(to_dataframes)
        1    0.000    0.000   26.781   26.781 read_gdx.py:49(__init__)
        1    0.019    0.019   26.693   26.693 gdx.py:223(read)
     2168    0.020    0.000   25.468    0.012 _collections_abc.py:966(append)
     2168    0.262    0.000   25.444    0.012 gdx.py:327(insert)
     2168    0.540    0.000   25.174    0.012 gdx.py:330(
```

Test time after 48.4 seconds
```
C:\Python36\python.exe "E:/Projects/gdx-pandas playground/speed.py"

*** PROFILER RESULTS ***
roundtrip_many (E:/Projects/gdx-pandas playground/speed.py:10)
function called 1 times

         33560833 function calls (33016183 primitive calls) in 28.190 seconds

   Ordered by: cumulative time, internal time, call count
   List reduced from 923 to 40 due to restriction <40>

     ncalls  tottime  percall  cumtime  percall filename:lineno(function)
          1    0.000    0.000   28.205   28.205 speed.py:10(roundtrip_many)
      23851    0.123    0.000   17.015    0.001 frame.py:334(__init__)
       6507    0.031    0.000   16.293    0.003 gdx.py:641(dims)
       6507    0.118    0.000   16.150    0.002 frame.py:426(_init_dict)
       5423    0.046    0.000   15.854    0.003 gdx.py:737(_init_dataframe)
          1    0.000    0.000   14.471   14.471 write_gdx.py:143(to_gdx)
          1    0.000    0.000   14.470   14.470 write_gdx.py:94(save_gdx)
          1    0.000    0.000   13.734   13.734 read_gdx.py:105(to_dataframes)
          1    0.000    0.000   13.522   13.522 read_gdx.py:49(__init__)
          1    0.016    0.016   13.423   13.423 gdx.py:223(read)
       4336    0.090    0.000    9.570    0.002 gdx.py:671(dataframe)
       2171    0.025    0.000    9.484    0.004 gdx.py:454(__init__)
          1    0.005    0.005    7.273    7.273 gdx.py:266(write)
          2    0.002    0.001    7.197    3.598 write_gdx.py:86(gdx)
       1085    0.449    0.000    7.114    0.007 gdx.py:833(write)
       1084    0.012    0.000    7.106    0.007 write_gdx.py:99(__add_symbol_to_gdx)
       1084    0.021    0.000    6.913    0.006 gdx.py:799(load)
     133399    0.115    0.000    4.850    0.000 base.py:4914(_ensure_index)
       7591    0.023    0.000    4.680    0.001 frame.py:7349(_arrays_to_mgr)
       5423    0.022    0.000    4.427    0.001 indexing.py:182(__setitem__)
       1084    0.025    0.000    4.187    0.004 special.py:114(convert_np_to_gdx_svs)
40137/34713    0.264    0.000    3.934    0.000 series.py:166(__init__)
      16270    0.298    0.000    3.628    0.000 {pandas._libs.lib.clean_index_list}
       5423    0.017    0.000    3.302    0.001 indexing.py:152(_get_setitem_indexer)
       5423    0.045    0.000    3.265    0.001 indexing.py:1225(_convert_to_indexer)
65081/16271    0.350    0.000    2.972    0.000