KasiaKoz closed this 1 year ago
Found the root of the problem: the operation `pd.DataFrame().T.to_dict()` is to blame. In the version of pandas we're using now, it takes a silly amount of time to convert a wide DataFrame to a dictionary (although the long form is actually faster than before).
Below I also test a 'custom rearrange' that turns the long-form `.to_dict()` output (`d`) into wide form with just a dict comprehension:
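    def _long_wide_dict(d):
        # Rearrange {col: {row_idx: val}} into {row_idx: {col: val}}; assumes every column shares the same row index.
        rows = next(iter(d.values()), {})
        return {row_idx: {col: col_val[row_idx] for col, col_val in d.items()} for row_idx in rows}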
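To make the two orientations concrete, here is a minimal sketch on a tiny hand-written dict, with no pandas needed. The helper name `long_to_wide` and the sample data are hypothetical and only for illustration. It assumes the input is shaped like the default `DataFrame.to_dict()` output, `{column: {index: value}}`, and produces the `{index: {column: value}}` shape that `.T.to_dict()` returns.

```python
def long_to_wide(long_dict):
    """Turn {column: {row: value}} into {row: {column: value}}."""
    wide = {}
    for col, col_vals in long_dict.items():
        for row_idx, value in col_vals.items():
            # setdefault creates the per-row dict once; later columns are
            # merged into it rather than replacing it.
            wide.setdefault(row_idx, {})[col] = value
    return wide


# Hypothetical sample shaped like the default `DataFrame.to_dict()` output.
long_form = {
    "length": {"link_1": 12.5, "link_2": 40.0},
    "modes": {"link_1": "car", "link_2": "bus"},
}
wide_form = long_to_wide(long_form)
print(wide_form)
```

Each row key should end up holding every column, not just the last one visited. pandas also offers `to_dict(orient="index")` for this shape. Its runtime was not measured in this thread.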
Here are the times:
Python 3.7 + pandas==1.3.5
DataFrame shape: (50000, 20)
it took 0.00048732757568359375s to `.T`
it took 0.5953989028930664s to `.to_dict()`
it took 2.153273105621338s to `.T.to_dict()`
it took 0.7090747356414795s to custom rearrange `.to_dict()`
Python 3.11 + pandas==2.1.1
DataFrame shape: (50000, 20)
it took 0.0002789497375488281s to `.T`
it took 0.15897417068481445s to `.to_dict()`
it took 41.112441062927246s to `.T.to_dict()`
it took 0.32709574699401855s to custom rearrange `.to_dict()`
Quite an impact! Could you try with pandas 2.0.3? Pandas 2.1 has some regressions that impact runtimes.
Nice one @brynpickering
Python 3.11 + pandas==2.0.3
DataFrame shape: (50000, 20)
it took 0.00023674964904785156s to `.T`
it took 0.12676477432250977s to `.to_dict()`
it took 1.0371029376983643s to `.T.to_dict()`
it took 0.24923920631408691s to custom rearrange `.to_dict()`
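To put the three runs side by side, the reported `.T.to_dict()` timings can be turned into ratios. The numbers in this sketch are copied from the outputs above and rounded to three decimals. The variable names are only for illustration.

```python
# Reported `.T.to_dict()` timings in seconds, rounded from the runs above.
t_to_dict_transposed = {
    "py3.7 + pandas 1.3.5": 2.153,
    "py3.11 + pandas 2.1.1": 41.112,
    "py3.11 + pandas 2.0.3": 1.037,
}
# Reported timing of the custom rearrange under pandas 2.1.1.
custom_rearrange_211 = 0.327

slow = t_to_dict_transposed["py3.11 + pandas 2.1.1"]
# Ratio of the slow pandas 2.1.1 run to each of the other runs.
ratios = {
    label: slow / seconds
    for label, seconds in t_to_dict_transposed.items()
    if seconds != slow
}
ratios["custom rearrange (pandas 2.1.1)"] = slow / custom_rearrange_211

for label, ratio in ratios.items():
    print(f"pandas 2.1.1 `.T.to_dict()` is ~{ratio:.0f}x slower than {label}")
```

On these numbers, `.T.to_dict()` under pandas 2.1.1 is roughly 19x slower than under pandas 1.3.5 and roughly 40x slower than under pandas 2.0.3, while the custom rearrange took about 0.33 s in the same pandas 2.1.1 environment.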
There is a significant increase in the time it takes to build `SpatialTree`. The command `intermodal-access-egress-network`, run with the quite detailed Londinium test network, spends a long time at the step:
Runtimes increase from <1 min to ~25 min. This test network, while detailed, is still a drop in the ocean compared to what our usual networks look like. An increase of this size is prohibitive for us.
Before Python 3.11, it took:
using commit `f112d7c8de52cfd31f1d8623d6429cf2f414dbf3`. It now takes: