Open jpn-- opened 6 months ago
While I've made these updates and all the regular CI tests pass (i.e. the results look correct), I have discovered the change to pandas 2.x incurs a significant runtime penalty when running without sharrow.
non-sharrow test timings for pandas 1.x:
58.60s call activitysim/examples/prototype_mtc/test/test_mtc.py::test_mtc_mp
53.71s call activitysim/examples/prototype_mtc/test/test_mtc.py::test_mtc
53.66s call activitysim/examples/prototype_mtc/test/test_mtc.py::test_mtc_chunkless
53.23s call activitysim/examples/prototype_mtc/test/test_mtc.py::test_mtc_recode
non-sharrow test timings for pandas 2.x:
148.50s call activitysim/examples/prototype_mtc/test/test_mtc.py::test_mtc
148.14s call activitysim/examples/prototype_mtc/test/test_mtc.py::test_mtc_chunkless
147.83s call activitysim/examples/prototype_mtc/test/test_mtc.py::test_mtc_recode
140.09s call activitysim/examples/prototype_mtc/test/test_mtc.py::test_mtc_mp
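To put a number on the penalty, the short sketch below computes the per-test slowdown from the timings listed above. The dictionary names are only illustrative. The ratios come out at roughly 2.4x to 2.8x.

```python
# Timings in seconds, copied from the pytest durations listed above.
pandas_1x = {
    "test_mtc_mp": 58.60,
    "test_mtc": 53.71,
    "test_mtc_chunkless": 53.66,
    "test_mtc_recode": 53.23,
}
pandas_2x = {
    "test_mtc": 148.50,
    "test_mtc_chunkless": 148.14,
    "test_mtc_recode": 147.83,
    "test_mtc_mp": 140.09,
}

# Ratio of pandas 2.x runtime to pandas 1.x runtime for each test.
slowdown = {name: pandas_2x[name] / pandas_1x[name] for name in pandas_1x}

for name, ratio in sorted(slowdown.items()):
    print(f"{name}: {ratio:.2f}x slower")
```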
It will require some research to figure out why this is happening, and whether it can be solved relatively easily... or at all. Initial profiling suggests the problem is in `pandas.core.internals.managers.BlockManager.get_dtypes`, which is getting called from `df.eval`, but we almost certainly do not want to mess around with pandas internals.
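For anyone who wants to reproduce this kind of measurement outside the full test suite, here is a minimal sketch of profiling repeated `df.eval` calls with the standard-library `cProfile`. The synthetic data, the column names, the expression, and the `run_evals` helper are all assumptions for illustration, not ActivitySim code. Whether `get_dtypes` shows up prominently depends on the installed pandas version.

```python
import cProfile
import io
import pstats

import numpy as np
import pandas as pd

# Synthetic stand-in data; column names and the expression are hypothetical.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((1_000, 20)), columns=[f"c{i}" for i in range(20)])


def run_evals(frame, n=200):
    """Evaluate one expression repeatedly, mimicking many small eval calls."""
    for _ in range(n):
        frame.eval("c0 + c1 * c2")


profiler = cProfile.Profile()
profiler.enable()
run_evals(df)
profiler.disable()

# Sort by cumulative time and restrict the listing to entries whose
# location or function name matches the regular expression.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats("get_dtypes|eval")
report = buffer.getvalue()
print(report)
```

Running the same script in a pandas 1.x and a pandas 2.x environment and comparing the cumulative times is one way to check whether the slowdown is concentrated where the initial profiling suggested.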
Addresses #794.
The update from pandas 1.x to 2.x introduces a number of small but material changes that affect ActivitySim:
- `Index` objects are all one class with different datatypes, instead of being different classes (e.g. there is no more `Int64Index` class).
- `read_csv` function by default now interprets "None" as a missing value (i.e. NaN) instead of being the Python object `None`.
- `groupby` operation, when applied to categorical data, now sorts the categories in the result unless told not to (resulting in different order of rows in outputs for some operations).
- `df.join()` also potentially sorts the resulting rows differently unless an explicit `sort` argument is given.
- `Index` objects no longer can be checked as `is_monotonic` but instead need `is_monotonic_increasing`.
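The changes above can be illustrated with a small sketch of version-tolerant idioms. This is not the code in this PR. The data and variable names are made up, and the approach (passing arguments explicitly rather than relying on defaults) is just one way to keep behavior stable across pandas versions.

```python
import io

import pandas as pd

# 1. Check the dtype of an Index instead of testing for a class such as Int64Index.
idx = pd.Index([1, 2, 3])
is_int_index = pd.api.types.is_integer_dtype(idx.dtype)

# 2. Keep the literal text "None" by turning off the default NA strings and
#    listing the missing-value markers explicitly (here only the empty string).
csv_text = "name,value\nNone,1\nfoo,2\n"
raw = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, na_values=[""])

# 3. Pass `sort` and `observed` explicitly when grouping on categorical data,
#    so the row order does not depend on the library default.
cats = pd.Categorical(["b", "a", "b"], categories=["b", "a", "c"])
frame = pd.DataFrame({"kind": cats, "n": [1, 2, 3]})
totals = frame.groupby("kind", observed=True, sort=False)["n"].sum()

# 4. Pass `sort` explicitly to join; a left join with sort=False keeps the
#    row order of the calling (left) frame.
left = pd.DataFrame({"x": [1, 2]}, index=["b", "a"])
right = pd.DataFrame({"y": [10, 20]}, index=["a", "b"])
joined = left.join(right, how="left", sort=False)

# 5. Use is_monotonic_increasing in place of the removed is_monotonic.
ordered = idx.is_monotonic_increasing

print(is_int_index, raw["name"].tolist(), totals.to_dict(), joined.index.tolist(), ordered)
```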