ActivitySim / activitysim

An Open Platform for Activity-Based Travel Modeling
https://activitysim.github.io
BSD 3-Clause "New" or "Revised" License

Multiprocessing error during accessibility in cleaning.py when using mapped values. #652

Closed stefancoe closed 5 days ago

stefancoe commented 1 year ago

The error occurs on line 72 of cleaning.py. The function argument 'values' is a one-column DataFrame representing all MAZ ids, but the 'remapper' variable on line 71 is built from only a subset of the rows of base_df, which is opened from the pipeline specific to the current process when running in multiprocessing mode. For example, when running with num_processes = 2, source_ids and base_df.index each have half the number of records, so 'remapper' will hold only half of the total mappings. Since 'values' contains all MAZs, the code on line 72 fails because 'remapper' is missing some of the keys it needs.
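To illustrate the mismatch described above, here is a minimal sketch in pandas. It is not the actual code from cleaning.py: the names `all_maz_ids` and `process_slice_ids`, the id values, and the use of `Series.map` are all assumptions made for the example. It only shows that a lookup built from one process's slice cannot resolve ids that live in the other process's slice.

```python
import pandas as pd

# Hypothetical stand-in for `values`: every MAZ id in the model.
all_maz_ids = pd.Series([101, 102, 103, 104], name="MAZ")

# Hypothetical stand-in for base_df.index as seen by one of two processes:
# only half of the MAZ ids are present in this process's pipeline slice.
process_slice_ids = pd.Index([101, 102])

# Build the lookup from original id to zero-based position using only the slice.
remapper = {orig: pos for pos, orig in enumerate(process_slice_ids)}

# Ids that are not keys of the partial remapper cannot be mapped;
# Series.map leaves them as NaN instead of a valid position.
remapped = all_maz_ids.map(remapper)
missing = all_maz_ids[remapped.isna()].tolist()

print(f"remapper covers {len(remapper)} of {len(all_maz_ids)} ids")
print(f"ids with no mapping: {missing}")
```

In this sketch the unmapped ids surface as NaN; a stricter lookup (for example, direct dictionary indexing) would raise a KeyError instead, which is closer in spirit to the failure reported here.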

@jpn-- I wonder if we should update the PSRC data/UECs that are used for testing, since our most current setup may contain specs/inputs that are not currently tested by any of the examples, as noted here. I recently made the necessary changes to get it to compile and run with sharrow, which cut the run time of our test example from 504 to 156 seconds. I am eager to see how it runs on the full set of households for the entire region with multiprocessing.

log: mp_accessibility_0-activitysim.log

jpn-- commented 1 year ago

We should definitely update the PSRC example model to be consistent with something that looks like what you are actually working with. I can just pull from https://github.com/psrc/psrc_activitysim/tree/main/configs_dev if you are comfortable with that.

Also, the "data" in that repo still has just a single MAZ pair in the maz_to_maz_*.csv files. If there's a more robust set of data files that will better exercise all relevant two-zone-ness features of your two-zone model, that would be great.

stefancoe commented 1 year ago

@jpn-- I just updated the data folder to reflect what we are currently testing. I also added folders for sharrow configs. Thanks!

https://github.com/psrc/psrc_activitysim/tree/main/data

wusun2 commented 1 year ago

@stefancoe and @jpn--, I am curious whether this affects our plan of moving forward with Sharrow. It seems PSRC got a significant runtime improvement with Sharrow on your test example, and we'd like to start looking into plugging Sharrow into the BayDAG implementation.

jpn-- commented 1 year ago

This error has been addressed by fixing two things: one in ActivitySim (in #654) and one in the PSRC model configs (in https://github.com/psrc/psrc_activitysim/pull/38).

Before I am willing to declare it "fixed" and close this issue, I want to make sure there is a test that exercises a large enough part of the PSRC model that we will actually see errors when it's not working.