ActivitySim / activitysim

An Open Platform for Activity-Based Travel Modeling
https://activitysim.github.io
BSD 3-Clause "New" or "Revised" License

Vehicle type model runtime #787

Closed jpn-- closed 9 months ago

jpn-- commented 10 months ago

When running the MTC extended model with full-size zones and the full-size population, the vehicle type choice model is the bottleneck. In non-Sharrow mode, the model failed in the vehicle type choice model with a memory error on a Windows machine with 512 GB of RAM. In Sharrow mode, the vehicle type choice model completed, but took 17 hours.

Need to improve:

dhensle commented 10 months ago

Some first steps:

dhensle commented 9 months ago

Config updates: https://github.com/ActivitySim/activitysim-prototype-mtc/pull/3

Code updates: https://github.com/ActivitySim/activitysim/pull/806

i-am-sijia commented 9 months ago

I looked into the memory usage of the vehicle type model in non-Sharrow mode. When running the MTC extended model with 25% of the population, the interaction_df (the joined data frame of choosers and alternatives) for the first vehicle choice uses 212 GB of RAM. If memory scales roughly linearly with population, the 100% run would need about four times that, around 850 GB, which explains the memory error on the 512 GB machine.

Below is a table of the memory taken by each column in the interaction_df. The string columns have already been converted to pandas categoricals. No single column stands out as memory intensive; there are simply too many columns in this table, and they add up. Removing columns that are not used in the utility calculation will help reduce memory.


| Column | Dtype | Memory (GB) |
| -- | -- | -- |
| Total |   | 212.1 |
| index | int64 | 2.4 |
| body_type_Car | uint8 | 0.3 |
| body_type_Motorcycle | uint8 | 0.3 |
| body_type_Pickup | uint8 | 0.3 |
| body_type_SUV | uint8 | 0.3 |
| body_type_Van | uint8 | 0.3 |
| age_1 | uint8 | 0.3 |
| age_10 | uint8 | 0.3 |
| age_11 | uint8 | 0.3 |
| age_12 | uint8 | 0.3 |
| age_13 | uint8 | 0.3 |
| age_14 | uint8 | 0.3 |
| age_15 | uint8 | 0.3 |
| age_16 | uint8 | 0.3 |
| age_17 | uint8 | 0.3 |
| age_18 | uint8 | 0.3 |
| age_19 | uint8 | 0.3 |
| age_2 | uint8 | 0.3 |
| age_20 | uint8 | 0.3 |
| age_3 | uint8 | 0.3 |
| age_4 | uint8 | 0.3 |
| age_5 | uint8 | 0.3 |
| age_6 | uint8 | 0.3 |
| age_7 | uint8 | 0.3 |
| age_8 | uint8 | 0.3 |
| age_9 | uint8 | 0.3 |
| fuel_type_BEV | uint8 | 0.3 |
| fuel_type_Diesel | uint8 | 0.3 |
| fuel_type_Gas | uint8 | 0.3 |
| fuel_type_Hybrid | uint8 | 0.3 |
| fuel_type_PEV | uint8 | 0.3 |
| body_type | category | 0.3 |
| age | int32 | 1.2 |
| fuel_type | category | 0.3 |
| vehicle_year | int64 | 2.4 |
| NumMakes | int64 | 2.4 |
| NumModels | int64 | 2.4 |
| MPG | float64 | 2.4 |
| Range | int64 | 2.4 |
| NewPrice | float64 | 2.4 |
| auto_operating_cost | float64 | 2.4 |
| co2gpm | float64 | 2.4 |
| vehicle_type | category | 0.6 |
| household_id | int64 | 2.4 |
| vehicle_num | int64 | 2.4 |
| home_zone_id | int64 | 2.4 |
| income | int64 | 2.4 |
| hhsize | int64 | 2.4 |
| HHT | int64 | 2.4 |
| auto_ownership | int32 | 1.2 |
| num_workers | int64 | 2.4 |
| sample_rate | float64 | 2.4 |
| income_in_thousands | float64 | 2.4 |
| income_segment | int32 | 1.2 |
| median_value_of_time | float64 | 2.4 |
| hh_value_of_time | float64 | 2.4 |
| num_non_workers | int64 | 2.4 |
| num_drivers | int8 | 0.3 |
| num_adults | int8 | 0.3 |
| num_children | int8 | 0.3 |
| num_young_children | int8 | 0.3 |
| num_children_5_to_15 | int8 | 0.3 |
| num_children_16_to_17 | int8 | 0.3 |
| num_college_age | int8 | 0.3 |
| num_young_adults | int8 | 0.3 |
| non_family | bool | 0.3 |
| family | bool | 0.3 |
| home_is_urban | bool | 0.3 |
| home_is_rural | bool | 0.3 |
| hh_work_auto_savings_ratio | float32 | 1.2 |
| DISTRICT | int64 | 2.4 |
| SD | int64 | 2.4 |
| county_id | int64 | 2.4 |
| TOTHH | int64 | 2.4 |
| TOTPOP | int64 | 2.4 |
| TOTACRE | float64 | 2.4 |
| RESACRE | float64 | 2.4 |
| CIACRE | float64 | 2.4 |
| TOTEMP | int64 | 2.4 |
| AGE0519 | int64 | 2.4 |
| RETEMPN | int64 | 2.4 |
| FPSEMPN | int64 | 2.4 |
| HEREMPN | int64 | 2.4 |
| OTHEMPN | int64 | 2.4 |
| AGREMPN | int64 | 2.4 |
| MWTEMPN | int64 | 2.4 |
| PRKCST | float64 | 2.4 |
| OPRKCST | float64 | 2.4 |
| area_type | int64 | 2.4 |
| HSENROLL | float64 | 2.4 |
| COLLFTE | float64 | 2.4 |
| COLLPTE | float64 | 2.4 |
| TOPOLOGY | int64 | 2.4 |
| TERMINAL | float64 | 2.4 |
| household_density | float64 | 2.4 |
| employment_density | float64 | 2.4 |
| density_index | float64 | 2.4 |
| is_cbd | bool | 0.3 |
| TOTENR_univ | float64 | 2.4 |
| ext_work_share | float64 | 2.4 |
| RETEMPN_scaled | float64 | 2.4 |
| FPSEMPN_scaled | float64 | 2.4 |
| HEREMPN_scaled | float64 | 2.4 |
| OTHEMPN_scaled | float64 | 2.4 |
| AGREMPN_scaled | float64 | 2.4 |
| MWTEMPN_scaled | float64 | 2.4 |
| TOTEMP_scaled | float64 | 2.4 |
| auPkRetail | float64 | 2.4 |
| auPkTotal | float64 | 2.4 |
| auOpRetail | float64 | 2.4 |
| auOpTotal | float64 | 2.4 |
| trPkRetail | float64 | 2.4 |
| trPkTotal | float64 | 2.4 |
| trOpRetail | float64 | 2.4 |
| trOpTotal | float64 | 2.4 |
| nmRetail | float64 | 2.4 |
| nmTotal | float64 | 2.4 |
| already_owned_veh | category | 0.6 |
| total_hh_dist_to_work | float32 | 1.2 |
| total_hh_dist_to_work_cap | float64 | 2.4 |
| avg_hh_dist_to_work | float32 | 1.2 |
| hh_per_mi | float64 | 2.4 |
| hh_veh_gt_drivers | int32 | 1.2 |
| num_hh_veh_owned | float64 | 2.4 |
| num_hh_Van | float64 | 2.4 |
| num_hh_SUV | float64 | 2.4 |
| num_hh_Pickup | float64 | 2.4 |
| num_hh_Motorcycle | float64 | 2.4 |
| num_hh_Hybrid | float64 | 2.4 |
| num_hh_BEV | float64 | 2.4 |
| num_hh_PEV | float64 | 2.4 |
| num_hh_EV | float64 | 2.4 |
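For anyone who wants to produce a similar per-column breakdown, here is a minimal sketch using plain pandas. The `memory_report` helper and the toy data frame are hypothetical and are not part of ActivitySim; the only library call relied on is `DataFrame.memory_usage`.

```python
import numpy as np
import pandas as pd


def memory_report(df: pd.DataFrame) -> pd.DataFrame:
    """Return dtype and memory (in GB) for the index and every column.

    Hypothetical helper for illustration; not an ActivitySim function.
    """
    # deep=True also counts the payload of object/categorical columns.
    # The index is reported under the label "Index".
    mem_bytes = df.memory_usage(index=True, deep=True)

    dtypes = df.dtypes.astype(str)
    dtypes["Index"] = str(df.index.dtype)

    return pd.DataFrame(
        {
            "dtype": dtypes.reindex(mem_bytes.index),
            "memory_gb": mem_bytes / 1e9,
        }
    )


# Toy stand-in for interaction_df (assumed column names taken from the table).
toy = pd.DataFrame(
    {
        "household_id": np.arange(1000, dtype="int64"),
        "num_drivers": np.ones(1000, dtype="int8"),
        "body_type": pd.Categorical(["Car", "SUV"] * 500),
    }
)
report = memory_report(toy)
print(report)
print("Total GB:", report["memory_gb"].sum())
```

On a real interaction_df, sorting the report by `memory_gb` makes it easy to see whether a few wide columns or many ordinary ones dominate; in the table above it is the latter.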

dhensle commented 9 months ago

Thanks Sijia, I had put options to specify which columns to keep in the choosers and alts tables already in the PRs listed in my previous message.
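To illustrate the idea in plain pandas (this is only a sketch, not the code or the option names from those PRs), the choosers and alts tables can be trimmed to the needed columns before they are cross joined. The function name `build_interaction_df` and the column lists are hypothetical.

```python
import pandas as pd


def build_interaction_df(choosers, alts, chooser_cols, alt_cols):
    """Cross join choosers and alternatives, keeping only the listed columns.

    Hypothetical sketch; ActivitySim's own interaction code is more involved.
    """
    slim_choosers = choosers[chooser_cols]
    slim_alts = alts[alt_cols]
    # how="cross" pairs every chooser row with every alternative row
    # (available in pandas >= 1.2).
    return slim_choosers.merge(slim_alts, how="cross")


# Toy tables: 3 choosers x 2 alternatives.
choosers = pd.DataFrame(
    {
        "household_id": [1, 2, 3],
        "income_in_thousands": [40.0, 85.0, 120.0],
        "TOTACRE": [10.0, 20.0, 30.0],  # assumed unused in the utility spec
        "DISTRICT": [1, 1, 2],          # assumed unused in the utility spec
    }
)
alts = pd.DataFrame(
    {
        "vehicle_type": ["SUV_1_Gas", "Car_1_BEV"],
        "body_type_SUV": [1, 0],
        "NewPrice": [35000.0, 42000.0],
        "NumMakes": [12, 7],            # assumed unused in the utility spec
    }
)

interaction = build_interaction_df(
    choosers,
    alts,
    chooser_cols=["household_id", "income_in_thousands"],
    alt_cols=["vehicle_type", "body_type_SUV", "NewPrice"],
)
print(interaction.shape)
```

Because the interaction table has (choosers x alternatives) rows, every column dropped before the join saves that many values, which is where the savings come from.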

joecastiglione commented 9 months ago

Is it possible to systematically, across all models and tables, include only those columns that are used in the utility calculations?

dhensle commented 9 months ago

> Is it possible to systematically, across all models and tables, include only those columns that are used in the utility calculations?

Yes, we brought this up a week or two ago and created this issue: https://github.com/ActivitySim/activitysim/issues/792
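One possible way to approach this, sketched here under assumptions and not taken from that issue: parse each utility expression with Python's `ast` module and keep any available column whose name appears in it. The `referenced_columns` helper and the example expressions are hypothetical, and real spec files may contain expressions this simple parser does not handle.

```python
import ast


def referenced_columns(expressions, available_columns):
    """Return the available columns that appear in any expression.

    Hypothetical sketch. It is deliberately conservative: bare names,
    attribute names (df.col), and string literals (df['col']) all count
    as references, so it may keep a column that is not strictly needed
    but should not drop one that is referenced in these forms.
    """
    available = set(available_columns)
    used = set()
    for expr in expressions:
        # Assumption: a leading "@" marks a Python expression and can be stripped.
        tree = ast.parse(expr.lstrip("@").strip(), mode="eval")
        for node in ast.walk(tree):
            if isinstance(node, ast.Name):
                candidate = node.id
            elif isinstance(node, ast.Attribute):
                candidate = node.attr
            elif isinstance(node, ast.Constant) and isinstance(node.value, str):
                candidate = node.value
            else:
                continue
            if candidate in available:
                used.add(candidate)
    return used


# Made-up expressions using column names from the table in this thread.
expressions = [
    "income_in_thousands * (body_type_SUV == 1)",
    "@df.num_drivers > df['auto_ownership']",
    "(fuel_type == 'BEV') & home_is_urban",
]
columns = [
    "income_in_thousands", "body_type_SUV", "num_drivers", "auto_ownership",
    "fuel_type", "home_is_urban", "TOTACRE", "DISTRICT",
]
needed = referenced_columns(expressions, columns)
print(sorted(needed))
```

Erring on the side of keeping a column is the safer failure mode here, since a wrongly dropped column breaks the utility calculation while a wrongly kept one only costs memory.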

i-am-sijia commented 9 months ago

> Thanks Sijia, I had put options to specify which columns to keep in the choosers and alts tables already in the PRs listed in my previous message.

Yep, I saw that. I wanted to provide more context and evidence that dropping unused columns will help.