SEMCOG ActivitySim Runtime Performance Issues with various Chunk settings #628

JilanChen commented 1 year ago

Machine specs: AMD® Ryzen™ Threadripper™ Pro 5995WX processor, 64 cores / 128 threads @ 2.7 GHz, 512 GB RAM.

Total ActivitySim run time (summed from timing_log) with a 100% sample (1.9 million households) and a two-zone system (3k zones and 28k micro-zones):

20 processors and 200 GB RAM: 557 minutes
40 processors and 400 GB RAM: 850 minutes
60 processors and 450 GB RAM: 1080 minutes

Using new chunk training from this machine doesn't help either. The run time with 20 processors and 200 GB RAM is consistent with the run time on another Intel machine. Any tips on whether something is wrong with my settings or with the AMD processors?
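The three totals above can be turned into a quick scaling check. The sketch below uses only the reported run times and treats the 20-processor run as the baseline; it ignores the fact that RAM (and therefore chunk size) also changed between runs, so it is a rough indicator rather than a diagnosis. The function name `scaling_table` is hypothetical.

```python
def scaling_table(runs, baseline_procs):
    """Compare each run against ideal linear scaling from a baseline run.

    runs: dict mapping number of processors -> total run time in minutes.
    Returns a list of (procs, minutes, slowdown, ideal_minutes, efficiency).
    """
    base_minutes = runs[baseline_procs]
    rows = []
    for procs, minutes in sorted(runs.items()):
        # With perfect scaling, doubling processors would halve the run time.
        ideal = base_minutes * baseline_procs / procs
        slowdown = minutes / base_minutes   # > 1 means slower than the baseline
        efficiency = ideal / minutes        # 1.0 means perfect scaling
        rows.append((procs, minutes, slowdown, ideal, efficiency))
    return rows


# Totals reported above for the AMD machine (minutes).
amd_runs = {20: 557, 40: 850, 60: 1080}

for procs, minutes, slowdown, ideal, eff in scaling_table(amd_runs, 20):
    print(f"{procs:>2} procs: {minutes:>4} min ({slowdown:.2f}x baseline), "
          f"ideal {ideal:.0f} min, efficiency {eff:.0%}")
```

Going from 20 to 40 processors made the run about 1.5 times slower instead of twice as fast (roughly 33% efficiency relative to the 20-processor run); at 60 processors the efficiency falls to about 17%.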

bettinardi commented 1 year ago

A quick note: we found something similar several years ago with our statewide model. More processors actually slowed down our overall run time significantly. We were not optimized to take advantage of that many processors, so the issue was twofold: 1) we were losing speed because, as you add processors, you usually lose clock speed (GHz) on each individual processor; and 2) we were losing time to the overhead of distributing work inefficiently across more processors than we were set up to use. We tested many configurations and found that the sweet spot for us was between 8 and 16 processors. We then bought the best (highest-speed) processors we could with 8 processors (16 threads). This also saved us significant money: the fast 8-processor machine was about $3,500 (4-5 years ago), compared to $7,500 for the 20-processor machines we had to return because they were so slow for our work.
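The two effects described above (lower per-core clock speed as more cores are busy, plus overhead that grows with the number of processes) can be illustrated with a toy cost model: an Amdahl's-law-style split into serial and parallel work, with those two penalties added. Every parameter below is a hypothetical placeholder, not a measurement from any machine in this thread. The point is only that the two effects together produce an interior sweet spot instead of run time falling steadily as processors are added.

```python
def modeled_runtime(n, serial=60.0, parallel=6000.0, base_ghz=4.5,
                    ghz_drop_per_core=0.02, overhead_per_proc=20.0):
    """Toy run-time model (arbitrary time units) for n worker processes.

    All defaults are hypothetical. The model assumes:
      * clock speed falls linearly as more cores are busy (floored at 1 GHz),
      * the parallel part of the work divides evenly across n processes,
      * each extra process adds a fixed coordination/contention cost.
    """
    ghz = max(base_ghz - ghz_drop_per_core * n, 1.0)
    speed = ghz / base_ghz                      # relative per-core speed
    compute = (serial + parallel / n) / speed   # slower clocks stretch all work
    return compute + overhead_per_proc * n


def sweet_spot(max_procs=64, **params):
    """Return the process count with the lowest modeled run time."""
    return min(range(1, max_procs + 1), key=lambda n: modeled_runtime(n, **params))


best = sweet_spot()
print(best, round(modeled_runtime(best)), round(modeled_runtime(64)))
```

Raising `overhead_per_proc` or `ghz_drop_per_core` tends to push the sweet spot toward fewer processes; the real curve for a given machine can only come from timed runs.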

guyrousseau commented 1 year ago

At ARC, for ABM modeling, one of our server systems has 64 processor cores (so 128 threads), 512 GB of RAM, a mix of SSD and Serial-Attached SCSI solid-state drives, and plenty of room to grow as needed. It is a Dell PowerEdge R840 server with Intel Xeon Gold 6130 processors (no AMDs at ARC). Overnight ABM run times have been manageable and reasonably satisfactory so far, though we're no longer investing in servers: from this point forward we're moving to cloud-based computing (mostly AWS), with a greater emphasis on that trajectory in 2023 and beyond.

JilanChen commented 1 year ago

Thanks for the tips! Next I'll test 20 processors with 400 GB RAM for comparison.

JilanChen commented 1 year ago

This is our first time trying AMD, since it's much cheaper than Intel for the same specs.

jpn-- commented 1 year ago

Counterintuitively (although based on the evidence above, not actually...) I would suggest also testing with even fewer processors. I suspect you are seeing the results you are because you are flooding the CPU cache, as ActivitySim is very memory-intensive.
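One way to probe the memory-contention hypothesis independently of ActivitySim is a small benchmark: give each worker process a buffer larger than the cache it is likely to have to itself, copy it repeatedly, and see whether per-worker throughput drops as more workers run concurrently. The sketch below uses only the Python standard library; the 64 MB buffer size, the process counts, and the function names are assumptions, and buffer copying is only a rough proxy for ActivitySim's memory access pattern, not a model of it.

```python
import multiprocessing as mp
import time

# Assumed per-worker working set; chosen to be larger than the L3 cache one
# worker is likely to have to itself. Adjust for the machine under test.
BUF_MB = 64


def copy_worker(reps):
    """Copy a large buffer `reps` (>= 1) times; return this worker's throughput in MB/s."""
    # Repeating a one-byte bytearray writes every byte, so the pages are really touched.
    buf = bytearray(b"\x01") * (BUF_MB * 1024 * 1024)
    start = time.perf_counter()
    for _ in range(reps):
        bytes(buf)  # full copy: bound by memory bandwidth rather than arithmetic
    elapsed = time.perf_counter() - start
    return BUF_MB * reps / elapsed


def mean_worker_throughput(n_procs, reps=3):
    """Run n_procs copy workers (roughly concurrently) and return their mean MB/s."""
    with mp.Pool(n_procs) as pool:
        rates = pool.map(copy_worker, [reps] * n_procs, chunksize=1)
    return sum(rates) / len(rates)


if __name__ == "__main__":
    # Extend this tuple toward the process counts actually being considered.
    for n in (1, 2, 4, 8):
        print(f"{n:>2} workers: {mean_worker_throughput(n):,.0f} MB/s per worker")
```

If per-worker throughput falls steeply well before all cores are busy, memory bandwidth rather than core count is likely the bottleneck, which would support testing with fewer processes.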

JilanChen commented 1 year ago

Yes, testing with fewer processors is in my plan too.

stefancoe commented 1 year ago

@JilanChen Can you share the timing_log.csv for a couple of these runs? It would be interesting to see the run times broken down by model step.
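A step-by-step comparison of two runs could be sketched as follows, lining up two timing_log.csv files by `model_name`. It assumes the column layout shown in the ARC log posted in this thread (`process_name`, `model_name`, `seconds`, `minutes`); the function names are hypothetical, and if a step appears on more than one row its seconds are summed.

```python
import csv


def load_step_seconds(path):
    """Sum the `seconds` column of a timing_log.csv by `model_name`."""
    totals = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            name = row["model_name"]
            totals[name] = totals.get(name, 0.0) + float(row["seconds"])
    return totals


def compare_runs(path_a, path_b):
    """Return (model_name, seconds_a, seconds_b, ratio b/a), largest ratio first."""
    a, b = load_step_seconds(path_a), load_step_seconds(path_b)
    rows = []
    for name in a.keys() & b.keys():  # only steps present in both runs
        ratio = b[name] / a[name] if a[name] > 0 else float("inf")
        rows.append((name, a[name], b[name], ratio))
    return sorted(rows, key=lambda r: r[3], reverse=True)
```

For example, `compare_runs("timing_log_20proc.csv", "timing_log_60proc.csv")` (hypothetical file names) would list first the steps whose run time grew the most between the 20- and 60-processor runs.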

JilanChen commented 1 year ago

Another AMD machine at SEMCOG (with similar specs) seems to run much faster than the one I reported. RSG is going to look into that machine to see whether there are any problems with its settings. Once that is done, I will compile the run times from these machines and share them with the group.

guyrousseau commented 1 year ago

As an additional point of comparison, below is the ARC timing_log.csv output file from an ActivitySim 1.0.4 baseline (base-year) model run. Destination choice and scheduling models take the most time, relatively speaking.

process_name	model_name	seconds	minutes
mp_initialize_landuse	initialize_landuse	0.9	0
mp_accessibility_0	compute_accessibility	8	0.1
mp_initialize_households	initialize_households	148.1	2.5
mp_households_0	school_location	506.5	8.4
mp_households_0	workplace_location	2043	34
mp_households_0	auto_ownership_simulate	14.4	0.2
mp_households_0	free_parking	12.5	0.2
mp_households_0	cdap_simulate	69.4	1.2
mp_households_0	mandatory_tour_frequency	21.6	0.4
mp_households_0	mandatory_tour_scheduling	6832.4	113.9
mp_households_0	joint_tour_frequency	28.8	0.5
mp_households_0	joint_tour_composition	22	0.4
mp_households_0	joint_tour_participation	38.8	0.6
mp_households_0	joint_tour_destination	355.4	5.9
mp_households_0	joint_tour_scheduling	926.6	15.4
mp_households_0	non_mandatory_tour_frequency	829	13.8
mp_households_0	non_mandatory_tour_destination	1341	22.3
mp_households_0	non_mandatory_tour_scheduling	7769	129.5
mp_households_0	tour_mode_choice_simulate	201.7	3.4
mp_households_0	atwork_subtour_frequency	39.6	0.7
mp_households_0	atwork_subtour_destination	474.9	7.9
mp_households_0	atwork_subtour_scheduling	465.2	7.8
mp_households_0	atwork_subtour_mode_choice	49.6	0.8
mp_households_0	stop_frequency	176.1	2.9
mp_households_0	trip_purpose	37.5	0.6
mp_households_0	trip_destination	13762.6	229.4
mp_households_0	trip_purpose_and_destination	107.9	1.8
mp_households_0	trip_scheduling_choice	343.5	5.7
mp_households_0	trip_departure_choice	146	2.4
mp_households_0	trip_mode_choice	315.1	5.3
mp_households_0	parking_location	71.9	1.2
mp_households_0	write_data_dictionary	76.1	1.3
mp_households_0	track_skim_usage	9.1	0.2
mp_summarize	write_trip_matrices	521.8	8.7
mp_summarize	write_tables	749.9	12.5

JilanChen commented 1 year ago



Thanks, Guy. Your total ActivitySim run time also appears to be over 10 hours. SEMCOG needs 4 iterations to get the traffic flows to converge. SEMCOG's work_location model takes the longest time (two-zone system with shadow pricing on).

JilanChen commented 1 year ago

Below is the SEMCOG timing_log.csv from three computers.
