Open i-am-sijia opened 2 months ago
Used num_processes: 28 on a 512 GB RAM machine with 32 physical cores. Did two runs on June 13, 2024. The only difference between the two runs was the version of Sharrow: one used v2.9.1, the other used a later version with the np.where updates. For more details, see rows 16 and 17 in RunMatrix_PerformanceResults.xls.
The np.where updates in Sharrow main@8d63a66 do not seem to help run time in multiprocessing.
Did an analogous run using num_processes: 20 (of the 24 processors) on an RSG machine with 500 GB RAM and 2.1 GHz Intel Xeon cores. Used the latest Sharrow code (main@8d63a66); the run completed in 289 minutes (about 4.8 hours).
sh_mp_full_logs.zip
Notably, on this machine the single-process run took 21.1 hours, which is significantly longer than the single-process run time for Sijia's run above.
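To put the two RSG figures side by side, here is a minimal sketch of the speedup and parallel efficiency they imply. It uses only the numbers quoted above (21.1 hours single-process, 289 minutes with num_processes: 20) and assumes the two runs are otherwise comparable, which this thread does not confirm. The variable names are illustrative only.

```python
# Figures reported in this thread for the RSG machine.
# Assumption: both runs used the same model setup and differ only in process count.
single_process_hours = 21.1     # single-process run time
multi_process_minutes = 289.0   # run time with num_processes: 20
num_processes = 20

multi_process_hours = multi_process_minutes / 60.0

# Speedup: how many times faster the multiprocess run was.
speedup = single_process_hours / multi_process_hours

# Parallel efficiency: speedup divided by the number of processes used.
efficiency = speedup / num_processes

print(f"multiprocess run: {multi_process_hours:.2f} h")
print(f"speedup: {speedup:.2f}x")
print(f"parallel efficiency: {efficiency:.1%}")
```

Under those assumptions this works out to roughly a 4.4x speedup, or about 22% parallel efficiency across 20 processes.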
Ran on the SFCTA server with:
num_processes: 8 (machine has 160 cores)
NUMBA_NUM_THREADS: 4
Total runtime: 239.7 minutes (i.e. just under 4 hours)
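As a rough illustration of how these two settings combine, the sketch below computes the upper bound on compute threads for this run. It assumes each of the 8 processes may use up to NUMBA_NUM_THREADS Numba threads and that the environment variable is inherited by the worker processes; neither is confirmed in this thread. NUMBA_NUM_THREADS is a standard Numba environment variable; the other names are illustrative.

```python
import os

# Values reported for the SFCTA run in this thread.
num_processes = 8
numba_threads_per_process = 4
machine_cores = 160

# Numba reads NUMBA_NUM_THREADS when it is first imported, so the variable
# has to be set before that import (or exported in the shell beforehand).
os.environ["NUMBA_NUM_THREADS"] = str(numba_threads_per_process)

# Assumption: each process can run up to this many Numba threads at once.
total_threads = num_processes * numba_threads_per_process
core_utilisation = total_threads / machine_cores

print(f"up to {total_threads} threads on {machine_cores} cores "
      f"({core_utilisation:.0%} of the machine)")
```

With these settings the run would occupy at most 32 of the 160 cores, so under these assumptions the machine is far from oversubscribed.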
Ran the model on an RSG machine with 24 cores and 500 GB of RAM with the following settings:
And varied the number of cores to see what the runtime improvements are:
Observations:
The results here are very consistent with the observations in the MTC model (see https://github.com/ActivitySim/activitysim-prototype-mtc/issues/12#issuecomment-2218312707). The main difference was that here the runtime minimum was with 20 cores, but with the MTC example it was around 10 cores.
This is the issue to report on memory usage and runtime performance...
data_dir: "data-full" (full scale skims, 24333 MAZs)
households_sample_size: 0 (full scale 100% sample of households)
sharrow: require
multiprocess: True