ActivitySim / sandag-abm3-example

BSD 3-Clause "New" or "Revised" License

Full Scale Performance: Multi-Process, Sharrow On #22

Open i-am-sijia opened 2 months ago

i-am-sijia commented 2 months ago

This is the issue to report on memory usage and runtime performance...

i-am-sijia commented 2 months ago

Used num_processes: 28 on a 512 GB RAM machine with 32 physical cores. Did two runs on June 13, 2024. The only difference between the two runs was the version of Sharrow: one used v2.9.1, the other used a later version with the np.where updates. For more details, see rows 16 and 17 in RunMatrix_PerformanceResults.xls.

The np.where updates in Sharrow main@8d63a66 do not seem to help runtime in multiprocessing.

dhensle commented 2 months ago

Did an analogous run using num_processes: 20 on an RSG machine with 24 processors, 500 GB RAM, and 2.1 GHz Intel Xeon cores. Used the latest Sharrow code (main@8d63a66); the run completed in 289 minutes (about 4.8 hours). Logs: sh_mp_full_logs.zip

Notably, on this machine the single-process run took 21.1 hours, which is significantly longer than the single-process runtime for Sijia's run above.
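For a rough sense of scale, the two RSG timings above (21.1 hours single-process, 289 minutes with 20 processes) can be turned into a speedup factor and a per-process efficiency. This is only a back-of-the-envelope sketch: the helper names are made up for illustration, and it assumes the two runs are otherwise comparable (same machine, same inputs, same Sharrow version), which the thread does not confirm.

```python
def speedup(single_process_hours: float, multi_process_hours: float) -> float:
    """Ratio of single-process runtime to multi-process runtime."""
    return single_process_hours / multi_process_hours


def parallel_efficiency(speedup_factor: float, num_processes: int) -> float:
    """Speedup divided by process count (1.0 would be perfect linear scaling)."""
    return speedup_factor / num_processes


# Figures reported in this thread for the RSG machine.
single_hours = 21.1      # single-process run
multi_hours = 289 / 60   # 289 minutes with num_processes: 20
num_processes = 20

s = speedup(single_hours, multi_hours)
e = parallel_efficiency(s, num_processes)
print(f"speedup: {s:.2f}x, efficiency: {e:.1%}")
# prints "speedup: 4.38x, efficiency: 21.9%"
```

So on this machine 20 processes bought roughly a 4.4x reduction in wall-clock time, well short of linear scaling.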

jpn-- commented 2 months ago

Ran on the SFCTA server.

Total runtime 239.7 minutes (i.e. just under 4 hours)
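As a quick unit check, here is the conversion of the 239.7 minutes to hours, next to the 289-minute RSG run reported earlier in this thread. The helper name is just for illustration, and the two machines and process counts differ, so the gap is not an apples-to-apples comparison.

```python
def minutes_to_hours(minutes: float) -> float:
    """Convert a runtime in minutes to hours."""
    return minutes / 60.0


sfcta_minutes = 239.7  # total runtime reported for the SFCTA server
rsg_minutes = 289.0    # runtime reported for the RSG machine, 20 processes

sfcta_hours = minutes_to_hours(sfcta_minutes)
gap_minutes = rsg_minutes - sfcta_minutes

print(f"SFCTA: {sfcta_hours:.3f} hours")      # prints "SFCTA: 3.995 hours"
print(f"RSG - SFCTA: {gap_minutes:.1f} min")  # prints "RSG - SFCTA: 49.3 min"
```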

Archive-SFCTAserver-4thread-8MP.zip

dhensle commented 1 month ago

Ran the model on an RSG machine with 24 cores and 500 GB of RAM with the following settings:

Then varied the number of cores to see how the runtime improves: [image]

Observations:

The results here are very consistent with the observations in the MTC model (see https://github.com/ActivitySim/activitysim-prototype-mtc/issues/12#issuecomment-2218312707). The main difference is that here the minimum runtime occurred at 20 cores, whereas in the MTC example it occurred at around 10 cores.