UDST / pandana

Pandas Network Analysis by UrbanSim: fast accessibility metrics and shortest paths, using contraction hierarchies
http://udst.github.io/pandana
GNU Affero General Public License v3.0

Multithreading not working in certain Windows environments #138

Open smmaurer opened 4 years ago

smmaurer commented 4 years ago

UrbanSim Inc received a report from a user of very slow simulation performance on Windows. We profiled the model locally on a Windows machine and a Mac and were able to reproduce the issue, which seems to be related to Pandana multithreading.

The Windows machine reports that 8 threads are being used, but operations perform as if they are single-threaded. Big thanks to @jessicacamacho for the profiling work.

This issue documents what we know about the problem.

Initial diagnosis

One model iteration ran in 15 minutes on the Mac (8 GB RAM, 2-core) but took 70 minutes on the Windows machine (16 GB RAM, 4-core), even though the Windows machine has twice the cores and memory. Pandana status messages on both machines indicated that multithreading was active. Runs are profiled here:

Pandana accessibility calculations make up the majority of the excess time.

Filtering for Pandana calls, three functions are used: get_all_aggregate_accessibility_variables, precompute_range, and initialize_access_var. The first two are supposed to be multithreaded, and take 5x to 7x longer on the Windows machine. The last function is not multithreaded, and has similar performance on both machines. Other numerical operations, like Pandas function calls, also perform similarly on both machines.

So, it seems like the Pandana multithreading is not working correctly on the Windows machine.

(Another anomaly in the profiling is that print statements are also taking much longer to execute on the Windows machine than on the Mac.)

Determining if a machine is affected

At this point, we think some Windows environments are affected and others are not.

Here is a diagnostic script that indicates whether a particular machine is affected: multithreading_diagnostics.py. It times the execution of two small Pandana operations, one always single-threaded and the other potentially multithreaded, and reports the ratio.
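For readers who want to see the shape of such a check, here is a minimal sketch of the timing-ratio idea. It is not the linked script: `time_call` and `threading_ratio` are hypothetical helper names, and the two workloads in the demo are plain-Python stand-ins rather than Pandana operations.

```python
import time


def time_call(fn, repeats=3):
    """Return the best wall-clock time (seconds) over `repeats` calls of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def threading_ratio(single_threaded_op, multithreaded_op, repeats=3):
    """Time both callables and return (multithreaded time) / (single-threaded time).

    Both arguments are zero-argument callables supplied by the caller.
    The helper does not check how many threads either one actually uses.
    """
    t_single = time_call(single_threaded_op, repeats)
    t_multi = time_call(multithreaded_op, repeats)
    if t_single == 0:
        # Guard against a timer that cannot resolve a very short call.
        return float("inf")
    return t_multi / t_single


# Stand-in workloads (assumption: these are NOT Pandana calls).
ratio = threading_ratio(
    lambda: sum(range(200_000)),
    lambda: sum(range(400_000)),
)
print(f"multithreaded / single-threaded time ratio: {ratio:.2f}")
```

With Pandana, the two callables would wrap one operation that is always single-threaded and one that is expected to use multiple threads. The absolute ratio depends on the operations chosen, so the useful comparison is between machines: a ratio several times higher on one machine than on another would match the symptom described above.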

If you run the script, please let me know what the results are, plus some information about your Pandana environment: operating system, Python version, Pandana version, how Pandana was installed, etc.
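A small snippet like the following could collect most of the requested details in one go. It is only a sketch using the Python standard library; `describe_environment` is a hypothetical helper name, and it cannot tell how Pandana was installed (for example conda versus pip), so that still needs to be reported by hand.

```python
import os
import platform
import sys
from importlib import metadata


def describe_environment(package="pandana"):
    """Collect OS, Python version, logical CPU count, and a package version."""
    try:
        pkg_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        pkg_version = "not installed"
    return {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "logical_cpus": os.cpu_count(),
        package: pkg_version,
    }


# Print one "key: value" line per item, ready to paste into a comment.
for key, value in describe_environment().items():
    print(f"{key}: {value}")
```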

We have not seen this issue on Linux or Mac.

Notes

Environments with broken multithreading:

Environments where multithreading works correctly:

lucky-verma commented 3 years ago

Hey, I tried the script, but when I try precompute(100000+1) for my personal project, the system crashes. Any suggestions?
