This PR extends #134 to provide vectorized calculation of the shortest path length for many OD pairs at once. It achieves this by running a loop on the C++ side, which performs much better than calling a Python function multiple times.
Thanks to @federicofernandez for help with the C++ code.
Comments on the Python API?
I'm leaving the other Python method too, so now we have both:
network.shortest_path_length() (PR #134, takes one OD pair)
network.shortest_path_lengths() (this PR, takes multiple OD pairs)
Is this too confusing? I feel like it might be useful to have both. Note that there's also network.shortest_path(), which has been in the Python API for a while and provides the path itself (sequence of nodes) for a single OD pair.
Performance and accuracy
Thanks to @jessicacamacho for help with testing.
Moving the loop to the C++ side improves performance about 100x-1000x for large numbers of OD pairs, depending on particular details of the use case. (Long routes in large networks see performance improvement at the lower end, because the Python overhead is a smaller portion of the computation time.)
The vectorization also takes advantage of multi-threading, if enabled by the compiler, which speeds things up about 2x on my 4-core machine.
The lengths themselves are very close to those calculated by NetworkX (which uses Dijkstra without contraction hierarchies) using an equivalent OSMNx network.
Other changes
I also adjusted docstrings for some of the other shortest path Python functions, so that they more accurately reflect what the underlying C++ code is doing.
Coverage decreased (-0.8%) to 89.873% when pulling 4b407628b19293217e657dd8eb6ae4547cd88838 on vectorized-shortest-path-length into 4d37b25d761606d4279517e6ffd1316dcd84326b on shortest-path-distance.
This PR extends #134 to provide vectorized calculation of the shortest path length for many OD pairs at once. It achieves this by running a loop on the C++ side, which performs much better than calling a Python function multiple times.
Thanks to @federicofernandez for help with the C++ code.
Comments on the Python API?
I'm leaving the other Python method too, so now we have both:
network.shortest_path_length()
(PR #134, takes one OD pair)network.shortest_path_lengths()
(this PR, takes multiple OD pairs)Is this too confusing? I feel like it might be useful to have both. Note that there's also
network.shortest_path()
, which has been in the Python API for a while and provides the path itself (sequence of nodes) for a single OD pair.Performance and accuracy
Thanks to @jessicacamacho for help with testing.
Moving the loop to the C++ side improves performance about 100x-1000x for large numbers of OD pairs, depending on particular details of the use case. (Long routes in large networks see performance improvement at the lower end, because the Python overhead is a smaller portion of the computation time.)
The vectorization also takes advantage of multi-threading, if enabled by the compiler, which speeds things up about 2x on my 4-core machine.
The lengths themselves are very close to those calculated by NetworkX (which uses Dijkstra without contraction hierarchies) using an equivalent OSMNx network.
Other changes
I also adjusted docstrings for some of the other shortest path Python functions, so that they more accurately reflect what the underlying C++ code is doing.
To do:
I see there's a compilation problem in Windows Python 2.7... https://ci.appveyor.com/project/smmaurer/pandana/builds/33281145/job/u5hh4ekphfyvi0d6