get_nearest_node and shortest_path performance issue

sephib commented 7 years ago

I'm trying to calculate routes from thousands of points. Running the get_nearest_node and shortest_path on the following network takes a long time.

{
    'circuity_avg' : 1.084601551622338,
    'count_intersections' : 58456,
    'edge_density_km' : None,
    'edge_length_avg' : 119.35486740722354,
    'edge_length_total' : 19018004.572667,
    'intersection_density_km' : None,
    'k_avg' : 4.536693003060716,
    'm' : 159340,
    'n' : 70245,
    'node_density_km' : None,
    'self_loop_proportion' : 0.004022844232458893,
    'street_density_km' : None,
    'street_length_avg' : 123.93589363482089,
    'street_length_total' : 12220326.984180609,
    'street_segments_count' : 98602,
    'streets_per_node_avg' : 2.80183642963912,
    'streets_per_node_counts' : {
        0 : 0,
        1 : 11789,
        2 : 499,
        3 : 48094,
        4 : 9584,
        5 : 265,
        6 : 13,
        7 : 1
    },
    'streets_per_node_proportion' : {
        0 : 0.0,
        1 : 0.16782689159370773,
        2 : 0.007103708448999929,
        3 : 0.6846608299523098,
        4 : 0.13643675706455977,
        5 : 0.003772510498967898,
        6 : 0.00018506655277955727,
        7 : 1.4235888675350559e-05
    }
}

In order to increase the performance of the function I think the function should select a subset of nodes (based on some parameter), there after run the great_circle() method. I'm thinking of using nx.ego_graph - do you think this is a good direction?

In the meantime since I have some points that repeat themselves I've used some caching techniques. Do you think it is relevant to add to the core.py?

from functools import lru_cache

@lru_cache(maxsize=None)
def get_nearst_node_(graph, coordinate):
    return ox.get_nearest_node(graph, coordinate)

@lru_cache(maxsize=None)
def shortest_path_(graph, origin_node, destination_node):
    return nx.shortest_path(graph, origin_node, destination_node)

gboeing commented 7 years ago

Using an LRU cache is an interesting idea. It that preferable to using an LFU cache in your use case?

Yes I think using nx.ego_graph could be a smart way to speed up route calculation when an approximate max radius is known. You could pass it a radius and optionally a distance (that is, the weight parameter length) to induce a small subgraph to then calculate the route on.

sephib commented 7 years ago

Regarding LRU - used it since it is easier to implement (since I"m not limiting the maxsize then I'm not sure how much of a difference it is). Will check later on regarding LFU .
Will update regarding nx.ego_graph

sephib commented 7 years ago

Hi I'm still having performance issues with the get_nearest_node method. running the code :

origin_node = get_nearst_node_(graph, origin)

takes between 3-30 seconds per point

Trying to run the function on the entire dataset did not increase the performance (Here i'm running on a subset of the dataset).

dfgp.loc[dfgp['group_id'] == id, 'origin_node'] = dfgp.loc[dfgp['group_id'] == id].apply(lambda x: ox.get_nearest_node(graph, point=(x['lat'], x['long'])), axis=1)

not sure if the ego_graph is relevant since I need a node to run the function. Is there a way to modify get_nearest_node function to take a maximum distance? I did not see any such thing with the 'great_circle' function.

Any ideas?

gboeing commented 7 years ago

It could be modified to, say, take a point and a radius, then create a circle around the point, then do a spatial intersection with geopandas to retain only those nodes that lie within the circle, and then create a subgraph of only those nodes and their incident edges.

But, you also said:

not sure if the ego_graph is relevant since I need a node to run the function.

You can get the node nearest an arbitrary point with the get_nearest_node function. Then you can use ego_graph to induce a subgraph of nodes within some network distance from this node.

gboeing commented 7 years ago

I've had some time to look further into this. In your example, I see you have 70245 nodes in your graph. I created a similarly sized graph for the city of Los Angeles (n=63922) but I am not seeing the performance issues you described. However, I did find a way to improve the performance time of get_nearest_node by 5x to 10x.

Testing the performance

import osmnx as ox, networkx as nx
ox.config(log_console=True, use_cache=True)
G = ox.graph_from_place('Los Angeles, California, USA', network_type='drive_service')
print(len(G))

Next I define origin and destination lat-long points on opposite sides of LA:

orig_point = (34.234, -118.535)
dest_point = (33.939, -118.242)

And then get the nearest node to each:

%%time
orig_node = ox.get_nearest_node(G, orig_point)

Wall time: 805 ms

%%time
dest_node = ox.get_nearest_node(G, dest_point)

Wall time: 814 ms

So, get_nearest_node is finding the node nearest to an arbitrary point (in a large graph) in less than one second. Inducing a subgraph is also fast:

%%time
eg = nx.ego_graph(G, orig_node, radius=2000, distance='length')

Wall time: 119 ms

This produces a subgraph with the 383 nodes within 2 km of orig_node.

But I'm more interested in the performance of the full graph. So, next I calculate the shortest path between these two nodes using the full graph:

%%time
route = nx.shortest_path(G, source=orig_node, target=dest_node, weight='length')

Wall time: 703 ms

So, it was able to calculate the path in less than a second.

route_length_km = sum([G.edge[u][v][0]['length'] for u, v in zip(route, route[1:])]) / 1000.

There are 341 nodes in the path and it has a total length of 51.54 km. Lastly, we can plot it to get a sense of the scale:

fig, ax = ox.plot_graph_route(G, route, fig_height=20, edge_linewidth=0.3, node_size=0)

Can we make this faster with vectorization?

In these tests, get_nearest_node performs reasonably fast. However, I also played around with ways to make it faster. The main performance constraint is how it calculates the great-circle distance from the arbitrary point to the coordinates of each node in the graph, then finds the minimum. We can vectorize this calculation. It seems that a sufficiently large graph of hundreds of thousands or millions of nodes would benefit from this vectorization approach.

I will test and commit these updates shortly.

gboeing commented 7 years ago

Fast vectorized great circle calculations are added in 75a7dc3eee1d6d1fe6933104e94e7911f6a9959d

sephib commented 7 years ago

Thank you for the review and the great work ! I'll try and check my running environment in addition to your code improvement. (Sorry didn't mean to close this issue without your permission).

simmonssong commented 6 years ago

Hi, when I use shortest_path, I got TypeError: unhashable type: 'list'. The parameter G loaded from graphml that I saved recently save: ox.save_graphml(roads, 'graph.graphml', folder='/data/sqy/') load: graph = ox.load_graphml('graph.graphml', '/data/sqy/')

The version of osmnx is 0.8, networkx is 2.0

gboeing commented 6 years ago

@SqySimmon please provide a complete working example of code to reproduce this issue and we'll take a look. If this is a new issue, please open a new issue.

simmonssong commented 6 years ago

Sorry, I made a mistake in using: nodes, edges = ox.graph_to_gdfs(graph) graph = ox.gdfs_to_graph(nodes, edges) I screwed up the orders of nodes and edges Thank you for you answer.

ovdavid28 commented 4 years ago

Is there any error in this code:

route_length_km = sum([G.edge[u][v][0]['length'] for u, v in zip(route, route[1:])]) / 1000.

Not working for me Thank you

gboeing commented 4 years ago

@ovdavid28 if you have general usage questions, please ask on StackOverflow per the contributing guidelines.

gboeing / osmnx

get_nearest_node and shortest_path performance issue #30

Testing the performance

Can we make this faster with vectorization?