Closed sephib closed 7 years ago
Using an LRU cache is an interesting idea. Is that preferable to using an LFU cache in your use case?
Yes, I think using nx.ego_graph could be a smart way to speed up route calculation when an approximate max radius is known. You could pass it a radius and optionally a distance (that is, the weight parameter length) to induce a small subgraph to then calculate the route on.
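To illustrate the idea, here is a minimal sketch on a toy graph (the node IDs, coordinates, and edge lengths are made up for illustration; a real OSMnx graph supplies 'length' edge attributes in meters):

```python
import networkx as nx

# Toy stand-in for a street graph: a path whose 'length' edge attribute
# is the distance in meters (hypothetical values).
G = nx.Graph()
G.add_edge(1, 2, length=1000)
G.add_edge(2, 3, length=1000)
G.add_edge(3, 4, length=1000)
G.add_edge(4, 5, length=1000)

orig_node, dest_node = 1, 3

# Induce the subgraph of nodes within 2500 m network distance of the
# origin, then route on that much smaller graph.
sub = nx.ego_graph(G, orig_node, radius=2500, distance='length')
print(sorted(sub.nodes))  # [1, 2, 3] -- nodes 4 and 5 lie beyond 2500 m

route = nx.shortest_path(sub, orig_node, dest_node, weight='length')
print(route)  # [1, 2, 3]
```

This only helps when the destination is known to lie within the chosen radius, which is the "approximate max radius" assumption above.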
Regarding LRU: I used it since it is easier to implement (and since I'm not limiting the maxsize, I'm not sure how much of a difference it makes). I will check later regarding LFU, and will update regarding nx.ego_graph.
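For reference, a minimal sketch of the caching idea using the standard library's functools.lru_cache (the lookup function and call counter here are hypothetical stand-ins for ox.get_nearest_node):

```python
from functools import lru_cache

calls = {"n": 0}

def expensive_lookup(point):
    # Stand-in for the slow ox.get_nearest_node(graph, point) call.
    calls["n"] += 1
    return hash(point) % 100  # hypothetical node id

# With maxsize=None nothing is ever evicted, so the LRU-vs-LFU question
# is moot: the cache is a plain memoization table. Keys must be hashable,
# so pass points as (lat, lng) tuples rather than lists.
@lru_cache(maxsize=None)
def cached_nearest_node(point):
    return expensive_lookup(point)

p = (34.234, -118.535)
cached_nearest_node(p)
cached_nearest_node(p)  # served from cache, no second lookup
print(calls["n"])       # 1
```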
Hi, I'm still having performance issues with the get_nearest_node method.
Running the code:

origin_node = get_nearest_node(graph, origin)

takes between 3 and 30 seconds per point.
Trying to run the function on the entire dataset did not improve performance (here I'm running on a subset of the dataset).
mask = dfgp['group_id'] == id
dfgp.loc[mask, 'origin_node'] = dfgp.loc[mask].apply(
    lambda x: ox.get_nearest_node(graph, point=(x['lat'], x['long'])), axis=1)
I'm not sure if ego_graph is relevant, since I need a node to run the function.
Is there a way to modify the get_nearest_node function to take a maximum distance? I did not see any such option in the great_circle function. Any ideas?
It could be modified to, say, take a point and a radius, then create a circle around the point, then do a spatial intersection with geopandas to retain only those nodes that lie within the circle, and then create a subgraph of only those nodes and their incident edges.
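A rough sketch of that approach, substituting a plain haversine filter for the geopandas spatial intersection (the toy graph, coordinates, and radius are made up for illustration):

```python
import networkx as nx
from math import radians, sin, cos, asin, sqrt

def great_circle_m(lat1, lng1, lat2, lng2, r=6371000):
    # Haversine great-circle distance in meters.
    dlat, dlng = radians(lat2 - lat1), radians(lng2 - lng1)
    h = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlng / 2) ** 2
    return 2 * r * asin(sqrt(h))

# Toy graph with OSMnx-style 'y' (lat) and 'x' (lng) node attributes.
G = nx.Graph()
G.add_node(1, y=34.00, x=-118.00)
G.add_node(2, y=34.01, x=-118.00)  # ~1.1 km from the point below
G.add_node(3, y=35.00, x=-118.00)  # ~111 km away
G.add_edge(1, 2)
G.add_edge(2, 3)

point, radius = (34.0, -118.0), 5000  # meters

# Retain only the nodes inside the circle, then take the induced
# subgraph (those nodes plus their incident edges among them).
keep = [n for n, d in G.nodes(data=True)
        if great_circle_m(point[0], point[1], d['y'], d['x']) <= radius]
sub = G.subgraph(keep)
print(sorted(sub.nodes))  # [1, 2]
```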
But, you also said:
not sure if the ego_graph is relevant since I need a node to run the function.
You can get the node nearest an arbitrary point with the get_nearest_node function. Then you can use ego_graph to induce a subgraph of nodes within some network distance of this node.
I've had some time to look further into this. In your example, I see you have 70245 nodes in your graph. I created a similarly sized graph for the city of Los Angeles (n=63922) but I am not seeing the performance issues you described. However, I did find a way to improve the performance of get_nearest_node by 5x to 10x.
import osmnx as ox, networkx as nx
ox.config(log_console=True, use_cache=True)
G = ox.graph_from_place('Los Angeles, California, USA', network_type='drive_service')
print(len(G))
Next I define origin and destination lat-long points on opposite sides of LA:
orig_point = (34.234, -118.535)
dest_point = (33.939, -118.242)
And then get the nearest node to each:
%%time
orig_node = ox.get_nearest_node(G, orig_point)
Wall time: 805 ms
%%time
dest_node = ox.get_nearest_node(G, dest_point)
Wall time: 814 ms
So, get_nearest_node is finding the node nearest to an arbitrary point (in a large graph) in less than one second. Inducing a subgraph is also fast:
%%time
eg = nx.ego_graph(G, orig_node, radius=2000, distance='length')
Wall time: 119 ms
This produces a subgraph with the 383 nodes within 2 km of orig_node.
But I'm more interested in the performance of the full graph. So, next I calculate the shortest path between these two nodes using the full graph:
%%time
route = nx.shortest_path(G, source=orig_node, target=dest_node, weight='length')
Wall time: 703 ms
So, it was able to calculate the path in less than a second.
route_length_km = sum([G.edge[u][v][0]['length'] for u, v in zip(route, route[1:])]) / 1000.
There are 341 nodes in the path and it has a total length of 51.54 km. Lastly, we can plot it to get a sense of the scale:
fig, ax = ox.plot_graph_route(G, route, fig_height=20, edge_linewidth=0.3, node_size=0)
In these tests, get_nearest_node performs reasonably fast. However, I also played around with ways to make it faster. The main performance constraint is how it calculates the great-circle distance from the arbitrary point to the coordinates of each node in the graph, then finds the minimum. We can vectorize this calculation. It seems that a sufficiently large graph of hundreds of thousands or millions of nodes would benefit from this vectorization approach.
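A sketch of what that vectorization might look like with NumPy. This illustrates the idea rather than the exact implementation committed to OSMnx, and the sample node IDs and coordinates are made up:

```python
import numpy as np

def nearest_node_vectorized(node_ids, lats, lngs, point, r=6371000):
    """Vectorized haversine from one point to every node at once;
    returns the id of the nearest node. lats/lngs are arrays in degrees."""
    lat0, lng0 = np.radians(point[0]), np.radians(point[1])
    lat, lng = np.radians(lats), np.radians(lngs)
    h = (np.sin((lat - lat0) / 2) ** 2
         + np.cos(lat0) * np.cos(lat) * np.sin((lng - lng0) / 2) ** 2)
    dist = 2 * r * np.arcsin(np.sqrt(h))
    return node_ids[int(np.argmin(dist))]

# Hypothetical node table: three nodes around Los Angeles.
ids = np.array([10, 20, 30])
lats = np.array([34.0, 34.5, 35.0])
lngs = np.array([-118.0, -118.5, -119.0])
print(nearest_node_vectorized(ids, lats, lngs, (34.4, -118.4)))  # 20
```

The point is that one array expression replaces a Python-level loop over every node, which is where the per-call cost dominates on large graphs.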
I will test and commit these updates shortly.
Fast vectorized great circle calculations are added in 75a7dc3eee1d6d1fe6933104e94e7911f6a9959d
Thank you for the review and the great work ! I'll try and check my running environment in addition to your code improvement. (Sorry didn't mean to close this issue without your permission).
Hi, when I use shortest_path, I got TypeError: unhashable type: 'list'. The parameter G was loaded from a graphml file that I saved recently.
save:
ox.save_graphml(roads, 'graph.graphml', folder='/data/sqy/')
load:
graph = ox.load_graphml('graph.graphml', '/data/sqy/')
The version of osmnx is 0.8, networkx is 2.0
@SqySimmon please provide a complete working example of code to reproduce this issue and we'll take a look. If this is a new issue, please open a new issue.
Sorry, I made a mistake in using:

nodes, edges = ox.graph_to_gdfs(graph)
graph = ox.gdfs_to_graph(nodes, edges)

I mixed up the order of nodes and edges. Thank you for your answer.
Is there any error in this code?

route_length_km = sum([G.edge[u][v][0]['length'] for u, v in zip(route, route[1:])]) / 1000.

It is not working for me. Thank you.
@ovdavid28 if you have general usage questions, please ask on StackOverflow per the contributing guidelines.
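For readers hitting the same error later: the route-length expression above uses the NetworkX 1.x G.edge accessor, which was removed in NetworkX 2.0. A version-agnostic sketch on a toy multigraph (the edge lengths and route are made up for illustration):

```python
import networkx as nx

# Toy multigraph standing in for an OSMnx street network; 'length' in meters.
G = nx.MultiDiGraph()
G.add_edge('a', 'b', length=500)
G.add_edge('b', 'c', length=700)

route = ['a', 'b', 'c']

# Plain subscripting G[u][v][0] works in both NetworkX 1.x and 2.x,
# unlike the removed G.edge[u][v][0] form.
route_length_km = sum(G[u][v][0]['length'] for u, v in zip(route, route[1:])) / 1000
print(route_length_km)  # 1.2
```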
I'm trying to calculate routes from thousands of points. Running get_nearest_node and shortest_path on the following network takes a long time. To improve the performance of the function, I think it should first select a subset of nodes (based on some parameter) and only then run the great_circle() method. I'm thinking of using nx.ego_graph; do you think this is a good direction? In the meantime, since I have some points that repeat themselves, I've used some caching techniques. Do you think it is relevant to add these to core.py?