cosbidev / PyTrack

a Map-Matching-based Python Toolbox for Vehicle Trajectory Reconstruction
https://pytrack-lib.readthedocs.io/en/latest/#
BSD 3-Clause Clear License
63 stars 10 forks source link

Performance problem with pytrack, not sure what is the root cause #11

Open RaniRaven opened 1 year ago

RaniRaven commented 1 year ago

Using the GPS trajectory as points=[(42.343981, -83.068282), (42.344557, -83.06855), (42.345158, -83.068826), (42.345739, -83.069091), (42.346302, -83.069342), (42.346882, -83.06961), (42.347475, -83.069879), (42.348061, -83.070143), (42.348634, -83.070406), (42.349217, -83.070679), (42.349808, -83.070997), (42.35039, -83.071336), (42.350993, -83.071686), (42.351586, -83.072038), (42.352192, -83.072393), (42.352813, -83.072754), (42.35344, -83.073121), (42.354081, -83.073484), (42.354718, -83.073834), (42.355347, -83.07416), (42.355961, -83.074406), (42.356566, -83.074608), (42.357158, -83.074775), (42.357734, -83.074922), (42.358335, -83.074962), (42.358951, -83.07485), (42.359548, -83.074571), (42.360128, -83.074126), (42.360585, -83.07362), (42.360991, -83.073111), (42.361731, -83.072093), (42.362051, -83.071603), (42.362373, -83.07109), (42.362699, -83.070568), (42.363038, -83.069994), (42.363382, -83.069346), (42.363716, -83.068678), (42.364031, -83.068009), (42.364324, -83.067383), (42.364612, -83.066773), (42.364906, -83.066133), (42.365196, -83.065498), (42.365482, -83.064841), (42.365752, -83.064162), (42.365985, -83.063562), (42.366205, -83.062984), (42.36638, -83.062431), (42.366498, -83.06189), (42.366559, -83.061304), (42.36657, -83.060721), (42.366535, -83.060186), (42.366487, -83.059704), (42.366442, -83.059258), (42.366413, -83.05882), (42.366457, -83.058362), (42.366611, -83.057939), (42.366845, -83.057612), (42.367143, -83.057395), (42.367517, -83.057308), (42.367924, -83.057414), (42.368374, -83.057617), (42.368884, -83.057858), (42.369424, -83.058138), (42.369975, -83.058455), (42.370521, -83.058753), (42.371084, -83.058982), (42.37172, -83.059162), (42.372423, -83.059326), (42.373146, -83.059485), (42.373877, -83.059694), (42.37461, -83.06005), (42.375353, -83.06052), (42.376113, -83.061019), (42.376876, -83.061525), (42.37761, -83.062009), (42.378308, -83.062475), (42.378983, -83.062909), (42.379634, -83.063329), (42.380269, -83.063755), (42.380912, -83.064185), (42.381564, -83.064626), (42.382227, -83.065073), (42.382888, -83.065521), (42.383551, -83.065964), (42.384226, -83.066415), (42.384914, -83.066879), (42.385623, -83.067359), (42.386344, -83.067846), (42.387071, -83.068332), (42.387795, -83.068802), (42.388511, -83.069236)]

But this time taking a map as (since I wish to check for several trajectories in this area) : north = 42.534092 south = 42.118413 west = -83.343126 east = -82.727472 downloaded_g = graph.download.osm_download((north,south,west,east), network_type='drive') G = graph.create_graph(downloaded_g) start = datetime.datetime.now() G_interp, candidates = candidate.get_candidates(G, points, interp_dist=5, closest=True, radius=10) p1 = datetime.datetime.now() trellis = mpmatching_utils.create_trellis(candidates) path_prob, predecessor = mpmatching.viterbi_search(G_interp, trellis, "start", "target") p2 = datetime.datetime.now() l_ids, l_coords = mpmatching_utils.create_matched_path(G_interp, trellis, predecessor) print('Got results : ',len(points),',',len(l_ids),',',len(l_coords),' times : ',int((p1-start).total_seconds()1000),';',int((p2-p1).total_seconds()1000))

The get_candidates never returns till some memory error inside this function, but it takes more than an hour to reach there.

Haven't analyzed the code so far, but it seems like there is no "smart" search for candidates, but passing over the whole graph edge by edge. If you can please tell me what is wrong there.

RaniRaven commented 1 year ago

I think I understand now the issue. NetworkX graph has no index for geospatial, and therefore, for each candidate the code would need to pass over all the edges in order to calculate the distance. This is highly inefficient, of course, as only is 50x50 km there are ~1 million edges, if you count both the residential and unmarked roads. This isn't a good strategy, the edges should be accessed in a more efficient way. Passing over 90 candidates for finding the closest edge will result in hours. In addition the NetworkX implementation consumes a lot of memory, and therefore will cause memory failure.