CityofToronto / bdit_traffic_prophet

Suite of algorithms for predicting average daily traffic on Toronto streets
GNU General Public License v3.0
1 stars 1 forks source link

KDTree-based nearest neighbour search for permanent counts #13

Closed cczhu closed 4 years ago

cczhu commented 4 years ago

PRTCS imposes a nearest-neighbour cutoff prior to doing minimum mean square error matching. Nearest neighbours are determined using a Delauney Triangulation method, but the end result is the same as for any exact nearest neighbour search, so we can just use a KDTree-based method.

cczhu commented 4 years ago

Both PRTCS and KCOUNT require distance information - PRTCS uses angular distance (or whatever you would call distances calculated without converting lat-lon to something sensible) to determine N nearest neighbours, while KCOUNT directly uses the distance.

There are ~150 PTC locations and ~9000 STTC locations across all directions and all years with data. There are ~16,000 major or minor arterial or collector road centreline segments. Storing the full pairwise distance matrix for all arterial centreline segments would require a table with ~260 million rows or a 2D matrix of 960 MB in NumPy, both of which are feasible to store but infeasible to parse (. TEPs-I solves this problem by only considering neighbouring links within 2 km of any given centreline segment (line 110 of data_prep_kriging.m, b=b0((find(b0(:,15 )<2)),:);) in KCOUNT, and using a nearest neighbour search in place of a distance matrix in PRTCS.

We should adopt a similar strategy for CountMatch - calculate N nearest neighbours either using a tree for Euclidean or Manhattan distances or routing for network distances. The former can be done quickly locally in Python using a KDTree. For the latter, only N nearest neighbours (and potentially their distances) should be stored, but this raises the problem that the database would have to calculate neighbours every time short term counts from a new location are introduced. Possible solutions: