Currently, our strategy for calculating LineString-LineString, Polygon-LineString, and Polygon-Polygon distance is as follows:
LineString-LineString:
build two R*-trees from the Line components of each geometry
Calculate the minimum distance from each Point in geometry A to each Line in geometry B using the R*-tree lookup, and vice-versa
Return the smaller of the two distances
LineString-Polygon
See above
Polygon-Polygon:
See above, unless both Polygons are convex:
Use the fancy rotating-calipers method, which is both fast and memory-efficient
The issue is that loading Lines into the trees (I suspect massively) dominates the running time up to a certain threshold, at which point it becomes reasonable again, because the algorithm as described above is quadratic without the use of the tree.
I don't have empirical data for this, but my strong suspicion is that use-cases are ~bimodally distributed:
geometries derived from physical geography (lots and lots of vertices)
everything else (a few hundred vertices at most)
For (2), we should just brute-force the search. As an added bonus, we could even try parallelising it using two threads (although again, spinning up a thread may be much slower than just two searches)
We should write some benchmarks, and see if we can determine the total number of vertices above which we use the tree.
Currently, our strategy for calculating
LineString
-LineString
,Polygon
-LineString
, andPolygon
-Polygon
distance is as follows:LineString
-LineString
:Line
components of each geometryPoint
in geometry A to eachLine
in geometry B using the R*-tree lookup, and vice-versaLineString
-Polygon
Polygon
-Polygon
:The issue is that loading
Line
s into the trees (I suspect massively) dominates the running time up to a certain threshold, at which point it becomes reasonable again, because the algorithm as described above is quadratic without the use of the tree.I don't have empirical data for this, but my strong suspicion is that use-cases are ~bimodally distributed:
For (2), we should just brute-force the search. As an added bonus, we could even try parallelising it using two threads (although again, spinning up a thread may be much slower than just two searches)
We should write some benchmarks, and see if we can determine the total number of vertices above which we use the tree.