Determining a cut-off for brute-force distance calculations

Currently, our strategy for calculating LineString-LineString, Polygon-LineString, and Polygon-Polygon distance is as follows:

LineString-LineString:

build two R*-trees from the Line components of each geometry
Calculate the minimum distance from each Point in geometry A to each Line in geometry B using the R*-tree lookup, and vice-versa
Return the smaller of the two distances

LineString-Polygon

See above

Polygon-Polygon:

See above, unless both Polygons are convex:
Use the fancy rotating-calipers method, which is both fast and memory-efficient

The issue is that loading Lines into the trees (I suspect massively) dominates the running time up to a certain threshold, at which point it becomes reasonable again, because the algorithm as described above is quadratic without the use of the tree.

I don't have empirical data for this, but my strong suspicion is that use-cases are ~bimodally distributed:

geometries derived from physical geography (lots and lots of vertices)
everything else (a few hundred vertices at most)

For (2), we should just brute-force the search. As an added bonus, we could even try parallelising it using two threads (although again, spinning up a thread may be much slower than just two searches)

We should write some benchmarks, and see if we can determine the total number of vertices above which we use the tree.

georust / geo

Determining a cut-off for brute-force distance calculations #225