Open asfimport opened 7 years ago
Michael McCandless (@mikemccand) (migrated from JIRA)
This sounds promising!
But, we are moving away from GeoPointDistanceQuery
in favor of the block KD tree (dimensional points) implemenation, LatLonPoint.newDistanceQuery
: the latter is quite a bit faster as measured in our nightly geo benchmarks, and it recently just got even faster with #8707.
Do you think the ideas from these papers would also apply to LatLonPoint.newDistanceQuery
?
Michael McCandless (@mikemccand) (migrated from JIRA)
This approach can also be used to priorize rectangles, for top-k search.
Oh we also have a nearest neighbor implementation for points: LatLonPoint.nearest
. Seems like these papers could help that too?
Erich Schubert (migrated from JIRA)
Should apply to LatLonPointDistanceQuery
as well, to reduce the number of false positives in #compare
.
In NearestNeighbor
, it should improve #approxBestDistance
(which currently computes haversine to all four corners of the bounding box, which is incorrect for a large bounding box, and a point close to the middle point of an edge & about 4x as expensive). In fact, it would be trueBestDistance
as desired in the TODOs.
GeoPoint queries currently use only the bounding box for filtering. But the query circle is only roughly 80% of the bounding box, so we could be roughly 20% faster. Furthermore, the current approach requires splitting the box if it crosses the date line.
> Schubert, E., Zimek, A., & Kriegel, H. P. (2013, August). Geodetic distance queries on r-trees for indexing geographic data. In International Symposium on Spatial and Temporal Databases (pp. 146-164).
The minimum spherical distance of a point to a rectangle is given ("Algorithm 2: Optimized Minimum Distance Point to MBR"). Rectangles whose minimum distance is larger than the query radius can be skipped. The authors used the R-tree, but it will work with any bounding box, so it can be used in CellComparator#relate. It's not very complex - a few case distinctions, and then either Haversine distance, or cross-track-distance. So the cost ist comparable to Haversine. This could be added as GeoRelationUtils.pointToRectMinimumDistance, for example.
This approach can also be used to priorize rectangles, for top-k search.
Migrated from LUCENE-7663 by Erich Schubert, updated Jan 28 2017