Open jpountz opened 8 years ago
cc @nknize
I think a significant set of users do need the accuracy of ARC. I'm less certain about usage of PLANE in the current user base, though there are a variety of use cases for rectilinear calculations on even non-geographic data which could be valid. Would love to hear if/where/how people are using PLANE.
I do think the documentation is a bit confusing in a few ways. First, we use some different terminology on our scripting documentation (in terms of function names and a little bit in the description) vs the geo distance query documentation. Second, we don't have any document (that I'm aware of) to identify what the tradeoffs are in terms of speed and accuracy for the 3 in any quantitative manner, so it's difficult for somebody to make an informed decision.
I definitely think people need all 3. I have had users in training talk about using the geo-distance calculation for "earth-line" planes in games (where the virtual earth really is flat). What's the motivation behind removing it?
I think a significant set of users do need the accuracy of ARC.
For what exactly? Its not accurate either, we can assume up to 0.3% error (http://www.movable-type.co.uk/scripts/latlong.html)
Personally I think we should just have ARC
(the explicit SLOPPY_ARC
selection should be deprecated and alias to it), and we decide how to implement that with the best tradeoffs given what we have. Having "within 1 ulp" of an inaccurate math formula does not necessarily translate into better results. Stuff like https://issues.apache.org/jira/browse/LUCENE-5271 is a great example of how it can be practically improved over time. btw, since https://github.com/elastic/elasticsearch/issues/5192 "accurate" arc is now using methods from SloppyMath.java to be more accurate :)
we can imagine still improving these functions going forward, for example maybe it won't make sense to approximate cos
at all after https://bugs.openjdk.java.net/browse/JDK-8143353, maybe asin
gets improvements too, etc etc.
I'm +1 for removing ARC
, PLANE
, and FACTOR
As @rmuir points out the GeoDistance.ARC
is just a copy of Lucene's SloppyMath.haversin
implementation which, when using GeoUtils.earthDiameter
to approximate the ellipsoid, actually gives ~0.65% error, or ~720m error at the equator for 1 degree longitude.
For the most accurate distance calculation (~0.05mm) used in photogrammetry and surveying use-cases, Vincenty or Karney should be used. I committed the vincenty implementation to lucene a while ago and posted some performance and accuracy results here
From those benchmarks you can see why vincenty and karney is not used in GeoPoint queries. Instead I've cut over GeoPoint queries from using SloppyMath.haversin
to an implementation of Sinnott's original haversine publication which gives slightly better accuracy and performance. For now, if someone wants vincenty level of accuracy they can post filter results in their application using GeoDistanceUtils.vincentyDistance
.
I have had users in training talk about using the geo-distance calculation for "earth-line" planes in games (where the virtual earth really is flat). What's the motivation behind removing it?
Currently using geo types for flat game maps? or is this something they've expressed interest in trying?
Use-cases like these strengthen the argument for a new dimensional
field mapper in 3.0 that uses the Lucene 6.0 Dimensional*Field
and flat map cartesian linear algebra.
I have had users in training talk about using the geo-distance calculation for "earth-line" planes in games (where the virtual earth really is flat). What's the motivation behind removing it?
Currently using geo types for flat game maps? or is this something they've expressed interest in trying?
Yeah, currently using geo for game maps with flat worlds.
I think we should just have ARC (the explicit SLOPPY_ARC selection should be deprecated and alias to it), and we decide how to implement that with the best tradeoffs given what we have.
+1
Use-cases like these strengthen the argument for a new dimensional field mapper in 3.0 that uses the Lucene 6.0 Dimensional*Field and flat map cartesian linear algebra.
I'm unsure whether we will want to have a dimensional
field directly or rather expose higher-level fields that leverage Lucene's Dimensional*Field but I'm +1 on restricting geo points to actual earth points and exposing different features to deal with the other uses-cases.
I think we should just have ARC (the explicit SLOPPY_ARC selection should be deprecated and alias to it)
+1 With the new GeoPointField in 2.2, the distance_type
option can be removed. For 2.x I can add a precision
option that trades performance for accuracy at query time and update the documentation to include expected performance.
I'm unsure whether we will want to have a dimensional field directly or rather expose higher-level fields that leverage Lucene's Dimensional*Field
+1.
@nknize can this issue be closed yet?
I think these distance types should remain.
plane
for filtering (since it's the fastest but less accurate), and sloppy_arc
for sorting. Or use both of them for filtering, like first using plane
filter to get documents in several kilometers radius, then using more accurate sloppy_arc
for better filtering.haversin
function(which is lucene sloppy_arc), but there's no way to get latitude and longitude parameters from your location field. You can only get Single valued document fields
. If you use Groovy lang, then you can get latitude and longitude, but you can't use haversin
. You can use plane
or arc
distances only. I may be wrong, but this is what I understood from the docs.Btw, great job guys!
This issue is addressed in a stalled PR (#23009). I think it stalled because LatLonPoint
had not landed yet and the PR was never revisited. Now that PLANE
and ARC
don't mean anything I'll bring the PR current to remove these distance types.
Updates to #23009 are here https://github.com/elastic/elasticsearch/commit/952c62893e4bcc61e11cd954d4edfd420807a7ca pending review
@nknize @djptek it's been more than a year since we touched it, should we revisit it and revive it somehow?
hi @imotov if this is still useful I can merge #23009 to get this moving again just let me know
@nknize is this still useful?
@djptek This is still useful. Would you mind updating #23009?
@nknize sure thing cc @imotov
Pinging @elastic/es-analytical-engine (Team:Analytics)
Elasticsearch has 3 ways to compute geo distances:
But this is sometimes confusing to our users, see eg #15606. Do we really need to have 3 options or could we just use SLOPPY_ARC all the time? It seems to be like a good trade-off between accuracy and speed?