elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
69.64k stars 24.64k forks source link

Remove the PLANE and ARC distance types #15616

Open jpountz opened 8 years ago

jpountz commented 8 years ago

Elasticsearch has 3 ways to compute geo distances:

But this is sometimes confusing to our users, see eg #15606. Do we really need to have 3 options or could we just use SLOPPY_ARC all the time? It seems to be like a good trade-off between accuracy and speed?

jpountz commented 8 years ago

cc @nknize

eskibars commented 8 years ago

I think a significant set of users do need the accuracy of ARC. I'm less certain about usage of PLANE in the current user base, though there are a variety of use cases for rectilinear calculations on even non-geographic data which could be valid. Would love to hear if/where/how people are using PLANE.

I do think the documentation is a bit confusing in a few ways. First, we use some different terminology on our scripting documentation (in terms of function names and a little bit in the description) vs the geo distance query documentation. Second, we don't have any document (that I'm aware of) to identify what the tradeoffs are in terms of speed and accuracy for the 3 in any quantitative manner, so it's difficult for somebody to make an informed decision.

dakrone commented 8 years ago

I definitely think people need all 3. I have had users in training talk about using the geo-distance calculation for "earth-line" planes in games (where the virtual earth really is flat). What's the motivation behind removing it?

rmuir commented 8 years ago

I think a significant set of users do need the accuracy of ARC.

For what exactly? Its not accurate either, we can assume up to 0.3% error (http://www.movable-type.co.uk/scripts/latlong.html)

Personally I think we should just have ARC (the explicit SLOPPY_ARC selection should be deprecated and alias to it), and we decide how to implement that with the best tradeoffs given what we have. Having "within 1 ulp" of an inaccurate math formula does not necessarily translate into better results. Stuff like https://issues.apache.org/jira/browse/LUCENE-5271 is a great example of how it can be practically improved over time. btw, since https://github.com/elastic/elasticsearch/issues/5192 "accurate" arc is now using methods from SloppyMath.java to be more accurate :)

we can imagine still improving these functions going forward, for example maybe it won't make sense to approximate cos at all after https://bugs.openjdk.java.net/browse/JDK-8143353, maybe asin gets improvements too, etc etc.

nknize commented 8 years ago

I'm +1 for removing ARC, PLANE, and FACTOR

As @rmuir points out the GeoDistance.ARC is just a copy of Lucene's SloppyMath.haversin implementation which, when using GeoUtils.earthDiameter to approximate the ellipsoid, actually gives ~0.65% error, or ~720m error at the equator for 1 degree longitude.

For the most accurate distance calculation (~0.05mm) used in photogrammetry and surveying use-cases, Vincenty or Karney should be used. I committed the vincenty implementation to lucene a while ago and posted some performance and accuracy results here

From those benchmarks you can see why vincenty and karney is not used in GeoPoint queries. Instead I've cut over GeoPoint queries from using SloppyMath.haversin to an implementation of Sinnott's original haversine publication which gives slightly better accuracy and performance. For now, if someone wants vincenty level of accuracy they can post filter results in their application using GeoDistanceUtils.vincentyDistance.

nknize commented 8 years ago

I have had users in training talk about using the geo-distance calculation for "earth-line" planes in games (where the virtual earth really is flat). What's the motivation behind removing it?

Currently using geo types for flat game maps? or is this something they've expressed interest in trying?

Use-cases like these strengthen the argument for a new dimensional field mapper in 3.0 that uses the Lucene 6.0 Dimensional*Field and flat map cartesian linear algebra.

dakrone commented 8 years ago

I have had users in training talk about using the geo-distance calculation for "earth-line" planes in games (where the virtual earth really is flat). What's the motivation behind removing it?

Currently using geo types for flat game maps? or is this something they've expressed interest in trying?

Yeah, currently using geo for game maps with flat worlds.

jpountz commented 8 years ago

I think we should just have ARC (the explicit SLOPPY_ARC selection should be deprecated and alias to it), and we decide how to implement that with the best tradeoffs given what we have.

+1

Use-cases like these strengthen the argument for a new dimensional field mapper in 3.0 that uses the Lucene 6.0 Dimensional*Field and flat map cartesian linear algebra.

I'm unsure whether we will want to have a dimensional field directly or rather expose higher-level fields that leverage Lucene's Dimensional*Field but I'm +1 on restricting geo points to actual earth points and exposing different features to deal with the other uses-cases.

nknize commented 8 years ago

I think we should just have ARC (the explicit SLOPPY_ARC selection should be deprecated and alias to it)

+1 With the new GeoPointField in 2.2, the distance_type option can be removed. For 2.x I can add a precision option that trades performance for accuracy at query time and update the documentation to include expected performance.

I'm unsure whether we will want to have a dimensional field directly or rather expose higher-level fields that leverage Lucene's Dimensional*Field

+1.

clintongormley commented 8 years ago

@nknize can this issue be closed yet?

esen commented 7 years ago

I think these distance types should remain.

Btw, great job guys!

nknize commented 5 years ago

This issue is addressed in a stalled PR (#23009). I think it stalled because LatLonPoint had not landed yet and the PR was never revisited. Now that PLANE and ARC don't mean anything I'll bring the PR current to remove these distance types.

djptek commented 5 years ago

Updates to #23009 are here https://github.com/elastic/elasticsearch/commit/952c62893e4bcc61e11cd954d4edfd420807a7ca pending review

imotov commented 4 years ago

@nknize @djptek it's been more than a year since we touched it, should we revisit it and revive it somehow?

djptek commented 4 years ago

hi @imotov if this is still useful I can merge #23009 to get this moving again just let me know

imotov commented 4 years ago

@nknize is this still useful?

nknize commented 4 years ago

@djptek This is still useful. Would you mind updating #23009?

djptek commented 4 years ago

@nknize sure thing cc @imotov

elasticsearchmachine commented 2 months ago

Pinging @elastic/es-analytical-engine (Team:Analytics)