Closed asfimport closed 8 years ago
Nick Knize (@nknize) (migrated from JIRA)
Patch includes:
update testGeoRelations
to rewrite longitudinal span queries into 4 slices to reduce radial distortion error in GeoUtils
cartesian approximation methods - this would be needed for BKD queries due to the nature of the split algorithm for large distance queries.
Incorporate distance error on isClosestPointOnRectWIthinRange
for better approximation within BKD.
added comments and javadocs.
Nick Knize (@nknize) (migrated from JIRA)
Updated patch includes the following:
GeoUtils
into separate GeoRelationUtils
classapprox
flag to bypass advanced relation logic (useful for GeoPointField
since approximations are already handled at the grid levelAwaitsFix
from TestGeoUtils.testGeoRelations
Next step is to test w/ BKD integration.
Michael McCandless (@mikemccand) (migrated from JIRA)
Hi @nknize, I think you forgot to add GeoRelationUtils.java
before making the patch? And also, can you make the patch from higher up in the source tree? It seems to be missing e.g. changes to SloppyMath.java
.
Nick Knize (@nknize) (migrated from JIRA)
Thanks @mikemccand! Fixed the patch to include GeoRelationUtils and changes to SloppyMath.
Michael McCandless (@mikemccand) (migrated from JIRA)
Thanks @nknize! Lots of geo math that I don't understand :)
It hit a failure after some beasting:
[junit4] Started J0 PID(28075@localhost).
[junit4] Suite: org.apache.lucene.util.TestGeoUtils
[junit4] 1> doc=2304 matched but should not
[junit4] 1> lon=-52.68950667232275 lat=-89.24937685020268 distanceMeters=267809.96722662856 vs radiusMeters=263432.23741533695
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeoUtils -Dtests.method=testGeoRelations -Dtests.seed=3B46325723FB86A9 -Dtests.multiplier=5 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=vi_VN -Dtests.timezone=America/Halifax -Dtests.asserts=true -Dtests.file.encoding=UTF-8
[junit4] FAILURE 2.61s | TestGeoUtils.testGeoRelations <<<
[junit4] > Throwable #1: java.lang.AssertionError: 1 incorrect hits (see above)
[junit4] > at __randomizedtesting.SeedInfo.seed([3B46325723FB86A9:F96526E2570CF017]:0)
[junit4] > at org.apache.lucene.util.TestGeoUtils.testGeoRelations(TestGeoUtils.java:533)
[junit4] > at java.lang.Thread.run(Thread.java:745)
[junit4] 2> NOTE: test params are: codec=HighCompressionCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=HIGH_COMPRESSION, chunkSize=9, maxDocsPerChunk=323, blockSize=5), termVectorsFormat=CompressingTermVectorsFormat(compressionMode=HIGH_COMPRESSION, chunkSize=9, blockSize=5)), sim=RandomSimilarityProvider(queryNorm=true,coord=yes): {}, locale=vi_VN, timezone=America/Halifax
[junit4] 2> NOTE: Linux 3.13.0-61-generic amd64/Oracle Corporation 1.8.0_60 (64-bit)/cpus=8,threads=1,free=396588944,total=504889344
[junit4] 2> NOTE: All tests run in this JVM: [TestGeoUtils]
[junit4] Completed [1/1] in 2.77s, 1 test, 1 failure <<< FAILURES!
Also, this comment seems stale now? (We seem to be using this method?):
// NOTE: not used for 2d at the moment. used for 3d w/ altitude (we can keep or add back)
Nick Knize (@nknize) (migrated from JIRA)
Thanks @mikemccand. In isClosestPointOnRectWithinRange
it was still using SloppyMath.haversin
where it should have been using the Sinnott haversine. Looks like it was caused by a premature diff from my benchmark workspace.
A quick interesting benchmark re: distance methods. Using 2M iterations I benchmarked average computation time, average spatial error, and maximum spatial error between 5 different geospatial distance methods:
SloppyMath.haversin
but uses ECEF to accurately compute semi-major axis at the given latitude.Distance Method | Avg Computation (ns) | Avg Error (%) | Max Error (%) |
---|---|---|---|
Vincenty | 1286.5529689998828 | 8.4561794778802E-11 | 7.2565793775802E-10 |
Karney | 31770.479206999946 | 8.4561794778802E-11 | 7.2565793775802E-10 |
HaverECF | 717.1139850000152 | 0.18409042301798453 | 0.6681179192384695 |
SloppyMath.haverin | 159.3095249999995 | 0.22594450803222424 | 0.6539314586426048 |
Sinnott Haversine | 146.81236699999738 | 0.18158435835918563 | 0.4931242857748373 |
I need to run some better descriptive statistics before drawing any conclusions. At the moment, the discrepancy between Sloppymath.haversin
and Sinnott's implementation is consistent with the error described in the original publication (which is where the tolerance of 0.5% originated).
More to come....
Michael McCandless (@mikemccand) (migrated from JIRA)
Those benchmark results are nice @nknize! Is the source for this bench checked in somewhere? From this, it seems like we should switch to Sinnot Haversine? It's fastest and lowest error?
I beasted for a while and no failures!
+1 to commit! Hopefully this means we can add DimensionalDistanceQuery
and it just works!
Nick Knize (@nknize) (migrated from JIRA)
Thanks @mikemccand! I plan to add the benchmark to luceneutil unless you think there's a better place for it? Going to commit shortly.
ASF subversion and git services (migrated from JIRA)
Commit 1718481 from @nknize in branch 'dev/trunk' https://svn.apache.org/r1718481
LUCENE-6908: Fix TestGeoUtils.testGeoRelations to handle irregular rectangles. Refactors relational methods to new GeoRelationUtils class.
ASF subversion and git services (migrated from JIRA)
Commit 1718486 from @nknize in branch 'dev/branches/branch_5x' https://svn.apache.org/r1718486
LUCENE-6908: Fix TestGeoUtils.testGeoRelations to handle irregular rectangles. Refactors relational methods to new GeoRelationUtils class.
Steven Rowe (@sarowe) (migrated from JIRA)
My Jenkins found a reproducible failure in TestGeoUtils.testGeoRelations:
[junit4] Suite: org.apache.lucene.util.TestGeoUtils
[junit4] 1> doc=1921 matched but should not on iteration 229
[junit4] 1> lon=87.14082470163703 lat=-89.39206877723336 distanceMeters=205662.45440744862 vs radiusMeters=203580.37384777897
[junit4] 1> doc=2077 matched but should not on iteration 229
[junit4] 1> lon=63.26208980754018 lat=-89.36728684231639 distanceMeters=204170.67218267516 vs radiusMeters=203580.37384777897
[junit4] 2> NOTE: download the large Jenkins line-docs file by running 'ant get-jenkins-line-docs' in the lucene directory.
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeoUtils -Dtests.method=testGeoRelations -Dtests.seed=4513B1942DE0E2D3 -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=/home/jenkins/lucene-data/enwiki.random.lines.txt -Dtests.locale=lv -Dtests.timezone=America/St_Vincent -Dtests.asserts=true -Dtests.file.encoding=UTF-8
[junit4] FAILURE 2.24s | TestGeoUtils.testGeoRelations <<<
[junit4] > Throwable #1: java.lang.AssertionError: 2 incorrect hits (see above)
Steven Rowe (@sarowe) (migrated from JIRA)
See also Policeman Jenkins failures at http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/14886/ and http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/15198/
Steven Rowe (@sarowe) (migrated from JIRA)
Another reproducible failure from my Jenkins (on branch_5x):
[junit4] Suite: org.apache.lucene.util.TestGeoUtils
[junit4] 1> doc=1357 matched but should not on iteration 2
[junit4] 1> lon=-24.478504993021488 lat=88.08790524490178 distanceMeters=303542.5030168316 vs radiusMeters=295327.75445295114
[junit4] 2> NOTE: download the large Jenkins line-docs file by running 'ant get-jenkins-line-docs' in the lucene directory.
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeoUtils -Dtests.method=testGeoRelations -Dtests.seed=BAD657FC92E56596 -Dtests.slow=true -Dtests.linedocsfile=/home/jenkins/lucene-data/enwiki.random.lines.txt -Dtests.locale=lv -Dtests.timezone=America/Knox_IN -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
[junit4] FAILURE 0.33s J0 | TestGeoUtils.testGeoRelations <<<
[junit4] > Throwable #1: java.lang.AssertionError: 1 incorrect hits (see above)
[junit4] > at __randomizedtesting.SeedInfo.seed([BAD657FC92E56596:78F54349E6121328]:0)
[junit4] > at org.apache.lucene.util.TestGeoUtils.testGeoRelations(TestGeoUtils.java:533)
[junit4] > at java.lang.Thread.run(Thread.java:745)
[junit4] 2> NOTE: test params are: codec=Lucene54, sim=RandomSimilarityProvider(queryNorm=false,coord=crazy): {}, locale=lv, timezone=America/Knox_IN
[junit4] 2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation 1.8.0_45 (64-bit)/cpus=16,threads=1,free=434060912,total=514850816
[junit4] 2> NOTE: All tests run in this JVM: [TestGeoUtils]
[junit4] Completed [1/20 (1!)] on J0 in 1.14s, 8 tests, 1 failure <<< FAILURES!
Steven Rowe (@sarowe) (migrated from JIRA)
One more reproducible failure from my Jenkins on branch_5x (I'll stop adding more after this one):
[junit4] Suite: org.apache.lucene.util.TestGeoUtils
[junit4] 1> doc=1431 matched but should not on iteration 50
[junit4] 1> lon=14.089814610779285 lat=88.21761829778552 distanceMeters=310627.1321615869 vs radiusMeters=308762.06620344025
[junit4] 2> NOTE: download the large Jenkins line-docs file by running 'ant get-jenkins-line-docs' in the lucene directory.
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeoUtils -Dtests.method=testGeoRelations -Dtests.seed=70AF7A8C5D104698 -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=/home/jenkins/lucene-data/enwiki.random.lines.txt -Dtests.locale=es_CR -Dtests.timezone=Africa/Maputo -Dtests.asserts=true -Dtests.file.encoding=UTF-8
[junit4] FAILURE 1.28s J3 | TestGeoUtils.testGeoRelations <<<
[junit4] > Throwable #1: java.lang.AssertionError: 1 incorrect hits (see above)
[junit4] > at __randomizedtesting.SeedInfo.seed([70AF7A8C5D104698:B28C6E3929E73026]:0)
[junit4] > at org.apache.lucene.util.TestGeoUtils.testGeoRelations(TestGeoUtils.java:533)
[junit4] > at java.lang.Thread.run(Thread.java:745)
[junit4] 2> NOTE: test params are: codec=Asserting(Lucene54): {}, docValues:{}, sim=DefaultSimilarity, locale=es_CR, timezone=Africa/Maputo
[junit4] 2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation 1.8.0_45 (64-bit)/cpus=16,threads=1,free=426047992,total=514850816
[junit4] 2> NOTE: All tests run in this JVM: [TestGeoUtils]
[junit4] Completed [3/20 (1!)] on J3 in 1.74s, 8 tests, 1 failure <<< FAILURES!
Nick Knize (@nknize) (migrated from JIRA)
++ Thanks @sarowe There's a geometric approximation in the within that's a bit too lenient for BKD. These seeds have been super helpful in regression testing a fix. Should have a patch shortly. Sorry for the noise!
Nick Knize (@nknize) (migrated from JIRA)
Updated patch includes:
ASF subversion and git services (migrated from JIRA)
Commit 1722448 from @nknize in branch 'dev/trunk' https://svn.apache.org/r1722448
LUCENE-6908: Add space segmentation for handling irregular rectangle accuracy at the poles.
ASF subversion and git services (migrated from JIRA)
Commit 1722449 from @nknize in branch 'dev/branches/branch_5x' https://svn.apache.org/r1722449
LUCENE-6908: Add space segmentation for handling irregular rectangle accuracy at the poles.
Michael McCandless (@mikemccand) (migrated from JIRA)
@nknize can this be resolved?
The
.testGeoRelations
method doesn't exactly test the behavior of GeoPoint*Query as its using the BKD split technique (instead of quad cell division) to divide the space on each pass. For "large" distance queries this can create a lot of irregular rectangles producing large radial distortion error when using the cartesian approximation methods provided byGeoUtils
. This issue improves the accuracy of GeoUtils cartesian approximation methods on irregular rectangles without having to cut over to an expensive oblate geometry approach.Migrated from LUCENE-6908 by Nick Knize (@nknize), resolved Jan 18 2016 Attachments: LUCENE-6908.patch (versions: 5)