apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.64k stars 1.02k forks source link

GeoPointDistanceQuery doesn't work with a large radius? [LUCENE-6780] #7838

Closed asfimport closed 9 years ago

asfimport commented 9 years ago

I'm working on #7756 but struggling with test failures ...

Then I noticed that TestGeoPointQuery's test never tests on large distances, so I modified the test to sometimes do so (like TestBKDTree) and hit test failures.


Migrated from LUCENE-6780 by Michael McCandless (@mikemccand), resolved Oct 23 2015 Attachments: LUCENE-6780.patch (versions: 11), LUCENE-6780-heap-used-hack.patch Linked issues:

asfimport commented 9 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Patch, putting back the smallBBox boolean ...

First problem is the test runs very slowly sometimes. I thought #7770 was supposed to fix this (so we don't need to do LUCENE-6685)?

Second problem is it then hits failures:

[junit4:pickseed] Seed property 'tests.seed' already defined: EE938F778B784339
   [junit4] <JUnit4> says hallo! Master seed: EE938F778B784339
   [junit4] Executing 1 suite with 1 JVM.
   [junit4] 
   [junit4] Started J0 PID(7661@localhost).
   [junit4] Suite: org.apache.lucene.search.TestGeoPointQuery
   [junit4]   1> T0: id=9 docID=9 lat=-76.34720742943381 lon=-36.15344492760056 deleted?=false expected=true but got false query=GeoPointDistanceQuery: field=geoField: Center: [-91.32459786462451,-62.68808104026591] Distance: 2652065.508673892 m Lower Left: [-135.1399256521194,-86.45202902120825] Upper Right: [-47.50927007712964,-38.847306191642886]
   [junit4]   2> 9月 03, 2015 11:39:46 下午 com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
   [junit4]   2> WARNING: Uncaught exception in thread: Thread[T0,5,TGRP-TestGeoPointQuery]
   [junit4]   2> java.lang.AssertionError: wrong hit
   [junit4]   2>    at __randomizedtesting.SeedInfo.seed([EE938F778B784339]:0)
   [junit4]   2>    at org.junit.Assert.fail(Assert.java:93)
   [junit4]   2>    at org.apache.lucene.search.TestGeoPointQuery$VerifyHits.test(TestGeoPointQuery.java:570)
   [junit4]   2>    at org.apache.lucene.search.TestGeoPointQuery$1._run(TestGeoPointQuery.java:511)
   [junit4]   2>    at org.apache.lucene.search.TestGeoPointQuery$1.run(TestGeoPointQuery.java:402)
   [junit4]   2> 
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeoPointQuery -Dtests.method=testRandomTiny -Dtests.seed=EE938F778B784339 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=zh_HK -Dtests.timezone=Portugal -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] ERROR   63.4s | TestGeoPointQuery.testRandomTiny <<<

Is it known/expected that the math doesn't work for large distance queries?

Finally, I noticed the randomized test is not testing "crosses dateline" cases .. is this supposed to work / known not to work? I can open a separate issue for that.

asfimport commented 9 years ago

Nick Knize (@nknize) (migrated from JIRA)

There's a small change to GeoPointTermsEnum in #7835 that this depends upon.

Apply #7835.patch then LUCENE-6780.patch

This patch also fixes an issue noted in #7756 (a side effect of simplifying the cellCrossesCircle logic)

asfimport commented 9 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Can we fix closestPointOnBBox to return void, and requiring incoming arg is non-null? I think it makes the API confusing to both take as an argument, and return, the result.

Can we expand this out to a full if statement?:

     `@Override`
+    protected short computeMaxShift() {
+      return (short)(GeoPointField.PRECISION_STEP * ((query.radius > 2000000) ? 5 : 4));
+    }

Maybe add javadoc to the new closestPointOnBBox and computeMaxShift?

In closestPointOnBBox should you maybe use Double.NaN as the marker value instead of 0.0 since 0.0 can legitimately occur?

A general geo API question: why do we sometimes use x/y (rMinX, rMinY) and other times use lon/lat (centerLon, centerLat)?

It looks like your new patch lost my original patch (that just made the test more evil) ... when I apply both, I still see test failures, e.g.:

   [junit4] Started J0 PID(390@localhost).
   [junit4] Suite: org.apache.lucene.search.TestGeoPointQuery
   [junit4]   1> T0: id=3454 docID=3389 lat=16.958122319434622 lon=91.51666186085828 deleted?=false expected=true but got false query=GeoPointDistanceQuery: field=geoField: Center: [28.3906677508417,9.349277744701268] Distance: 6883636.144942287 m Lower Left: [-33.7590864185931,-52.77655846609843] Upper Right: [90.5404219202765,71.33132841950409]
   [junit4]   2> sep 10, 2015 4:07:41 PM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
   [junit4]   2> WARNING: Uncaught exception in thread: Thread[T0,5,TGRP-TestGeoPointQuery]
   [junit4]   2> java.lang.AssertionError: wrong hit
   [junit4]   2>    at __randomizedtesting.SeedInfo.seed([94245EC72F3A129]:0)
   [junit4]   2>    at org.junit.Assert.fail(Assert.java:93)
   [junit4]   2>    at org.apache.lucene.search.TestGeoPointQuery$VerifyHits.test(TestGeoPointQuery.java:572)
   [junit4]   2>    at org.apache.lucene.search.TestGeoPointQuery$1._run(TestGeoPointQuery.java:513)
   [junit4]   2>    at org.apache.lucene.search.TestGeoPointQuery$1.run(TestGeoPointQuery.java:406)
   [junit4]   2> 
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeoPointQuery -Dtests.method=testRandom -Dtests.seed=94245EC72F3A129 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=es_MX -Dtests.timezone=Canada/Atlantic -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] ERROR   0.78s | TestGeoPointQuery.testRandom <<<

I'll attach the combined patch ...

I think a simple random test could ferret out self-consistency bugs in these new geo APIs (similar to what we did on LUCENE-6699): make a random circle and random bbox, then ask the GeoUtil APIs what their relationship is, then randomly sample points from each and confirm no point ever violates the relationship ... I'll try to code this up.

asfimport commented 9 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Patch combining @nknize's last patch and the original "make test more evil" patch.

asfimport commented 9 years ago

Nick Knize (@nknize) (migrated from JIRA)

Looks like one of my patches stepped on the evil test (blame -> git apply :) ). I'll start with the failure first:

when I apply both, I still see test failures,

A quick investigation reveals the bounding box for the point-distance slightly off (as in upwards of a degree in both directions). At present there are 2 approaches to computing that bbox:

  1. use vincenty distance to compute a location along a range (distance) bearing (azimuth). Since vincenty can fail to converge on nearly anti-podal points (hence the need for iteration thresholds and the fudge factor whack-a-mole game as seen before) this can obviously be problematic for large distance queries.
  2. inverse haversine. USGS claims an average error of 22km over large distances, and this error certainly falls within that threshold (its 15km if interested).

This fix can come in a few ways, 1. dynamically expand the computed bbox based on computed error (using distance as the independent variable). Maybe overkill? 2. add a static "fudge factor" of 1 degree to min/max lon/lat. This would probably need to be verified through some beasting.

In closestPointOnBBox should you maybe use Double.NaN as the marker value instead of 0.0 since 0.0 can legitimately occur?

The logic handles it. 0, 0 means closest point is the same as centerLon, centerLat - which is what it gets set to in the method. Though thanks for getting me to look at that closer. There's a superfluous logic block.

A general geo API question: why do we sometimes use x/y (rMinX, rMinY) and other times use lon/lat (centerLon, centerLat)?

Short answer: lazy inconsistencies. Longer answer: I like to use x/y when I'm either going to swap out with cartesian logic or I'm lazy typing. Since neither are good answers I agree it would be a good idea to refactor to lon/lat for geodesic and x/y cartesian.

Good comments on the code cleanup.

asfimport commented 9 years ago

Nick Knize (@nknize) (migrated from JIRA)

New patch that computes the BBox given the WGS84 ellipsoid. This skips all haversine and vincenty issues at the expense of a few extra trig functions (only used to compute bbox once).

The patch also addresses other comments (e.g., exploding ternary operator, adding java doc). nocommits are left alone for time being.

asfimport commented 9 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Thanks @nknize, but I'm having trouble applying the patch:

patching file lucene/sandbox/src/java/org/apache/lucene/search/GeoPointTermsEnum.java
Hunk `#1` FAILED at 42.
Hunk `#2` FAILED at 102.
Hunk `#3` FAILED at 146.
Hunk `#4` FAILED at 157.
Hunk `#5` FAILED at 185.
Hunk `#6` FAILED at 206.
6 out of 6 hunks FAILED -- saving rejects to file lucene/sandbox/src/java/org/apache/lucene/search/GeoPointTermsEnum.java.rej

It looks like your dev area isn't up to date? (Missing #7835 that I committed yesterday...)

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1702470 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1702470

LUCENE-6780: make branch

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1702471 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1702471

LUCENE-6780: commit current patch

asfimport commented 9 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

OK I attempted to sync up the patch (parts of the GeoPointTermsEnum.java changes were new, others were old) and made some other small test changes, and then committed to a branch here: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene6780

Let's iterate on this branch?

I'll work on the new randomized test ...

asfimport commented 9 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I'm seeing very slow query execution times, e.g. ant test -Dtestcase=TestGeoPointQuery -Dtestmethod=testRandomMedium -Dtests.seed=A48E30E0280A6CAF takes \~60 seconds on my box, when the test normally takes 1-2 seconds with other seeds...

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1702476 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1702476

LUCENE-6780: remove wasted annots; add failing test

asfimport commented 9 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

In closestPointOnBBox should you maybe use Double.NaN as the marker value instead of 0.0 since 0.0 can legitimately occur?

The logic handles it.

I still think there's really a bug here :) I committed a failing test on the branch to TestGeoUtils.java showing it ... maybe my test case is buggy, but all I did was translate the bbox so that it's upper right corner landed on 0, 0 ...

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1702490 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1702490

LUCENE-6780: add randomized test; remove more annots; add some nocommits; rename some radius -> radiusMeters

asfimport commented 9 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

OK I committed a new randomized test TestGeoUtils.testRandomRectsAndCircles ...

It's a simple test: make a random rect and circle, use the GeoUtils API to see if they are disjoint or if the circle fully contains the rect (else, skip it), and then randomly pick points inside the rect and confirm they are either within or not within the circle.

It sometimes fails, e.g.:

ant test  -Dtestcase=TestGeoUtils -Dtests.method=testRandomRectsAndCircles -Dtests.seed=D4C2F5423D90BDD8 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=mt_MT -Dtests.timezone=Australia/Yancowinna -Dtests.asserts=true -Dtests.file.encoding=UTF-8 -Dtests.verbose=true

But it's entirely possible we need to relax the test for boundary conditions, however when I look at the GeoUtils impls, we seem to consistently use haversine for all the checks ... but maybe even so the math can produce false errors?

asfimport commented 9 years ago

Nick Knize (@nknize) (migrated from JIRA)

Next iteration patch off the feature branch.

OK I committed a new randomized test

The test was buggy using maxLon where it expected minLon. I also updated it to handle the case where the circle is fully contained by the rectangle. All beasting passed.

I'm seeing very slow query execution times

I had noticed this before posting the initial patch and isolated one test that took \~45 secs to complete. The slow times were related to recomputing the bbox and high resolution ranges for every segment on very large distance queries. The new patch relaxes the distance criteria for the range resolution. There are still some boundary outliers that take \~15 seconds (instead of 60+) to complete. I can further improve this by optimizing the GeoPointDistanceQuery.computeBBox method... or, what are your thoughts on computing the BBox and reusing across segments??

I still think there's really a bug here I committed a failing test on the branch

bleh...indeed!! Fixed it up.

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1703062 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1703062

LUCENE-6780: fix test bug; fix bug in closestPoinOnBBox; add some nocommits

asfimport commented 9 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

The test was buggy using maxLon where it expected minLon

Duh! Thanks for fixing :)

I committed the patch w/ some minor code style changes, but added some nocommits, e.g. I'm not sure how circleFullInside testing helps since it seems to always assert to true in that case.

I'll beast the new test!

asfimport commented 9 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

The slow times were related to recomputing the bbox

Wait: we only compute this (call GeoPointDistanceQuery.computeBBox) once for the query, on init, right?

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1703281 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1703281

LUCENE-6780: let random radius be big

asfimport commented 9 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I was working on getting BKDDistanceQuery working again, but hit failures with a large radius, so I also fixed TestGeoPointQuery to use a large radius (just committed) and it causes test failures, e.g.:

[junit4:pickseed] Seed property 'tests.seed' already defined: 651AF6100346C910
   [junit4] <JUnit4> says g'day! Master seed: 651AF6100346C910
   [junit4] Executing 1 suite with 1 JVM.
   [junit4] 
   [junit4] Started J0 PID(28673@localhost).
   [junit4] Suite: org.apache.lucene.search.TestGeoPointQuery
   [junit4]   1> T0: id=5 docID=5 lat=-30.70619555695309 lon=-119.07908745059306 deleted?=false expected=false but got true query=GeoPointDistanceQuery: field=geoField: Center: [119.51155579512795,-29.46313019984308] Distance: 9353340.676650032 meters]
   [junit4]   2> Zář 15, 2015 10:00:15 ODP. com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
   [junit4]   2> WARNING: Uncaught exception in thread: Thread[T0,5,TGRP-TestGeoPointQuery]
   [junit4]   2> java.lang.AssertionError: wrong hit
   [junit4]   2>    at __randomizedtesting.SeedInfo.seed([651AF6100346C910]:0)
   [junit4]   2>    at org.junit.Assert.fail(Assert.java:93)
   [junit4]   2>    at org.apache.lucene.search.TestGeoPointQuery$VerifyHits.test(TestGeoPointQuery.java:560)
   [junit4]   2>    at org.apache.lucene.search.TestGeoPointQuery$1._run(TestGeoPointQuery.java:501)
   [junit4]   2>    at org.apache.lucene.search.TestGeoPointQuery$1.run(TestGeoPointQuery.java:396)
   [junit4]   2> 
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeoPointQuery -Dtests.method=testRandomTiny -Dtests.seed=651AF6100346C910 -Dtests.multiplier=5 -Dtests.locale=cs -Dtests.timezone=Africa/Windhoek -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] ERROR   0.24s | TestGeoPointQuery.testRandomTiny <<<
   [junit4]    > Throwable #1: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=14, name=T0, state=RUNNABLE, group=TGRP-TestGeoPointQuery]
   [junit4]    >    at __randomizedtesting.SeedInfo.seed([651AF6100346C910:2C5D28565D67F1BC]:0)
   [junit4]    > Caused by: java.lang.AssertionError: wrong hit
   [junit4]    >    at __randomizedtesting.SeedInfo.seed([651AF6100346C910]:0)
   [junit4]    >    at org.apache.lucene.search.TestGeoPointQuery$VerifyHits.test(TestGeoPointQuery.java:560)
   [junit4]    >    at org.apache.lucene.search.TestGeoPointQuery$1._run(TestGeoPointQuery.java:501)
   [junit4]    >    at org.apache.lucene.search.TestGeoPointQuery$1.run(TestGeoPointQuery.java:396)
   [junit4]   2> NOTE: test params are: codec=Asserting(Lucene53): {id=PostingsFormat(name=LuceneVarGapFixedInterval), geoField=FST50}, docValues:{id=DocValuesFormat(name=Direct), geoField=DocValuesFormat(name=Lucene50)}, sim=RandomSimilarityProvider(queryNorm=false,coord=yes): {}, locale=cs, timezone=Africa/Windhoek
   [junit4]   2> NOTE: Linux 3.13.0-61-generic amd64/Oracle Corporation 1.8.0_40 (64-bit)/cpus=8,threads=1,free=383358232,total=504889344
   [junit4]   2> NOTE: All tests run in this JVM: [TestGeoPointQuery]
   [junit4] Completed [1/1] in 0.57s, 1 test, 1 error <<< FAILURES!
   [junit4] 
   [junit4] 
   [junit4] Tests with failures:
   [junit4]   - org.apache.lucene.search.TestGeoPointQuery.testRandomTiny

A very large radius (such that it can span the entire earth) should work?

I'll also commit how far I got with the BKDDistanceQuery to the branch ...

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1703282 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1703282

7756, LUCENE-6780: add BKDDistanceQuery

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1703791 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1703791

LUCENE-6780: BKDDistanceQuery must take min/maxLon/Lat into account in hashCode and equals else we get illegal collisions in caching

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1703792 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1703792

LUCENE-6780: make tests more evil

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1703793 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1703793

LUCENE-6780: code style / nocommits

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1703794 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1703794

LUCENE-6780: remove log line

asfimport commented 9 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I'm still struggling to get BKDDistanceQuery working here, even with the good GeoUtil fixes on this branch.

So I switched back to making the GeoUtilsTest more evil:

First, I noticed that TestGeoUtils was always using a tiny lat/lon area (1-3 degrees) and fixed that to sometimes be the full range, and then I improved the "self consistency" test (TestGeoUtils.testGeoRelations) to stress the GeoUtils APIs like BKD does, and now there are some fun failures, e.g.:

   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeoUtils -Dtests.method=testGeoRelations -Dtests.seed=48F4A913CBA1095 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=es_NI -Dtests.timezone=America/Argentina/Salta -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] FAILURE 0.14s | TestGeoUtils.testGeoRelations <<<
   [junit4]    > Throwable #1: java.lang.AssertionError: 243 incorrect hits (see above)
   [junit4]    >    at __randomizedtesting.SeedInfo.seed([48F4A913CBA1095:C6AC5E24484D662B]:0)
   [junit4]    >    at org.apache.lucene.util.TestGeoUtils.testGeoRelations(TestGeoUtils.java:509)
   [junit4]    >    at java.lang.Thread.run(Thread.java:745)

If you run with -Dtests.verbose=true you see lots of diagnostics explaining which GeoUtils API went wrong... in this case, GeoUtils.rectWithinCircle returned true when it should have returned false. Now I need @nknize's help!

asfimport commented 9 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Also, the failures all seem related to using a big lon/lat area: if I hardwire the useSmallRanges to true, the test seems to pass for quite a while under beasting ...

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1703876 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1703876

LUCENE-6780: remove dup code, cleanup nocommits

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1703879 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1703879

LUCENE-6780: move GeoBBox -> GeoRect; turn off BKD verbosity

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1703881 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1703881

LUCENE-6780: merge trunk

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1704496 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1704496

LUCENE-6780: factor out base geo point test class

asfimport commented 9 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I factored out a common test base class, for TestBKDTree and TestGeoPointQuery, and in the process found a test bug in TestGeoPointQuery.bboxQueryCanBeWrong where it was being way too lenient in returning true.

I fixed that to instead check whether GeoUtils.compare returns 0.0 for any of the rect's boundaries, but now there are new test failures, e.g.:

   [junit4] Started J0 PID(11346@localhost).
   [junit4] Suite: org.apache.lucene.search.TestGeoPointQuery
   [junit4]   1> TEST-TestGeoPointQuery.testMultiValued-seed#[880EAE9C08D4DB54]: id=4591 docID=4591 should not match but did
   [junit4]   1>   rect=GeoRect(lon=-153.18749899012866 TO -152.61749242472143 lat=33.44312055800268 TO 33.95438073558091)
   [junit4]   1>   lat=32.553960682161375 lon=-154.2675789624302
   [junit4]   1>   lat=33.95438025524065 lon=-152.86846841320704
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeoPointQuery -Dtests.method=testMultiValued -Dtests.seed=880EAE9C08D4DB54 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=nl_BE -Dtests.timezone=Europe/Tallinn -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] FAILURE 10.7s | TestGeoPointQuery.testMultiValued <<<
   [junit4]    > Throwable #1: java.lang.AssertionError: some hits were wrong
   [junit4]    >    at __randomizedtesting.SeedInfo.seed([880EAE9C08D4DB54:5C2ECAAEC6169B1C]:0)
   [junit4]    >    at org.apache.lucene.util.BaseGeoPointTestCase.testMultiValued(BaseGeoPointTestCase.java:280)
   [junit4]    >    at java.lang.Thread.run(Thread.java:745)
   [junit4]   2> NOTE: test params are: codec=Asserting(Lucene53): {id=PostingsFormat(name=LuceneVarGapFixedInterval), point=PostingsFormat(name=Asserting)}, docValues:{point=DocValuesFormat(name=Asserting)}, sim=ClassicSimilarity, locale=nl_BE, timezone=Europe/Tallinn
   [junit4]   2> NOTE: Linux 3.13.0-61-generic amd64/Oracle Corporation 1.8.0_40 (64-bit)/cpus=8,threads=1,free=366930072,total=501219328
   [junit4]   2> NOTE: All tests run in this JVM: [TestGeoPointQuery]
   [junit4] Completed [1/1] in 11.10s, 1 test, 1 failure <<< FAILURES!

The failure is clearly a boundary case, so maybe we just need a better "is boundary case" check?

It's also entirely possible I created new exciting test bugs in the refactoring :)

And we still have the test failures for both BKD and GeoPoint distance queries when distance is biggish...

asfimport commented 9 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I was concerned about the peak heap used for "worst case" queries here, so I hacked up a quick patch (attached) to test this:

GeoPointInBBoxQuery: field=point: Lower Left: [-170.79577068315723,-88.3524701239041] Upper Right: [115.75692731020496,51.78004322487766]
  --> 940,015 terms = 34,065,696 bytes

GeoPointDistanceQuery: field=point: Center: [-95.87683480747508,-83.99672364681616] Distance: 826889.911703281 meters]
  --> 179,562 terms = 12,446,344 bytes

The patch just records over time the largest number of terms created by the query, and then I ran the test for many iterations.

I think this is too high, e.g. too many of these queries in flight at once can mean an unexpected OOME.

But I think before we address this we should address the correctness issues (the failing seeds for TestGeoUtils.testGeoRelations).

It could be that to fix these, we place soft limits on how large each query is allowed to be? Meaning, a user who's willing to have more error, willing to use more heap, can increase the limit if they want, but by default the limit protects the more common use case with smaller shapes.

asfimport commented 9 years ago

Nick Knize (@nknize) (migrated from JIRA)

The failure is clearly a boundary case, so maybe we just need a better "is boundary case" check?

Definitely the situation with this failure. The new GeoUtils.compare method was returning null for these boundary points so the test was always setting expected = false. I've changed it to ignore those boundary test points that are within the accepted error tolerance.

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1709065 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1709065

LUCENE-6780: fix a couple test bugs, remove nocommits, improve javadocs, cut all tests back to small==true

asfimport commented 9 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

OK I found a couple test bugs, which explained some of the failures.

I also cut all "small" booleans back to true always, so we are only testing a region b/w 1-3 degrees in lat and lon.

There is still a lurking boundary case failure, where a doc's lat/lon is on the edge of the bbox rect we query.

And then things are still sometimes very very slow, even when restricting queries to small shapes, e.g. when I run ant test -Dtestcase=TestGeoPointQuery -Dtests.seed=0, I get these times:

-test:
[junit4:pickseed] Seed property 'tests.seed' already defined: 0
   [junit4] <JUnit4> says hello! Master seed: 0
   [junit4] Executing 1 suite with 1 JVM.
   [junit4] 
   [junit4] Started J0 PID(30211@localhost).
   [junit4] Suite: org.apache.lucene.search.TestGeoPointQuery
   [junit4] OK      0.01s | TestGeoPointQuery.testInvalidGeoDistanceQuery
   [junit4] OK      0.02s | TestGeoPointQuery.testBBoxQuery
   [junit4] OK      9.10s | TestGeoPointQuery.testMultiValued
   [junit4] OK      2.07s | TestGeoPointQuery.testRandomMedium
   [junit4] OK      0.01s | TestGeoPointQuery.testGeoDistanceQueryHuge
   [junit4] OK      1.05s | TestGeoPointQuery.testRandomTiny
   [junit4] OK      0.27s | TestGeoPointQuery.testWholeMap
   [junit4] OK      0.00s | TestGeoPointQuery.testRectCrossesCircle
   [junit4] OK      2.48s | TestGeoPointQuery.testAllLonEqual
   [junit4] OK      0.00s | TestGeoPointQuery.testMortonEncoding
   [junit4] OK      13.2s | TestGeoPointQuery.testSamePointManyTimes
   [junit4] OK      0.01s | TestGeoPointQuery.testGeoDistanceQueryCrossDateline
   [junit4] IGNOR/A 0.03s | TestGeoPointQuery.testRandomBig
   [junit4]    > Assumption #1: 'nightly' test group is disabled (@Nightly())
   [junit4] OK      0.00s | TestGeoPointQuery.testPacManPolyQuery
   [junit4] OK      0.01s | TestGeoPointQuery.testMultiValuedQuery
   [junit4] OK      0.00s | TestGeoPointQuery.testPolyQuery
   [junit4] OK      0.00s | TestGeoPointQuery.testGeoDistanceQuery
   [junit4] HEARTBEAT J0 PID(30211@localhost): 2015-10-16T15:07:46, stalled for 66.9s at: TestGeoPointQuery.testAllLatEqual
   [junit4] OK      78.3s | TestGeoPointQuery.testAllLatEqual
   [junit4] OK      0.00s | TestGeoPointQuery.testInvalidBBox
   [junit4] OK      0.00s | TestGeoPointQuery.testBBoxCrossDateline
   [junit4] Completed [1/1] in 106.94s, 20 tests, 1 skipped
   [junit4] 
   [junit4] JVM J0:     0.36 ..   107.82 =   107.47s
   [junit4] Execution time total: 1 minute 47 seconds
   [junit4] Tests summary: 1 suite, 20 tests, 1 ignored (1 assumption)
     [echo] 5 slowest tests:

Why is testAllLatEqual so slow? This test case came from BKD's test (since we refactored to a common base test class) ... I just don't understand why it's so slow for geo point query.

asfimport commented 9 years ago

Nick Knize (@nknize) (migrated from JIRA)

Next WIP patch...

This patch includes the following:

  1. Throws an IllegalArgumentException for PointDistanceQuery when the radius exceeds the maximum allowable distance (based on location on the earth)
  2. Reduces resolution for extreme GeoPointInBBoxQuery cases (based on diagonal distance of the bbox).

Performance is much better (need to quantify through benchmarks) but there are still a few Test bugs that need to be ironed out (PointInPolygon tolerance, and catching illegal radius values for PointDistanceQuery)

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1709253 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1709253

LUCENE-6780: don't allow too-large radius in distance query (depending on origin); reduce resolution for large bbox

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1709257 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1709257

LUCENE-6780: fix bug in midLat/midLon; don't test dateline crossing when small == true

asfimport commented 9 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Thanks @nknize, I committed the patch, but put a nocommit on the AwaitsFix (we plan to fix that before landing this right?).

I also found a bug in how midLat/Lon was computed, that was causing us to almost always use shiftFactor=5 even when the bbox was smallish ... I fix that, and also fixed the test not to do dateline crossing when small==true.

However ant test -Dtestcase=TestGeoPointQuery -Dtestmethod=testAllLatEqual -Dtests.seed=0 is still quite slow (\~5.6 seconds on my fastish haswell box), even though it's always using a small bbox ....

asfimport commented 9 years ago

Nick Knize (@nknize) (migrated from JIRA)

Updated patch.

asfimport commented 9 years ago

Nick Knize (@nknize) (migrated from JIRA)

git svn is behind by 3 days... here's the patch against the latest svn repo.

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1709483 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1709483

LUCENE-6780: mark slow tests nightly, turn back on some big bbox/distances in tests, use GeoUtils.bboxContains

asfimport commented 9 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

git svn is behind by 3 days.

Not good!

Thanks @nknize, I committed the last patch, but I hit this test failure in the new "geo API self-consistency" test:

   [junit4] Started J0 PID(17316@localhost).
   [junit4] Suite: org.apache.lucene.util.TestGeoUtils
   [junit4]   1> doc=113 did not match but should
   [junit4]   1>   lon=19.101526554407 lat=-85.2135864567328 distanceMeters=220459.12109877667 vs radiusMeters=220827.9418903528
   [junit4]   1> doc=814 did not match but should
   [junit4]   1>   lon=18.946167779153978 lat=-85.21667574378826 distanceMeters=220122.387740399 vs radiusMeters=220827.9418903528
   [junit4]   1> doc=1340 did not match but should
   [junit4]   1>   lon=19.231482133213458 lat=-85.21195787282232 distanceMeters=220639.0006665368 vs radiusMeters=220827.9418903528
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeoUtils -Dtests.method=testGeoRelations -Dtests.seed=A1A8548928708420 -Dtests.locale=en_PH -Dtests.timezone=Africa/Mogadishu -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] FAILURE 0.09s | TestGeoUtils.testGeoRelations <<<
   [junit4]    > Throwable #1: java.lang.AssertionError: 3 incorrect hits (see above)
   [junit4]    >    at __randomizedtesting.SeedInfo.seed([A1A8548928708420:638B403C5C87F29E]:0)
   [junit4]    >    at org.apache.lucene.util.TestGeoUtils.testGeoRelations(TestGeoUtils.java:506)
   [junit4]    >    at java.lang.Thread.run(Thread.java:745)
   [junit4]   2> NOTE: test params are: codec=FastCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=FAST, chunkSize=25153, maxDocsPerChunk=7, blockSize=932), termVectorsFormat=CompressingTermVectorsFormat(compressionMode=FAST, chunkSize=25153, blockSize=932)), sim=ClassicSimilarity, locale=en_PH, timezone=Africa/Mogadishu
   [junit4]   2> NOTE: Linux 3.13.0-61-generic amd64/Oracle Corporation 1.8.0_60 (64-bit)/cpus=8,threads=1,free=422996528,total=504889344
   [junit4]   2> NOTE: All tests run in this JVM: [TestGeoUtils]
   [junit4] Completed [1/1] in 0.24s, 1 test, 1 failure <<< FAILURES!

Seems like boundary cases because the actual distance vs query distance are pretty close?

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1709485 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1709485

LUCENE-6780: disable test until we can fix pole crossing correctly

asfimport commented 9 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

OK I disabled the new test case until we can fix pole crossing.

But I hit a new failure:

....I.I.......T2: id=2929 should not match but did
  small=true query=GeoPointInBBoxQuery: field=point: Lower Left: [-96.30716303968933,35.304955180599926] Upper Right: [-95.78797371671509,35.55537284046483] docID=2929
  lat=35.30495325717168 lon=-95.95683785493188
  deleted?=false
okt 19, 2015 5:00:44 PM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
WARNING: Uncaught exception in thread: Thread[T2,5,TGRP-TestGeoPointQuery]
java.lang.AssertionError: some hits were wrong
    at __randomizedtesting.SeedInfo.seed([F82368891A1F01C9]:0)
    at org.junit.Assert.fail(Assert.java:93)
    at org.apache.lucene.util.BaseGeoPointTestCase$VerifyHits.test(BaseGeoPointTestCase.java:509)
    at org.apache.lucene.util.BaseGeoPointTestCase$2._run(BaseGeoPointTestCase.java:697)
    at org.apache.lucene.util.BaseGeoPointTestCase$2.run(BaseGeoPointTestCase.java:580)

ENOTE: reproduce with: ant test  -Dtestcase=TestGeoPointQuery -Dtests.method=testRandomMedium -Dtests.seed=F82368891A1F01C9 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=sk_SK -Dtests.timezone=America/Knox_IN -Dtests.asserts=true -Dtests.file.encoding=UTF-8
....I.....NOTE: test params are: codec=Asserting(Lucene53): {id=PostingsFormat(name=Memory doPackFST= true), point=PostingsFormat(name=Asserting)}, docValues:{id=DocValuesFormat(name=Lucene50), point=DocValuesFormat(name=Lucene50)}, sim=ClassicSimilarity, locale=sk_SK, timezone=America/Knox_IN
NOTE: Linux 3.13.0-61-generic amd64/Oracle Corporation 1.8.0_60 (64-bit)/cpus=8,threads=1,free=488560232,total=514850816
NOTE: All tests run in this JVM: [TestGeoPointQuery]

Looks like a boundary case ...

asfimport commented 9 years ago

Nick Knize (@nknize) (migrated from JIRA)

Updated patch:

I think this is ready and we can move on to fixing performance by removing some of these transient structures (e.g., rangeList)

asfimport commented 9 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

+1 to the last patch, thanks @nknize!

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1709892 from @nknize in branch 'dev/branches/lucene6780' https://svn.apache.org/r1709892

LUCENE-6780: fix quantization in random lat/lon generation, cut overly adversarial tests to @Nightly