Closed asfimport closed 8 years ago
Michael McCandless (@mikemccand) (migrated from JIRA)
Patch, putting back the smallBBox
boolean ...
First problem is the test runs very slowly sometimes. I thought #7770 was supposed to fix this (so we don't need to do LUCENE-6685)?
Second problem is it then hits failures:
[junit4:pickseed] Seed property 'tests.seed' already defined: EE938F778B784339
[junit4] <JUnit4> says hallo! Master seed: EE938F778B784339
[junit4] Executing 1 suite with 1 JVM.
[junit4]
[junit4] Started J0 PID(7661@localhost).
[junit4] Suite: org.apache.lucene.search.TestGeoPointQuery
[junit4] 1> T0: id=9 docID=9 lat=-76.34720742943381 lon=-36.15344492760056 deleted?=false expected=true but got false query=GeoPointDistanceQuery: field=geoField: Center: [-91.32459786462451,-62.68808104026591] Distance: 2652065.508673892 m Lower Left: [-135.1399256521194,-86.45202902120825] Upper Right: [-47.50927007712964,-38.847306191642886]
[junit4] 2> 9月 03, 2015 11:39:46 下午 com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
[junit4] 2> WARNING: Uncaught exception in thread: Thread[T0,5,TGRP-TestGeoPointQuery]
[junit4] 2> java.lang.AssertionError: wrong hit
[junit4] 2> at __randomizedtesting.SeedInfo.seed([EE938F778B784339]:0)
[junit4] 2> at org.junit.Assert.fail(Assert.java:93)
[junit4] 2> at org.apache.lucene.search.TestGeoPointQuery$VerifyHits.test(TestGeoPointQuery.java:570)
[junit4] 2> at org.apache.lucene.search.TestGeoPointQuery$1._run(TestGeoPointQuery.java:511)
[junit4] 2> at org.apache.lucene.search.TestGeoPointQuery$1.run(TestGeoPointQuery.java:402)
[junit4] 2>
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeoPointQuery -Dtests.method=testRandomTiny -Dtests.seed=EE938F778B784339 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=zh_HK -Dtests.timezone=Portugal -Dtests.asserts=true -Dtests.file.encoding=UTF-8
[junit4] ERROR 63.4s | TestGeoPointQuery.testRandomTiny <<<
Is it known/expected that the math doesn't work for large distance queries?
Finally, I noticed the randomized test is not testing "crosses dateline" cases .. is this supposed to work / known not to work? I can open a separate issue for that.
Nick Knize (@nknize) (migrated from JIRA)
There's a small change to GeoPointTermsEnum in #7835 that this depends upon.
Apply #7835.patch then LUCENE-6780.patch
This patch also fixes an issue noted in #7756 (a side effect of simplifying the cellCrossesCircle logic)
Michael McCandless (@mikemccand) (migrated from JIRA)
Can we fix closestPointOnBBox
to return void
, and requiring incoming arg is non-null? I think it makes the API confusing to both take as an argument, and return, the result.
Can we expand this out to a full if statement?:
`@Override`
+ protected short computeMaxShift() {
+ return (short)(GeoPointField.PRECISION_STEP * ((query.radius > 2000000) ? 5 : 4));
+ }
Maybe add javadoc to the new closestPointOnBBox
and computeMaxShift
?
In closestPointOnBBox
should you maybe use Double.NaN as the marker value instead of 0.0 since 0.0 can legitimately occur?
A general geo API question: why do we sometimes use x/y (rMinX
, rMinY
) and other times use lon/lat (centerLon
, centerLat
)?
It looks like your new patch lost my original patch (that just made the test more evil) ... when I apply both, I still see test failures, e.g.:
[junit4] Started J0 PID(390@localhost).
[junit4] Suite: org.apache.lucene.search.TestGeoPointQuery
[junit4] 1> T0: id=3454 docID=3389 lat=16.958122319434622 lon=91.51666186085828 deleted?=false expected=true but got false query=GeoPointDistanceQuery: field=geoField: Center: [28.3906677508417,9.349277744701268] Distance: 6883636.144942287 m Lower Left: [-33.7590864185931,-52.77655846609843] Upper Right: [90.5404219202765,71.33132841950409]
[junit4] 2> sep 10, 2015 4:07:41 PM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
[junit4] 2> WARNING: Uncaught exception in thread: Thread[T0,5,TGRP-TestGeoPointQuery]
[junit4] 2> java.lang.AssertionError: wrong hit
[junit4] 2> at __randomizedtesting.SeedInfo.seed([94245EC72F3A129]:0)
[junit4] 2> at org.junit.Assert.fail(Assert.java:93)
[junit4] 2> at org.apache.lucene.search.TestGeoPointQuery$VerifyHits.test(TestGeoPointQuery.java:572)
[junit4] 2> at org.apache.lucene.search.TestGeoPointQuery$1._run(TestGeoPointQuery.java:513)
[junit4] 2> at org.apache.lucene.search.TestGeoPointQuery$1.run(TestGeoPointQuery.java:406)
[junit4] 2>
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeoPointQuery -Dtests.method=testRandom -Dtests.seed=94245EC72F3A129 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=es_MX -Dtests.timezone=Canada/Atlantic -Dtests.asserts=true -Dtests.file.encoding=UTF-8
[junit4] ERROR 0.78s | TestGeoPointQuery.testRandom <<<
I'll attach the combined patch ...
I think a simple random test could ferret out self-consistency bugs in these new geo APIs (similar to what we did on LUCENE-6699): make a random circle and random bbox, then ask the GeoUtil APIs what their relationship is, then randomly sample points from each and confirm no point ever violates the relationship ... I'll try to code this up.
Michael McCandless (@mikemccand) (migrated from JIRA)
Patch combining @nknize's last patch and the original "make test more evil" patch.
Nick Knize (@nknize) (migrated from JIRA)
Looks like one of my patches stepped on the evil test (blame -> git apply :) ). I'll start with the failure first:
when I apply both, I still see test failures,
A quick investigation reveals the bounding box for the point-distance slightly off (as in upwards of a degree in both directions). At present there are 2 approaches to computing that bbox:
This fix can come in a few ways, 1. dynamically expand the computed bbox based on computed error (using distance as the independent variable). Maybe overkill? 2. add a static "fudge factor" of 1 degree to min/max lon/lat. This would probably need to be verified through some beasting.
In closestPointOnBBox should you maybe use Double.NaN as the marker value instead of 0.0 since 0.0 can legitimately occur?
The logic handles it. 0, 0 means closest point is the same as centerLon, centerLat - which is what it gets set to in the method. Though thanks for getting me to look at that closer. There's a superfluous logic block.
A general geo API question: why do we sometimes use x/y (rMinX, rMinY) and other times use lon/lat (centerLon, centerLat)?
Short answer: lazy inconsistencies. Longer answer: I like to use x/y when I'm either going to swap out with cartesian logic or I'm lazy typing. Since neither are good answers I agree it would be a good idea to refactor to lon/lat for geodesic and x/y cartesian.
Good comments on the code cleanup.
Nick Knize (@nknize) (migrated from JIRA)
New patch that computes the BBox given the WGS84 ellipsoid. This skips all haversine and vincenty issues at the expense of a few extra trig functions (only used to compute bbox once).
The patch also addresses other comments (e.g., exploding ternary operator, adding java doc). nocommits are left alone for time being.
Michael McCandless (@mikemccand) (migrated from JIRA)
Thanks @nknize, but I'm having trouble applying the patch:
patching file lucene/sandbox/src/java/org/apache/lucene/search/GeoPointTermsEnum.java
Hunk `#1` FAILED at 42.
Hunk `#2` FAILED at 102.
Hunk `#3` FAILED at 146.
Hunk `#4` FAILED at 157.
Hunk `#5` FAILED at 185.
Hunk `#6` FAILED at 206.
6 out of 6 hunks FAILED -- saving rejects to file lucene/sandbox/src/java/org/apache/lucene/search/GeoPointTermsEnum.java.rej
It looks like your dev area isn't up to date? (Missing #7835 that I committed yesterday...)
ASF subversion and git services (migrated from JIRA)
Commit 1702470 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1702470
LUCENE-6780: make branch
ASF subversion and git services (migrated from JIRA)
Commit 1702471 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1702471
LUCENE-6780: commit current patch
Michael McCandless (@mikemccand) (migrated from JIRA)
OK I attempted to sync up the patch (parts of the GeoPointTermsEnum.java
changes were new, others were old) and made some other small test changes, and then committed to a branch here: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene6780
Let's iterate on this branch?
I'll work on the new randomized test ...
Michael McCandless (@mikemccand) (migrated from JIRA)
I'm seeing very slow query execution times, e.g. ant test -Dtestcase=TestGeoPointQuery -Dtestmethod=testRandomMedium -Dtests.seed=A48E30E0280A6CAF
takes \~60 seconds on my box, when the test normally takes 1-2 seconds with other seeds...
ASF subversion and git services (migrated from JIRA)
Commit 1702476 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1702476
LUCENE-6780: remove wasted annots; add failing test
Michael McCandless (@mikemccand) (migrated from JIRA)
In closestPointOnBBox should you maybe use Double.NaN as the marker value instead of 0.0 since 0.0 can legitimately occur?
The logic handles it.
I still think there's really a bug here :) I committed a failing test on the branch to TestGeoUtils.java
showing it ... maybe my test case is buggy, but all I did was translate the bbox so that it's upper right corner landed on 0, 0 ...
ASF subversion and git services (migrated from JIRA)
Commit 1702490 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1702490
LUCENE-6780: add randomized test; remove more annots; add some nocommits; rename some radius -> radiusMeters
Michael McCandless (@mikemccand) (migrated from JIRA)
OK I committed a new randomized test TestGeoUtils.testRandomRectsAndCircles
...
It's a simple test: make a random rect and circle, use the GeoUtils API to see if they are disjoint or if the circle fully contains the rect (else, skip it), and then randomly pick points inside the rect and confirm they are either within or not within the circle.
It sometimes fails, e.g.:
ant test -Dtestcase=TestGeoUtils -Dtests.method=testRandomRectsAndCircles -Dtests.seed=D4C2F5423D90BDD8 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=mt_MT -Dtests.timezone=Australia/Yancowinna -Dtests.asserts=true -Dtests.file.encoding=UTF-8 -Dtests.verbose=true
But it's entirely possible we need to relax the test for boundary conditions, however when I look at the GeoUtils impls, we seem to consistently use haversine for all the checks ... but maybe even so the math can produce false errors?
Nick Knize (@nknize) (migrated from JIRA)
Next iteration patch off the feature branch.
OK I committed a new randomized test
The test was buggy using maxLon where it expected minLon. I also updated it to handle the case where the circle is fully contained by the rectangle. All beasting passed.
I'm seeing very slow query execution times
I had noticed this before posting the initial patch and isolated one test that took \~45 secs to complete. The slow times were related to recomputing the bbox and high resolution ranges for every segment on very large distance queries. The new patch relaxes the distance criteria for the range resolution. There are still some boundary outliers that take \~15 seconds (instead of 60+) to complete. I can further improve this by optimizing the GeoPointDistanceQuery.computeBBox method... or, what are your thoughts on computing the BBox and reusing across segments??
I still think there's really a bug here I committed a failing test on the branch
bleh...indeed!! Fixed it up.
ASF subversion and git services (migrated from JIRA)
Commit 1703062 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1703062
LUCENE-6780: fix test bug; fix bug in closestPoinOnBBox; add some nocommits
Michael McCandless (@mikemccand) (migrated from JIRA)
The test was buggy using maxLon where it expected minLon
Duh! Thanks for fixing :)
I committed the patch w/ some minor code style changes, but added some nocommits, e.g. I'm not sure how circleFullInside
testing helps since it seems to always assert to true in that case.
I'll beast the new test!
Michael McCandless (@mikemccand) (migrated from JIRA)
The slow times were related to recomputing the bbox
Wait: we only compute this (call GeoPointDistanceQuery.computeBBox
) once for the query, on init, right?
ASF subversion and git services (migrated from JIRA)
Commit 1703281 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1703281
LUCENE-6780: let random radius be big
Michael McCandless (@mikemccand) (migrated from JIRA)
I was working on getting BKDDistanceQuery
working again, but hit failures with a large radius, so I also fixed TestGeoPointQuery
to use a large radius (just committed) and it causes test failures, e.g.:
[junit4:pickseed] Seed property 'tests.seed' already defined: 651AF6100346C910
[junit4] <JUnit4> says g'day! Master seed: 651AF6100346C910
[junit4] Executing 1 suite with 1 JVM.
[junit4]
[junit4] Started J0 PID(28673@localhost).
[junit4] Suite: org.apache.lucene.search.TestGeoPointQuery
[junit4] 1> T0: id=5 docID=5 lat=-30.70619555695309 lon=-119.07908745059306 deleted?=false expected=false but got true query=GeoPointDistanceQuery: field=geoField: Center: [119.51155579512795,-29.46313019984308] Distance: 9353340.676650032 meters]
[junit4] 2> Zář 15, 2015 10:00:15 ODP. com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
[junit4] 2> WARNING: Uncaught exception in thread: Thread[T0,5,TGRP-TestGeoPointQuery]
[junit4] 2> java.lang.AssertionError: wrong hit
[junit4] 2> at __randomizedtesting.SeedInfo.seed([651AF6100346C910]:0)
[junit4] 2> at org.junit.Assert.fail(Assert.java:93)
[junit4] 2> at org.apache.lucene.search.TestGeoPointQuery$VerifyHits.test(TestGeoPointQuery.java:560)
[junit4] 2> at org.apache.lucene.search.TestGeoPointQuery$1._run(TestGeoPointQuery.java:501)
[junit4] 2> at org.apache.lucene.search.TestGeoPointQuery$1.run(TestGeoPointQuery.java:396)
[junit4] 2>
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeoPointQuery -Dtests.method=testRandomTiny -Dtests.seed=651AF6100346C910 -Dtests.multiplier=5 -Dtests.locale=cs -Dtests.timezone=Africa/Windhoek -Dtests.asserts=true -Dtests.file.encoding=UTF-8
[junit4] ERROR 0.24s | TestGeoPointQuery.testRandomTiny <<<
[junit4] > Throwable #1: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=14, name=T0, state=RUNNABLE, group=TGRP-TestGeoPointQuery]
[junit4] > at __randomizedtesting.SeedInfo.seed([651AF6100346C910:2C5D28565D67F1BC]:0)
[junit4] > Caused by: java.lang.AssertionError: wrong hit
[junit4] > at __randomizedtesting.SeedInfo.seed([651AF6100346C910]:0)
[junit4] > at org.apache.lucene.search.TestGeoPointQuery$VerifyHits.test(TestGeoPointQuery.java:560)
[junit4] > at org.apache.lucene.search.TestGeoPointQuery$1._run(TestGeoPointQuery.java:501)
[junit4] > at org.apache.lucene.search.TestGeoPointQuery$1.run(TestGeoPointQuery.java:396)
[junit4] 2> NOTE: test params are: codec=Asserting(Lucene53): {id=PostingsFormat(name=LuceneVarGapFixedInterval), geoField=FST50}, docValues:{id=DocValuesFormat(name=Direct), geoField=DocValuesFormat(name=Lucene50)}, sim=RandomSimilarityProvider(queryNorm=false,coord=yes): {}, locale=cs, timezone=Africa/Windhoek
[junit4] 2> NOTE: Linux 3.13.0-61-generic amd64/Oracle Corporation 1.8.0_40 (64-bit)/cpus=8,threads=1,free=383358232,total=504889344
[junit4] 2> NOTE: All tests run in this JVM: [TestGeoPointQuery]
[junit4] Completed [1/1] in 0.57s, 1 test, 1 error <<< FAILURES!
[junit4]
[junit4]
[junit4] Tests with failures:
[junit4] - org.apache.lucene.search.TestGeoPointQuery.testRandomTiny
A very large radius (such that it can span the entire earth) should work?
I'll also commit how far I got with the BKDDistanceQuery
to the branch ...
ASF subversion and git services (migrated from JIRA)
Commit 1703282 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1703282
ASF subversion and git services (migrated from JIRA)
Commit 1703791 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1703791
LUCENE-6780: BKDDistanceQuery must take min/maxLon/Lat into account in hashCode and equals else we get illegal collisions in caching
ASF subversion and git services (migrated from JIRA)
Commit 1703792 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1703792
LUCENE-6780: make tests more evil
ASF subversion and git services (migrated from JIRA)
Commit 1703793 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1703793
LUCENE-6780: code style / nocommits
ASF subversion and git services (migrated from JIRA)
Commit 1703794 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1703794
LUCENE-6780: remove log line
Michael McCandless (@mikemccand) (migrated from JIRA)
I'm still struggling to get BKDDistanceQuery
working here, even with the good GeoUtil fixes on this branch.
So I switched back to making the GeoUtilsTest
more evil:
First, I noticed that TestGeoUtils
was always using a tiny lat/lon area (1-3 degrees) and fixed that to sometimes be the full range, and then I improved the "self consistency" test (TestGeoUtils.testGeoRelations
) to stress the GeoUtils
APIs like BKD does, and now there are some fun failures, e.g.:
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeoUtils -Dtests.method=testGeoRelations -Dtests.seed=48F4A913CBA1095 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=es_NI -Dtests.timezone=America/Argentina/Salta -Dtests.asserts=true -Dtests.file.encoding=UTF-8
[junit4] FAILURE 0.14s | TestGeoUtils.testGeoRelations <<<
[junit4] > Throwable #1: java.lang.AssertionError: 243 incorrect hits (see above)
[junit4] > at __randomizedtesting.SeedInfo.seed([48F4A913CBA1095:C6AC5E24484D662B]:0)
[junit4] > at org.apache.lucene.util.TestGeoUtils.testGeoRelations(TestGeoUtils.java:509)
[junit4] > at java.lang.Thread.run(Thread.java:745)
If you run with -Dtests.verbose=true
you see lots of diagnostics explaining which GeoUtils
API went wrong... in this case, GeoUtils.rectWithinCircle
returned true when it should have returned false. Now I need @nknize's help!
Michael McCandless (@mikemccand) (migrated from JIRA)
Also, the failures all seem related to using a big lon/lat area: if I hardwire the useSmallRanges
to true
, the test seems to pass for quite a while under beasting ...
ASF subversion and git services (migrated from JIRA)
Commit 1703876 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1703876
LUCENE-6780: remove dup code, cleanup nocommits
ASF subversion and git services (migrated from JIRA)
Commit 1703879 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1703879
LUCENE-6780: move GeoBBox -> GeoRect; turn off BKD verbosity
ASF subversion and git services (migrated from JIRA)
Commit 1703881 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1703881
LUCENE-6780: merge trunk
ASF subversion and git services (migrated from JIRA)
Commit 1704496 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1704496
LUCENE-6780: factor out base geo point test class
Michael McCandless (@mikemccand) (migrated from JIRA)
I factored out a common test base class, for TestBKDTree and TestGeoPointQuery, and in the process found a test bug in TestGeoPointQuery.bboxQueryCanBeWrong
where it was being way too lenient in returning true
.
I fixed that to instead check whether GeoUtils.compare
returns 0.0 for any of the rect's boundaries, but now there are new test failures, e.g.:
[junit4] Started J0 PID(11346@localhost).
[junit4] Suite: org.apache.lucene.search.TestGeoPointQuery
[junit4] 1> TEST-TestGeoPointQuery.testMultiValued-seed#[880EAE9C08D4DB54]: id=4591 docID=4591 should not match but did
[junit4] 1> rect=GeoRect(lon=-153.18749899012866 TO -152.61749242472143 lat=33.44312055800268 TO 33.95438073558091)
[junit4] 1> lat=32.553960682161375 lon=-154.2675789624302
[junit4] 1> lat=33.95438025524065 lon=-152.86846841320704
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeoPointQuery -Dtests.method=testMultiValued -Dtests.seed=880EAE9C08D4DB54 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=nl_BE -Dtests.timezone=Europe/Tallinn -Dtests.asserts=true -Dtests.file.encoding=UTF-8
[junit4] FAILURE 10.7s | TestGeoPointQuery.testMultiValued <<<
[junit4] > Throwable #1: java.lang.AssertionError: some hits were wrong
[junit4] > at __randomizedtesting.SeedInfo.seed([880EAE9C08D4DB54:5C2ECAAEC6169B1C]:0)
[junit4] > at org.apache.lucene.util.BaseGeoPointTestCase.testMultiValued(BaseGeoPointTestCase.java:280)
[junit4] > at java.lang.Thread.run(Thread.java:745)
[junit4] 2> NOTE: test params are: codec=Asserting(Lucene53): {id=PostingsFormat(name=LuceneVarGapFixedInterval), point=PostingsFormat(name=Asserting)}, docValues:{point=DocValuesFormat(name=Asserting)}, sim=ClassicSimilarity, locale=nl_BE, timezone=Europe/Tallinn
[junit4] 2> NOTE: Linux 3.13.0-61-generic amd64/Oracle Corporation 1.8.0_40 (64-bit)/cpus=8,threads=1,free=366930072,total=501219328
[junit4] 2> NOTE: All tests run in this JVM: [TestGeoPointQuery]
[junit4] Completed [1/1] in 11.10s, 1 test, 1 failure <<< FAILURES!
The failure is clearly a boundary case, so maybe we just need a better "is boundary case" check?
It's also entirely possible I created new exciting test bugs in the refactoring :)
And we still have the test failures for both BKD and GeoPoint distance queries when distance is biggish...
Michael McCandless (@mikemccand) (migrated from JIRA)
I was concerned about the peak heap used for "worst case" queries here, so I hacked up a quick patch (attached) to test this:
GeoPointInBBoxQuery: field=point: Lower Left: [-170.79577068315723,-88.3524701239041] Upper Right: [115.75692731020496,51.78004322487766]
--> 940,015 terms = 34,065,696 bytes
GeoPointDistanceQuery: field=point: Center: [-95.87683480747508,-83.99672364681616] Distance: 826889.911703281 meters]
--> 179,562 terms = 12,446,344 bytes
The patch just records over time the largest number of terms created by the query, and then I ran the test for many iterations.
I think this is too high, e.g. too many of these queries in flight at once can mean an unexpected OOME.
But I think before we address this we should address the correctness issues (the failing seeds for TestGeoUtils.testGeoRelations
).
It could be that to fix these, we place soft limits on how large each query is allowed to be? Meaning, a user who's willing to have more error, willing to use more heap, can increase the limit if they want, but by default the limit protects the more common use case with smaller shapes.
Nick Knize (@nknize) (migrated from JIRA)
The failure is clearly a boundary case, so maybe we just need a better "is boundary case" check?
Definitely the situation with this failure. The new GeoUtils.compare method was returning null for these boundary points so the test was always setting expected = false. I've changed it to ignore those boundary test points that are within the accepted error tolerance.
ASF subversion and git services (migrated from JIRA)
Commit 1709065 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1709065
LUCENE-6780: fix a couple test bugs, remove nocommits, improve javadocs, cut all tests back to small==true
Michael McCandless (@mikemccand) (migrated from JIRA)
OK I found a couple test bugs, which explained some of the failures.
I also cut all "small" booleans back to true always, so we are only testing a region b/w 1-3 degrees in lat and lon.
There is still a lurking boundary case failure, where a doc's lat/lon is on the edge of the bbox rect we query.
And then things are still sometimes very very slow, even when restricting queries to small shapes, e.g. when I run ant test -Dtestcase=TestGeoPointQuery -Dtests.seed=0
, I get these times:
-test:
[junit4:pickseed] Seed property 'tests.seed' already defined: 0
[junit4] <JUnit4> says hello! Master seed: 0
[junit4] Executing 1 suite with 1 JVM.
[junit4]
[junit4] Started J0 PID(30211@localhost).
[junit4] Suite: org.apache.lucene.search.TestGeoPointQuery
[junit4] OK 0.01s | TestGeoPointQuery.testInvalidGeoDistanceQuery
[junit4] OK 0.02s | TestGeoPointQuery.testBBoxQuery
[junit4] OK 9.10s | TestGeoPointQuery.testMultiValued
[junit4] OK 2.07s | TestGeoPointQuery.testRandomMedium
[junit4] OK 0.01s | TestGeoPointQuery.testGeoDistanceQueryHuge
[junit4] OK 1.05s | TestGeoPointQuery.testRandomTiny
[junit4] OK 0.27s | TestGeoPointQuery.testWholeMap
[junit4] OK 0.00s | TestGeoPointQuery.testRectCrossesCircle
[junit4] OK 2.48s | TestGeoPointQuery.testAllLonEqual
[junit4] OK 0.00s | TestGeoPointQuery.testMortonEncoding
[junit4] OK 13.2s | TestGeoPointQuery.testSamePointManyTimes
[junit4] OK 0.01s | TestGeoPointQuery.testGeoDistanceQueryCrossDateline
[junit4] IGNOR/A 0.03s | TestGeoPointQuery.testRandomBig
[junit4] > Assumption #1: 'nightly' test group is disabled (@Nightly())
[junit4] OK 0.00s | TestGeoPointQuery.testPacManPolyQuery
[junit4] OK 0.01s | TestGeoPointQuery.testMultiValuedQuery
[junit4] OK 0.00s | TestGeoPointQuery.testPolyQuery
[junit4] OK 0.00s | TestGeoPointQuery.testGeoDistanceQuery
[junit4] HEARTBEAT J0 PID(30211@localhost): 2015-10-16T15:07:46, stalled for 66.9s at: TestGeoPointQuery.testAllLatEqual
[junit4] OK 78.3s | TestGeoPointQuery.testAllLatEqual
[junit4] OK 0.00s | TestGeoPointQuery.testInvalidBBox
[junit4] OK 0.00s | TestGeoPointQuery.testBBoxCrossDateline
[junit4] Completed [1/1] in 106.94s, 20 tests, 1 skipped
[junit4]
[junit4] JVM J0: 0.36 .. 107.82 = 107.47s
[junit4] Execution time total: 1 minute 47 seconds
[junit4] Tests summary: 1 suite, 20 tests, 1 ignored (1 assumption)
[echo] 5 slowest tests:
Why is testAllLatEqual
so slow? This test case came from BKD's test (since we refactored to a common base test class) ... I just don't understand why it's so slow for geo point query.
Nick Knize (@nknize) (migrated from JIRA)
Next WIP patch...
This patch includes the following:
Performance is much better (need to quantify through benchmarks) but there are still a few Test bugs that need to be ironed out (PointInPolygon tolerance, and catching illegal radius values for PointDistanceQuery)
ASF subversion and git services (migrated from JIRA)
Commit 1709253 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1709253
LUCENE-6780: don't allow too-large radius in distance query (depending on origin); reduce resolution for large bbox
ASF subversion and git services (migrated from JIRA)
Commit 1709257 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1709257
LUCENE-6780: fix bug in midLat/midLon; don't test dateline crossing when small == true
Michael McCandless (@mikemccand) (migrated from JIRA)
Thanks @nknize, I committed the patch, but put a nocommit on the AwaitsFix (we plan to fix that before landing this right?).
I also found a bug in how midLat/Lon was computed, that was causing us to almost always use shiftFactor=5 even when the bbox was smallish ... I fix that, and also fixed the test not to do dateline crossing when small==true.
However ant test -Dtestcase=TestGeoPointQuery -Dtestmethod=testAllLatEqual -Dtests.seed=0
is still quite slow (\~5.6 seconds on my fastish haswell box), even though it's always using a small bbox ....
Nick Knize (@nknize) (migrated from JIRA)
Updated patch.
Adds Nightly to a few of the tests that still seem to run slow during range creation (particularly on PointInPolygon queries). IMHO this shouldn't preclude the fixes and improvements added by this patch. Will open separate issues to a. investigate performance of range/polygon intersection (probably needs to be handled under the dateline crossing feature) and b. create ranges on-demand instead of using transient memory (improve overall performance)
Added consistency to rectContainsPoint to use same method as GeoUtils.bboxContains
Nick Knize (@nknize) (migrated from JIRA)
git svn is behind by 3 days... here's the patch against the latest svn repo.
ASF subversion and git services (migrated from JIRA)
Commit 1709483 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1709483
LUCENE-6780: mark slow tests nightly, turn back on some big bbox/distances in tests, use GeoUtils.bboxContains
Michael McCandless (@mikemccand) (migrated from JIRA)
git svn is behind by 3 days.
Not good!
Thanks @nknize, I committed the last patch, but I hit this test failure in the new "geo API self-consistency" test:
[junit4] Started J0 PID(17316@localhost).
[junit4] Suite: org.apache.lucene.util.TestGeoUtils
[junit4] 1> doc=113 did not match but should
[junit4] 1> lon=19.101526554407 lat=-85.2135864567328 distanceMeters=220459.12109877667 vs radiusMeters=220827.9418903528
[junit4] 1> doc=814 did not match but should
[junit4] 1> lon=18.946167779153978 lat=-85.21667574378826 distanceMeters=220122.387740399 vs radiusMeters=220827.9418903528
[junit4] 1> doc=1340 did not match but should
[junit4] 1> lon=19.231482133213458 lat=-85.21195787282232 distanceMeters=220639.0006665368 vs radiusMeters=220827.9418903528
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeoUtils -Dtests.method=testGeoRelations -Dtests.seed=A1A8548928708420 -Dtests.locale=en_PH -Dtests.timezone=Africa/Mogadishu -Dtests.asserts=true -Dtests.file.encoding=UTF-8
[junit4] FAILURE 0.09s | TestGeoUtils.testGeoRelations <<<
[junit4] > Throwable #1: java.lang.AssertionError: 3 incorrect hits (see above)
[junit4] > at __randomizedtesting.SeedInfo.seed([A1A8548928708420:638B403C5C87F29E]:0)
[junit4] > at org.apache.lucene.util.TestGeoUtils.testGeoRelations(TestGeoUtils.java:506)
[junit4] > at java.lang.Thread.run(Thread.java:745)
[junit4] 2> NOTE: test params are: codec=FastCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=FAST, chunkSize=25153, maxDocsPerChunk=7, blockSize=932), termVectorsFormat=CompressingTermVectorsFormat(compressionMode=FAST, chunkSize=25153, blockSize=932)), sim=ClassicSimilarity, locale=en_PH, timezone=Africa/Mogadishu
[junit4] 2> NOTE: Linux 3.13.0-61-generic amd64/Oracle Corporation 1.8.0_60 (64-bit)/cpus=8,threads=1,free=422996528,total=504889344
[junit4] 2> NOTE: All tests run in this JVM: [TestGeoUtils]
[junit4] Completed [1/1] in 0.24s, 1 test, 1 failure <<< FAILURES!
Seems like boundary cases because the actual distance vs query distance are pretty close?
ASF subversion and git services (migrated from JIRA)
Commit 1709485 from @mikemccand in branch 'dev/branches/lucene6780' https://svn.apache.org/r1709485
LUCENE-6780: disable test until we can fix pole crossing correctly
Michael McCandless (@mikemccand) (migrated from JIRA)
OK I disabled the new test case until we can fix pole crossing.
But I hit a new failure:
....I.I.......T2: id=2929 should not match but did
small=true query=GeoPointInBBoxQuery: field=point: Lower Left: [-96.30716303968933,35.304955180599926] Upper Right: [-95.78797371671509,35.55537284046483] docID=2929
lat=35.30495325717168 lon=-95.95683785493188
deleted?=false
okt 19, 2015 5:00:44 PM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
WARNING: Uncaught exception in thread: Thread[T2,5,TGRP-TestGeoPointQuery]
java.lang.AssertionError: some hits were wrong
at __randomizedtesting.SeedInfo.seed([F82368891A1F01C9]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.apache.lucene.util.BaseGeoPointTestCase$VerifyHits.test(BaseGeoPointTestCase.java:509)
at org.apache.lucene.util.BaseGeoPointTestCase$2._run(BaseGeoPointTestCase.java:697)
at org.apache.lucene.util.BaseGeoPointTestCase$2.run(BaseGeoPointTestCase.java:580)
ENOTE: reproduce with: ant test -Dtestcase=TestGeoPointQuery -Dtests.method=testRandomMedium -Dtests.seed=F82368891A1F01C9 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=sk_SK -Dtests.timezone=America/Knox_IN -Dtests.asserts=true -Dtests.file.encoding=UTF-8
....I.....NOTE: test params are: codec=Asserting(Lucene53): {id=PostingsFormat(name=Memory doPackFST= true), point=PostingsFormat(name=Asserting)}, docValues:{id=DocValuesFormat(name=Lucene50), point=DocValuesFormat(name=Lucene50)}, sim=ClassicSimilarity, locale=sk_SK, timezone=America/Knox_IN
NOTE: Linux 3.13.0-61-generic amd64/Oracle Corporation 1.8.0_60 (64-bit)/cpus=8,threads=1,free=488560232,total=514850816
NOTE: All tests run in this JVM: [TestGeoPointQuery]
Looks like a boundary case ...
Nick Knize (@nknize) (migrated from JIRA)
Updated patch:
@Nightly
to repeated adversarial testsI think this is ready and we can move on to fixing performance by removing some of these transient structures (e.g., rangeList)
Michael McCandless (@mikemccand) (migrated from JIRA)
+1 to the last patch, thanks @nknize!
ASF subversion and git services (migrated from JIRA)
Commit 1709892 from @nknize in branch 'dev/branches/lucene6780' https://svn.apache.org/r1709892
LUCENE-6780: fix quantization in random lat/lon generation, cut overly adversarial tests to @Nightly
I'm working on #7756 but struggling with test failures ...
Then I noticed that TestGeoPointQuery's test never tests on large distances, so I modified the test to sometimes do so (like TestBKDTree) and hit test failures.
Migrated from LUCENE-6780 by Michael McCandless (@mikemccand), resolved Oct 23 2015 Attachments: LUCENE-6780.patch (versions: 11), LUCENE-6780-heap-used-hack.patch Linked issues:
7835