apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

fix random number generation used for spatial tests [LUCENE-7185] #8240

Open asfimport opened 8 years ago

asfimport commented 8 years ago

The current method is not very good for testing.


Migrated from LUCENE-7185 by Robert Muir (@rmuir), updated Apr 18 2016 Attachments: LUCENE-7185_polygon.patch, LUCENE-7185_sorting.patch, LUCENE-7185.patch, newRandom.png, oldRandom.png

asfimport commented 8 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Here is what I'm thinking. Lots of tests fail. Some problems may be test bugs, others real bugs.

I fixed the LatLonGrid but there are many other failures.

asfimport commented 8 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

+1, this is great :)

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit b59ace99e45969a8d81be4639bfb2695681636eb in lucene-solr's branch refs/heads/master from @rmuir https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b59ace9

LUCENE-7185: make an empty grid the simple way

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit e75db9476764b9a6a426686a58e18f38a26883a5 in lucene-solr's branch refs/heads/branch_6x from @rmuir https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e75db94

LUCENE-7185: make an empty grid the simple way

asfimport commented 8 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I'm gonna try to fight my way through the various test failures; some are pretty crazy.

Frequently choosing certain edge values also means we find bugs (or test bugs) in tie-break handling in sorting, in rectangles that really look more like lines or single points, you name it.
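A minimal sketch of what "frequently choosing certain edge values" could look like for latitudes. The class name, the list of edge values, and the one-in-four probability are all assumptions made for illustration; they are not taken from the attached patches.

```java
import java.util.Random;

public class EdgeBiasedRandom {
  // Hypothetical "interesting" latitudes: the poles, both zeros, and the
  // representable values just inside the poles. Not taken from the patch.
  private static final double[] EDGE_LATITUDES = {
    -90.0, 90.0, 0.0, -0.0, Math.nextUp(-90.0), Math.nextDown(90.0)
  };

  /** Returns a latitude in [-90, 90], picking an edge value about one time in four. */
  public static double nextLatitude(Random random) {
    if (random.nextInt(4) == 0) {
      return EDGE_LATITUDES[random.nextInt(EDGE_LATITUDES.length)];
    }
    // Otherwise sample uniformly over [-90, 90).
    return -90.0 + 180.0 * random.nextDouble();
  }

  public static void main(String[] args) {
    Random random = new Random(42L);
    for (int i = 0; i < 10; i++) {
      System.out.println(nextLatitude(random));
    }
  }
}
```

With a purely uniform generator, exact values such as 90.0 or -0.0 essentially never come up, so code paths for degenerate boxes and ties go untested.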

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit 2965ac2ca18e2019fb9f03170b9bcf98162a9d21 in lucene-solr's branch refs/heads/master from @rmuir https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2965ac2

LUCENE-7185: fix edge case bug in test logic (min=max=180), don't leak Directory for edge cases!

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit cebac848a999db0b341d717c8edd39ecc9184671 in lucene-solr's branch refs/heads/branch_6x from @rmuir https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=cebac84

LUCENE-7185: fix edge case bug in test logic (min=max=180), don't leak Directory for edge cases!

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit e0c507a40012a1148da65b81d722bd9b29ec9d8e in lucene-solr's branch refs/heads/master from @rmuir https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e0c507a

LUCENE-7185: fix edge case bugs in LatLonPoint bounding box query

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit 6d537d2bad64a27927aa7c67c600dee97f79aa2b in lucene-solr's branch refs/heads/branch_6x from @rmuir https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6d537d2

LUCENE-7185: fix edge case bugs in LatLonPoint bounding box query

asfimport commented 8 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

+      // may trigger divide by zero
+      return 0.0;

Given that ((Double) +0.0).equals/compareTo(-0.0) equals false, even though unboxed comparison obviously equals true, I'd add -0.0 to the list of surprises as well.
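A small self-contained check of the signed-zero behavior described above, using only standard `java.lang.Double` methods. The class and method names are made up for the example.

```java
public class SignedZeroDemo {
  /** Primitive comparison: IEEE 754 treats +0.0 and -0.0 as equal. */
  public static boolean primitiveEquals(double a, double b) {
    return a == b;
  }

  /** Boxed comparison: Double.equals compares bit patterns, so the zeros differ. */
  public static boolean boxedEquals(double a, double b) {
    return Double.valueOf(a).equals(Double.valueOf(b));
  }

  /** Boxed ordering: Double.compareTo orders -0.0 before +0.0. */
  public static int boxedCompare(double a, double b) {
    return Double.valueOf(a).compareTo(Double.valueOf(b));
  }

  public static void main(String[] args) {
    System.out.println(primitiveEquals(0.0, -0.0));
    System.out.println(boxedEquals(0.0, -0.0));
    System.out.println(boxedCompare(0.0, -0.0));
  }
}
```

Any test that mixes `==` with boxed comparisons or `Double.compare` can therefore disagree with itself once -0.0 shows up in the random values.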

asfimport commented 8 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Here is a patch for the sort bug. The bug is actually in SloppyMath.haversinSortKey. It sometimes returns values that are different, yet when followed through with the rest of the haversin equation will collapse to the same value due to subsequent rounding.

This means Double.compare(sortKey(...), sortKey(...)) becomes inconsistent with Double.compare(fullHaversin(...), fullHaversin(...)) which is the whole point, and causes the tie-breaker test failure.

The fix is to clobber some bits: in the worst case, for huge distances around the earth, it increases error from 0.1cm to 20cm, but "typical" distances (e.g. <1000km) are not impacted and are still within e.g. 0.01mm error. Our docs for error were really off here too: I improved the testing for that. The fix does not hurt performance.
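A rough sketch of what "clobbering some bits" can mean for a double sort key: mask off the lowest mantissa bits so that keys differing only in that noise compare as equal. The helper name and the choice of 3 bits are assumptions for illustration; this thread does not state the mask used in the patch.

```java
public class ClobberBitsDemo {
  /**
   * Zeroes the lowest {@code bits} mantissa bits of {@code value}.
   * Hypothetical helper; the bit count is an illustrative parameter.
   */
  public static double clobberLowBits(double value, int bits) {
    long mask = -1L << bits; // e.g. bits=3 gives 0xFFFFFFFFFFFFFFF8
    return Double.longBitsToDouble(Double.doubleToRawLongBits(value) & mask);
  }

  public static void main(String[] args) {
    double a = 0.5;
    double b = Math.nextUp(a); // differs from a only in the last bit

    // The raw keys are distinct...
    System.out.println(Double.compare(a, b));
    // ...but compare as equal once the low bits are cleared.
    System.out.println(Double.compare(clobberLowBits(a, 3), clobberLowBits(b, 3)));
  }
}
```

For non-negative finite values the truncation never reverses order, so the key stays usable for sorting. It only merges neighbors that later rounding would have collapsed anyway.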

Now all tests pass in sandbox/, but I have not yet looked at geopoint failures. One of them is kinda scary:

{noformat}
[junit4] Suite: org.apache.lucene.spatial.geopoint.search.TestGeoPointQuery
[junit4] IGNOR/A 0.01s J1 | TestGeoPointQuery.testRandomBig
[junit4] > Assumption #1: 'nightly' test group is disabled (@Nightly())
[junit4] IGNOR/A 0.00s J1 | TestGeoPointQuery.testRandomDistanceHuge
[junit4] > Assumption #1: 'nightly' test group is disabled (@Nightly())
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeoPointQuery -Dtests.method=testAllLonEqual -Dtests.seed=4ABB96AB44F4796E -Dtests.locale=id-ID -Dtests.timezone=Pacific/Fakaofo -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
[junit4] ERROR 0.35s J1 | TestGeoPointQuery.testAllLonEqual <<<
[junit4] > Throwable #1: java.lang.IllegalArgumentException: Illegal shift value, must be 32..63; got shift=0
[junit4] > at org.apache.lucene.spatial.util.GeoEncodingUtils.geoCodedToPrefixCodedBytes(GeoEncodingUtils.java:109)
[junit4] > at org.apache.lucene.spatial.util.GeoEncodingUtils.geoCodedToPrefixCoded(GeoEncodingUtils.java:89)
[junit4] > at org.apache.lucene.spatial.geopoint.search.GeoPointPrefixTermsEnum$Range.fillBytesRef(GeoPointPrefixTermsEnum.java:236)
[junit4] > at org.apache.lucene.spatial.geopoint.search.GeoPointTermsEnum.nextRange(GeoPointTermsEnum.java:71)
[junit4] > at org.apache.lucene.spatial.geopoint.search.GeoPointPrefixTermsEnum.nextRange(GeoPointPrefixTermsEnum.java:171)
[junit4] > at org.apache.lucene.spatial.geopoint.search.GeoPointPrefixTermsEnum.nextSeekTerm(GeoPointPrefixTermsEnum.java:190)
[junit4] > at org.apache.lucene.index.FilteredTermsEnum.next(FilteredTermsEnum.java:212)
[junit4] > at org.apache.lucene.spatial.geopoint.search.GeoPointTermQueryConstantScoreWrapper$1.scorer(GeoPointTermQueryConstantScoreWrapper.java:110)
[junit4] > at org.apache.lucene.search.Weight.bulkScorer(Weight.java:135)
[junit4] > at org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.bulkScorer(LRUQueryCache.java:644)
[junit4] > at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:68)
[junit4] > at org.apache.lucene.search.BooleanWeight.optionalBulkScorer(BooleanWeight.java:231)
[junit4] > at org.apache.lucene.search.BooleanWeight.booleanScorer(BooleanWeight.java:297)
[junit4] > at org.apache.lucene.search.BooleanWeight.bulkScorer(BooleanWeight.java:364)
[junit4] > at org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.bulkScorer(LRUQueryCache.java:644)
[junit4] > at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:68)
[junit4] > at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:68)
[junit4] > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:666)
[junit4] > at org.apache.lucene.search.AssertingIndexSearcher.search(AssertingIndexSearcher.java:91)
[junit4] > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:473)
[junit4] > at org.apache.lucene.spatial.util.BaseGeoPointTestCase.verifyRandomRectangles(BaseGeoPointTestCase.java:835)
[junit4] > at org.apache.lucene.spatial.util.BaseGeoPointTestCase.verify(BaseGeoPointTestCase.java:763)
[junit4] > at org.apache.lucene.spatial.util.BaseGeoPointTestCase.testAllLonEqual(BaseGeoPointTestCase.java:495)
{noformat}

asfimport commented 8 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

+1 to the sorting patch!

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit 05d62a357711d1e4e850a5d2fb7336bf0a7acf24 in lucene-solr's branch refs/heads/master from @rmuir https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=05d62a3

LUCENE-7185: fix tie-breaker sort bug

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit 01867a5b318d6509731feea118b3f82764310380 in lucene-solr's branch refs/heads/branch_6x from @rmuir https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=01867a5

LUCENE-7185: fix tie-breaker sort bug

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit 901a3af301f81331459cddb87e4fbf1a4437303b in lucene-solr's branch refs/heads/master from @rmuir https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=901a3af

LUCENE-7185: fix random number generation used for spatial tests.

Note that GeoPoint tests are still on the old RNG as we haven't yet made them happy.

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit b21218e79f2b431b3311a0a07dfbcdb7ae48f1ab in lucene-solr's branch refs/heads/branch_6x from @rmuir https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b21218e

LUCENE-7185: fix random number generation used for spatial tests.

Note that GeoPoint tests are still on the old RNG as we haven't yet made them happy.

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit cd673ebec93cbdf37d14b3984552cec4388cea0f in lucene-solr's branch refs/heads/master from @rmuir https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=cd673eb

LUCENE-7185: add proper tests for grid bugs found here, and fix related bugs still lurking

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit acf0a57940ee8a625a3d5c48bfa170e5e758996b in lucene-solr's branch refs/heads/branch_6x from @rmuir https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=acf0a57

LUCENE-7185: add proper tests for grid bugs found here, and fix related bugs still lurking

asfimport commented 8 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Attached are some improvements to the polygon/box testing; really it's just about the points we pick to test. I haven't yet improved the actual polygons being generated.

Here is a visual of what random points and boxes look like with the old generator. You can see it essentially only targets vertices for points and boxes and so these are usually not very interesting. oldRandom.png

Here is the new one: it tries better to find edge cases. Specifically we attack edges themselves, the bounding box, diagonals between vertices, and so on. I improved boxes as well, they definitely still could use work but should be more effective. newRandom.png

I will beast the tests to try to reduce any noise. I think it's OK given what our tests are doing (and the very simple stuff we are doing today with polys).
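To illustrate the idea of attacking the edges themselves rather than only the vertices, here is a minimal sketch that picks a random point on a random polygon edge. All names are hypothetical, and it interpolates linearly in lat/lon space while ignoring dateline crossing, which is an assumption rather than what the patch does.

```java
import java.util.Random;

public class EdgePointSketch {
  /**
   * Returns {lat, lon} at fraction t (0..1) along the edge from vertex i
   * to the next vertex, wrapping from the last vertex back to the first.
   */
  public static double[] pointOnEdge(double[] lats, double[] lons, int i, double t) {
    int j = (i + 1) % lats.length;
    return new double[] {
      lats[i] + t * (lats[j] - lats[i]),
      lons[i] + t * (lons[j] - lons[i])
    };
  }

  /** Picks a random edge, then a random position along it. */
  public static double[] randomEdgePoint(Random random, double[] lats, double[] lons) {
    int i = random.nextInt(lats.length);
    return pointOnEdge(lats, lons, i, random.nextDouble());
  }

  public static void main(String[] args) {
    // A simple square polygon given as parallel vertex arrays.
    double[] lats = {0.0, 0.0, 10.0, 10.0};
    double[] lons = {0.0, 10.0, 10.0, 0.0};
    double[] p = randomEdgePoint(new Random(1L), lats, lons);
    System.out.println(p[0] + ", " + p[1]);
  }
}
```

Points generated this way sit right on the boundary, where inside/outside decisions are most sensitive to rounding.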

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit c135e9a7d6aedc317879a716716224b3b19c91a1 in lucene-solr's branch refs/heads/master from @rmuir https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c135e9a

LUCENE-7185: improve random test point/box generation for spatial tests

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit 2138bc05365e362dd5aa8df4224fd853e981de2f in lucene-solr's branch refs/heads/branch_6x from @rmuir https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2138bc0

LUCENE-7185: improve random test point/box generation for spatial tests

asfimport commented 8 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

Cool visualization.

asfimport commented 8 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

+1, this is fabulous!!

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit cc099d8fc60a7693417754f9b3a3f51f0952f9a3 in lucene-solr's branch refs/heads/master from @rmuir https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=cc099d8

LUCENE-7185: handle underflow

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit 045f2cad6f7808f11a4f0626ec497a077eac3220 in lucene-solr's branch refs/heads/branch_6x from @rmuir https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=045f2ca

LUCENE-7185: handle underflow

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit b839325740f9487bf30c74b649993c7e400a2aaa in lucene-solr's branch refs/heads/master from @rmuir https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b839325

LUCENE-7185: fix buggy worst-case error and add test for absurd distances

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit cc463bf4f28b07b05a125a6b7105a9997ec8e82e in lucene-solr's branch refs/heads/branch_6x from @rmuir https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=cc463bf

LUCENE-7185: fix buggy worst-case error and add test for absurd distances