apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.67k stars 1.03k forks source link

Integrate lat/long BKD and spatial 3d, part 2 [LUCENE-6759] #7817

Closed asfimport closed 9 years ago

asfimport commented 9 years ago

This is just a continuation of #7757, which became too big.


Migrated from LUCENE-6759 by Michael McCandless (@mikemccand), resolved Sep 02 2015 Attachments: LUCENE-6699.patch (versions: 17)

asfimport commented 9 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

Hmm.

For a start, I'd like to revise the structure of encoding and decoding so that PlanetModel doesn't propagate into the encoder methods. As I play with this I keep tripping over the dichotomy between encode and decode and all that entails. Any objection if I submit a structural patch as the first order of business?

asfimport commented 9 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

Replacing my previous patch attempt with a structural change that means decode and encode methods receive the same info, which is calculated elsewhere. This does NOT fix the failure, but @mikemccand, if you have no objection, this will make it easier to iterate for me.

asfimport commented 9 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Thanks @DaddyWri that makes sense, I'll commit that except for the +/- Integer.MAX_VALUE clipping since we are still digging on that.

asfimport commented 9 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

OK I committed that, minus a couple parts that are about debugging this test failure, and I consistently renamed max to planetMax. Let me know if I screwed up, but tests passed once!

asfimport commented 9 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

Ok, two issues.

First issue: the problem I was initially debugging was iteration 121. This problem is occurring on iteration 122. It's a different shape, and all the analysis/tests that were done are no longer pertinent. I guess that's good.

For iteration 122, the shape is a GeoPath. The problem is simply that the root cell does not contain document 1230. Neither does the xyzbounds. But the shape itself does.

Some debugging output:

   [junit4]   1> TEST: iter=122 shape=GeoPath: {planetmodel=PlanetModel.WGS84, width=0.7766715171374766(44.5), points={[[lat=-0.2751718361148076, lon=-0.7786721269011477], [lat=0.5728375851539309, lon=-1.2700115736820465]]}}
   [junit4]   1>   root cell: cell=36178 x: -1068861352 TO 2145533482, y: -2147483647 TO 66660163, z: -1855240006 TO 2086587766, splits: 0 contains 1230? false GeoArea contains 1230? false shape contains point? true maximum X > maxPlanet? false
   [junit4]   1>   doc 1230 encoding: x=2145533565 y=-84921712 z=-33945127

So, it's the x dimension that is insufficient; the x upper bound for the root cell is too small by 83.

I'll set this up in its own Geo3D test and analyze it further.

asfimport commented 9 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

Fixes the problem by increasing FUDGE_FACTOR still further.

Other changes include testing additions/improvements.

@mikemccand, this should not be controversial; commit when ready...

asfimport commented 9 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

OK I committed a test fix so we only see verbose logging for the iteration that failed.

Also I hit a new failure, GeoPath again:

[junit4:pickseed] Seed property 'tests.seed' already defined: 24B39D6EB4A89D20
   [junit4] <JUnit4> says ciao! Master seed: 24B39D6EB4A89D20
   [junit4] Executing 1 suite with 1 JVM.
   [junit4] 
   [junit4] Started J0 PID(14785@localhost).
   [junit4] Suite: org.apache.lucene.bkdtree3d.TestGeo3DPointField
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeo3DPointField -Dtests.method=testGeo3DRelations -Dtests.seed=24B39D6EB4A89D20 -Dtests.multiplier=5 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=zh -Dtests.timezone=America/Grenada -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] FAILURE 4.56s | TestGeo3DPointField.testGeo3DRelations <<<
   [junit4]    > Throwable #1: java.lang.AssertionError: invalid hits for shape=GeoPath: {planetmodel=PlanetModel.WGS84, width=0.6894050545377601(39.5), points={[[lat=-0.0788176065762948, lon=0.9431251741731624], [lat=0.510387871458147, lon=0.5327078872484678], [lat=-0.5624521609859962, lon=1.5398841746888388], [lat=-0.5025171434638661, lon=-0.5895998642788894]]}}
   [junit4]    >    at __randomizedtesting.SeedInfo.seed([24B39D6EB4A89D20:94CCE0FA3BE533BC]:0)
   [junit4]    >    at org.apache.lucene.bkdtree3d.TestGeo3DPointField.testGeo3DRelations(TestGeo3DPointField.java:708)
   [junit4]    >    at java.lang.Thread.run(Thread.java:745)
   [junit4]   2> NOTE: test params are: codec=CheapBastard, sim=DefaultSimilarity, locale=zh, timezone=America/Grenada
   [junit4]   2> NOTE: Linux 3.19.0-21-generic amd64/Oracle Corporation 1.8.0_51 (64-bit)/cpus=72,threads=1,free=372695208,total=514850816
   [junit4]   2> NOTE: All tests run in this JVM: [TestGeo3DPointField]
   [junit4] Completed [1/1] in 4.87s, 1 test, 1 failure <<< FAILURES!
   [junit4] 
   [junit4] 
   [junit4] Tests with failures:
   [junit4]   - org.apache.lucene.bkdtree3d.TestGeo3DPointField.testGeo3DRelations
asfimport commented 9 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

Was able to reproduce this with a small test case. XYZBounds again. Only this time increasing FUDGE_FACTOR by even a further factor of 5 doesn't address it, so digging deeper into the bounds computation...

FWIW, it may be worthwhile having the randomized test sometimes use a completely artificial planet model (e.g. where ab and c are rather different) in order to tease out actual elliptical geometry bugs a bit more readily. I'll look at that too.

asfimport commented 9 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

The path is a four-point one. The part of the path that contains the point is the first segment. This segment has four bounding planes and four corner points. The point in question has the following coordinates:

   [junit4]   2> Point.x = 1.0005691667476262; point.y=0.023149205146987442; point.z=0.023676292209381222

My augmented output for getting the bounds for the segment shows that it is the upper connecting plane that determines the maxX boundary for the shape:

   [junit4]   2>  upperConnectingPlane...
   [junit4]   2>     computing Z bound
   [junit4]   2>     not degenerate
   [junit4]   2>     computing X bound
   [junit4]   2>     not degenerate; B=-0.5302912804708979; C=-0.5331856607581867
   [junit4]   2>       Point = [X=1.0005682443631527, Y=0.022204817350270467, Z=0.02459563634844797]; this.evaluate(point)=0.0; normalizedXPlane.evaluate(point)=4.163336342344337E-17
   [junit4]   2>     computing Y bound
   [junit4]   2>     not degenerate

The point of maximum excursion's X value, though, is X=1.0005682443631527. The actual X value of the point we're checking is Point.x = 1.0005691667476262. So that's the disagreement. The next step is to see how that difference arises. Specifically, the point found is on the upperConnectingPlane (eval = 0.0), and on the plane I have constructed that should represent a perpendicular to the upperConnectingPlane going through the point of maximum excursion. I'll have to verify that that is indeed the case.

asfimport commented 9 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

The computation that determines the perpendicular plane that is intersected with the original plane to find the X or Y bounds is as follows:

(1) Create a perpendicular plane which includes the X axis (or Y axis, depending) (2) Evaluate that plane at the two Z points that represent the maximum excursions in Z for the plane intersecting the planet (3) Take the average of the evaluations, and construct a new plane that is parallel to the first perpendicular plane but offset by that amount

In this case, the following is produced:

   [junit4]   2> zPoint: [X=0.09847149376598374, Y=-0.07921875599763584, Z=-0.9897798734262737]
   [junit4]   2> zPoint: [X=0.7398138828430078, Y=-0.5951685429682101, Z=0.31625095037048134]
...
   [junit4]   2>     computing X bound
   [junit4]   2>     not degenerate; B=-0.5302912804708979; C=-0.5331856607581867
   [junit4]   2>     originPlane.evaluate(zpoint)=-0.6418043015460514
   [junit4]   2>     originPlane.evaluate(zpoint)=0.6450052856365084

So, the D value being computed is the average of two fairly large numbers, which are largely canceling each other out. Unfortunately, however, that kind of floating-point calculation leads to a fairly high error value, which I believe to be the source of the problem we're seeing.

I will ponder if there's a better way to compute what we need here.

asfimport commented 9 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

Patch increasing FUDGE_FACTOR still further. Analysis indicates that we just have a rather large error value to deal with given how XYZ bounds computation works. Looking for a better way to do the computation, but until then, this is the only solution.

asfimport commented 9 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Thanks @DaddyWri, I committed that.

We could alternatively remove this optimization? I.e. descend from the entire planet every time, instead of starting from the bbox?

asfimport commented 9 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

@mikemccand Even though the bounds calculation for X and Y has a "large" error, it's still on the order of 1 meter on the planet surface. So unless we simply cannot find a reasonable error bound, we should leave it as is.

asfimport commented 9 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I think this is finally ready! I'm attaching a clean patch against latest trunk ... I'll beast a bit more and the commit soon if no new failures.

We can open followon issues for e.g. randomizing the PlanetModel.

asfimport commented 9 years ago

David Smiley (@dsmiley) (migrated from JIRA)

Congrats guys!