apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.62k stars 1.02k forks source link

Remove geo3d test leniency [LUCENE-7168] #8223

Closed asfimport closed 8 years ago

asfimport commented 8 years ago

Today the test hides possible failures by leniently handling quantization issues.

We should fix it to do what geo2d tests now do: pre-quantized indexed points, but don't quantize query shapes.


Migrated from LUCENE-7168 by Michael McCandless (@mikemccand), resolved Apr 06 2016 Attachments: LUCENE-7168.patch (versions: 7)

asfimport commented 8 years ago

Robert Muir (@rmuir) (migrated from JIRA)

It seems also that the current test design (testing unquantized values, and trying to "correct" for quantization to satisfy that) may cause needless complexity in Geo3D?

Because the query then tries to "correct for quantization", it computes a min/max in all 3 dimensions, and then passes these 6 values to a solid factory that returns one of 8 possible xyzsolid implementations that try to "correct" for it.

asfimport commented 8 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

Hi Robert,

The reason for the 8 implementations is because there are degeneracy conditions. Specifically, when a dimension is immeasurably small, we still need to determine set membership but only when it is on the plane that represents that dimension. it's dicey to use two planes that are right on top of each other for determining "in set" in that case. There is a degeneracy condition for a single point as well.

This is independent of the BKD implementation's quantization effects; Geo3D would want to do this anyway. There are similar issues with bounding boxes that have degeneracies of various kinds.

asfimport commented 8 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I don't see how it can be immeasurably small. We use a 32-bit quantization so we know exactly how small things can be?

asfimport commented 8 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

"Immeasurably" means within the resolution of the mathematics underlying planes, given the numerical resolution of a double. It is summarized with the constant Vector.MINIMUM_RESOLUTION.

asfimport commented 8 years ago

Robert Muir (@rmuir) (migrated from JIRA)

But we do not have the numeric resolution of a double. It is less, only 32 bits.

asfimport commented 8 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

That's an implementation detail. All of the stuff in spatial3D/geom knows nothing about that and operates from strict principles of the math alone.

asfimport commented 8 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Its an important implementation detail. If we can make simplifications around things like this, then I think we should.

We can't just let quantization only make things more complicated.

asfimport commented 8 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

Handling geometric degeneracy conditions properly, I agree, should happen regardless of what is being done elsewhere in Lucene.

asfimport commented 8 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I think this is the wrong way to look at it? If some of these situations are merely theoretical and cannot actually happen in lucene, then why do we need to support them?

asfimport commented 8 years ago

Robert Muir (@rmuir) (migrated from JIRA)

And I have not tested the numbers involved either: I am just looking to simplify.

asfimport commented 8 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

While XYZSolids are used by Lucene for BKD trees, I cannot say whether or not they would never be used by anyone else. The functionality is general, after all. In fact, the functionality was introduced because we had issues without it from the BKD tree implementation. Perhaps those problems are fixed but whether degeneracy support for that use case is needed still depends strongly on the relative values of the "known" minimum 32-bit resolution of the tree and the actual value of Vector.MINIMUM_RESOLUTION. So at the moment I'm not even convinced that the BKD implementation doesn't need degeneracy support.

I already made all the variants of XYZSolids package-private and lucene.internal. Is this enough to hide this complication from users?

asfimport commented 8 years ago

Robert Muir (@rmuir) (migrated from JIRA)

While XYZSolids are used by Lucene for BKD trees, I cannot say whether or not they would never be used by anyone else. The functionality is general, after all.

Right but I think we should try to simplify and reduce to the subset of logic that lucene needs to index and run its queries. We sorta have to do this, otherwise everything is completely open-ended ("well someone MIGHT have a use case") and we can't do APIs that way.

I already made all the variants of XYZSolids package-private and lucene.internal. Is this enough to hide this complication from users?

This is a good step and will at least defer me from trying too hard to simplify this stuff for now. Because it at least hides all this complexity from the user and that is the worst of the worst. Having some hairy stuff that is package-private is much less evil.

asfimport commented 8 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

Right but I think we should try to simplify and reduce to the subset of logic that lucene needs to index and run its queries. We sorta have to do this, otherwise everything is completely open-ended ("well someone MIGHT have a use case") and we can't do APIs that way.

Yes, but it's just another shape. Presumably you would not want to peel all of the shapes that aren't explicitly used by lucene-core out of the package? I'd make the case for it on that basis alone.

If the "shape API footprint" is worrying you, we should discuss what shapes you think people will want to search for and what they won't, and then come up with a place for the shapes you don't like to live. I'm less willing to judge that up front, however.

asfimport commented 8 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I think (ultimately) we should have Geo3DPoint.java and whatever it uses in its newXXXQuery() methods is all we need, nothing more :)

Remember we can always choose to expose more advanced functionality at any time. We can revive things from git and so on. We can try to repackage things (but we must be careful this doesnt cause more harm than good) to be better.

We can add new methods like the newPathQuery (which seems useful and general) that are things Geo3D can do that nobody else can do: stuff like this is great.

We can also try to start rounding out an expert API that does more stuff: make Custom3DPoint or Advanced3DPoint or whatever we want to call it that is even more expert: e.g. custom planet models/distance metrics/crazy shapes. I just think we should restrict ourselves to real use-cases there too to keep things bounded.

I do think we have to err towards simplicity rather than worrying about experts. But we can still make expert stuff possible.

asfimport commented 8 years ago

Nick Knize (@nknize) (migrated from JIRA)

e.g. custom planet models/distance metrics/crazy shapes

Projections would be useful - and BKD is the right structure to handle them. We need to be careful where we put a lot of this logic though. There are other Apache projects that already do this kind of stuff very well (and maintain the logic along with the EPSG database). I'm not sure we want to necessarily duplicate this work. Support for some of this more "expert capability" makes more sense in spatial-extras where we can bring in third-party support (like SIS, ESRI).

asfimport commented 8 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Here's a patch to remove query-time quantizing, threads, and the "small" boolean. Tests seem to survive beasting so far ...

asfimport commented 8 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Another patch, just improving the failure message, and beasting uncovered a failure!

   [junit4] Suite: org.apache.lucene.spatial3d.TestGeo3DPoint
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeo3DPoint -Dtests.method=testRandomMedium -Dtests.seed=1B44C9C6ED816CF9 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=pl-PL -Dtests.timezone=Asia/Damascus -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] FAILURE 0.68s | TestGeo3DPoint.testRandomMedium <<<
   [junit4]    > Throwable #1: java.lang.AssertionError: FAIL: id=147 should have matched but did not
   [junit4]    >   shape=GeoPath: {planetmodel=PlanetModel.WGS84, width=0.9250245035569946(53.0), points={[[lat=-0.31202581053256323, lon=-2.3769151553645593], [lat=-0.04559091474230697, lon=0.3560334442429531], [lat=-0.6470509953978505, lon=1.0469362861181948], [lat=-0.685120355322366, lon=1.6043555643894918], [lat=-1.568917110871099, lon=1.008031992795193]]}}
   [junit4]    >   lat=2.3507843431772453 lon=127.59305012508766
   [junit4]    >   point=[lat=0.04102892679277523, lon=2.2269188273449423]
   [junit4]    >   docID=133 deleted?=false
   [junit4]    >   query=PointInGeo3DShapeQuery: field=point: Shape: GeoPath: {planetmodel=PlanetModel.WGS84, width=0.9250245035569946(53.0), points={[[lat=-0.31202581053256323, lon=-2.3769151553645593], [lat=-0.04559091474230697, lon=0.3560334442429531], [lat=-0.6470509953978505, lon=1.0469362861181948], [lat=-0.685120355322366, lon=1.6043555643894918], [lat=-1.568917110871099, lon=1.008031992795193]]}}
   [junit4]    >    at __randomizedtesting.SeedInfo.seed([1B44C9C6ED816CF9:A69AFE6EACE40F9F]:0)
   [junit4]    >    at org.apache.lucene.spatial3d.TestGeo3DPoint.verify(TestGeo3DPoint.java:741)
   [junit4]    >    at org.apache.lucene.spatial3d.TestGeo3DPoint.doTestRandom(TestGeo3DPoint.java:529)
   [junit4]    >    at org.apache.lucene.spatial3d.TestGeo3DPoint.testRandomMedium(TestGeo3DPoint.java:456)
   [junit4]    >    at java.lang.Thread.run(Thread.java:745)
   [junit4]   2> NOTE: test params are: codec=Asserting(Lucene60): {id=PostingsFormat(name=Asserting)}, docValues:{id=DocValuesFormat(name=Lucene54)}, maxPointsInLeafNode=1169, maxMBSortInHeap=6.9707915551209005, sim=ClassicSimilarity, locale=pl-PL, timezone=Asia/Damascus
   [junit4]   2> NOTE: Linux 3.13.0-71-generic amd64/Oracle Corporation 1.8.0_60 (64-bit)/cpus=8,threads=1,free=456488064,total=504889344
   [junit4]   2> NOTE: All tests run in this JVM: [TestGeo3DPoint]
   [junit4] Completed [1/1 (1!)] in 0.84s, 1 test, 1 failure <<< FAILURES!
asfimport commented 8 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

@mikemccand: What is the x,y,z of the quantized point?

asfimport commented 8 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

@mikemccand: Looking at the change in the test code, all you did is remove the quantization from the check. I think what you need to do is the following:

The test as you have it now is guaranteed to find spurious failures that are solely due to quantization effects.

asfimport commented 8 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Looking at the change in the test code, all you did is remove the quantization from the check.

Wait: I also made a change to pre-quantize all points up front, at the top of the verify method, so that we only index quantized points.

At search time, I do not quantize the random lat/lon used to generate shapes ...

asfimport commented 8 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

Right, but when you create the GeoPoint to test with, you use a lat/lon. You really should be using the (x,y,z) values from the pre-quantization.

asfimport commented 8 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

OK I'll fixup the patch to do the quantizing in x/y/z space instead of lat/lon space.

asfimport commented 8 years ago

Robert Muir (@rmuir) (migrated from JIRA)

we can start by just testing that the query's logic within x/y/z space is correct (e.g. consistent with what would happen if it just returned CELL_CROSSES_QUERY from compare() for every call)?

This at least tells us that compare() is consistent with visit().

Then separately, we test that 2D -> 3D conversion works as we expect (these can be unit tests for GeoPoint.java). And separately test that visit() for a single point really does what we think it should (maybe tests against specific 3D shapes, etc)

Unfortunately this means we can't reuse all our 2D testing infra for Geo3D, but I think it would be complicated if we tried?

asfimport commented 8 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Work-in-progress, dirty patch, fixing (I think) the test quantization.

I also attempted to fix geo3d's encode/decode to look like LatLonPoint, and added some encode/decode tests, however TestGeo3DPoint.testEncodeDecodeIsStable is failing!

I also moved some "called only by tests" methods into the test cases, and "called only by the one query" methods into the query.

asfimport commented 8 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

What are the values that fail? It should be straightforward to debug why they fail if we know some examples.

asfimport commented 8 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

OK, some progress (new patch).

I created a silly method to find a "good" double value for "quantizing to int" purposes:

  /** Returns a double value >= x such that if you multiply that value by an int, and then
   *  divide it by that int again, you get precisely the same value back */

If we use such a value, then the encode/decode is stable. E.g. for WGS84, the planetMax is 1.0011188539924791, and the nextSafeDouble after that value is 1.0011191368103027.

But there is still one problem that I'm trying to work out, which is that if you encode the min value (-planetMax), and then decode that, you get back a value LESS than -planetMax, because of the flooring we do, and it's not guaranteed the planetMax would "be" at a floor boundary. Somehow, magically, LatLonPoint's min values seem to work ;)

Other tests are passing, except with this patch I hit this failure on beasting:

   [junit4] Started J0 PID(5288@localhost).
   [junit4] Suite: org.apache.lucene.spatial3d.TestGeo3DPoint
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeo3DPoint -Dtests.method=testRandomBig -Dtests.seed=3AD1F04EDC75F5CC -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=ru-RU -Dtests.timezone=America/Grand_Turk -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] FAILURE 5.05s | TestGeo3DPoint.testRandomBig <<<
   [junit4]    > Throwable #1: java.lang.AssertionError: FAIL: id=541300 should have matched but did not
   [junit4]    >   shape=GeoStandardPath: {planetmodel=PlanetModel.WGS84, width=0.3665191429188092(21.0), points={[[lat=-0.9764909432864418, lon=1.5923866750547349], [lat=-0.044905310521497856, lon=0.543041341355748], [lat=-1.50783500438853, lon=-0.8877147957518042]]}}
   [junit4]    >   point=[X=0.8653002868649471, Y=0.50134342478497, Z=0.046203414829601996]
   [junit4]    >   docID=539356 deleted?=false
   [junit4]    >   query=PointInGeo3DShapeQuery: field=point: Shape: GeoStandardPath: {planetmodel=PlanetModel.WGS84, width=0.3665191429188092(21.0), points={[[lat=-0.9764909432864418, lon=1.5923866750547349], [lat=-0.044905310521497856, lon=0.543041341355748], [lat=-1.50783500438853, lon=-0.8877147957518042]]}}
   [junit4]    >    at __randomizedtesting.SeedInfo.seed([3AD1F04EDC75F5CC:BD868DC14D2C894C]:0)
   [junit4]    >    at org.apache.lucene.spatial3d.TestGeo3DPoint.verify(TestGeo3DPoint.java:730)
   [junit4]    >    at org.apache.lucene.spatial3d.TestGeo3DPoint.doTestRandom(TestGeo3DPoint.java:530)
   [junit4]    >    at org.apache.lucene.spatial3d.TestGeo3DPoint.testRandomBig(TestGeo3DPoint.java:462)
   [junit4]    >    at java.lang.Thread.run(Thread.java:745)
   [junit4]   2> NOTE: leaving temporary files on disk at: /l/geo3dquant/lucene/build/spatial3d/test/J0/temp/lucene.spatial3d.TestGeo3DPoint_3AD1F04EDC75F5CC-001
   [junit4]   2> NOTE: test params are: codec=Asserting(Lucene60): {id=TestBloomFilteredLucenePostings(BloomFilteringPostingsFormat(Lucene50(blocksize=128)))}, docValues:{id=DocValuesFormat(name=Asserting)}, maxPointsInLeafNode=357, maxMBSortInHeap=5.860921451607786, sim=ClassicSimilarity, locale=ru-RU, timezone=America/Grand_Turk
   [junit4]   2> NOTE: Linux 3.13.0-71-generic amd64/Oracle Corporation 1.8.0_60 (64-bit)/cpus=8,threads=1,free=407941736,total=492306432
   [junit4]   2> NOTE: All tests run in this JVM: [TestGeo3DPoint]
   [junit4] Completed [1/1 (1!)] in 5.23s, 1 test, 1 failure <<< FAILURES!

I haven't dug into it yet ...

asfimport commented 8 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

OK, another patch, addressing all nocommits. I think it's nearly ready... I had to an "if" to decode to never go below -planetMax.

However the above failure still happens ...

So I've improved this test so that on failure, it runs a crazy (test only) IntersectVisitor that "explains" what happened to this one hit. For this failure, it now produces output like this:

   [junit4] Started J0 PID(11250@localhost).
   [junit4] Suite: org.apache.lucene.spatial3d.TestGeo3DPoint
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeo3DPoint -Dtests.method=testRandomBig -Dtests.seed=3AD1F04EDC75F5CC -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=ru-RU -Dtests.timezone=America/Grand_Turk -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] FAILURE 5.47s | TestGeo3DPoint.testRandomBig <<<
   [junit4]    > Throwable #1: java.lang.AssertionError: FAIL: id=541300 should have matched but did not
   [junit4]    >   shape=GeoStandardCircle: {planetmodel=PlanetModel.WGS84, center=[lat=-0.8971654677124566, lon=-0.3398482030102755], radius=1.4775317506492547(84.65633340877824)}
   [junit4]    >   point=[X=0.8653002868649471, Y=0.50134342478497, Z=0.046203414829601996]
   [junit4]    >   docID=538760 deleted?=false
   [junit4]    >   query=PointInGeo3DShapeQuery: field=point: Shape: GeoStandardCircle: {planetmodel=PlanetModel.WGS84, center=[lat=-0.8971654677124566, lon=-0.3398482030102755], radius=1.4775317506492547(84.65633340877824)}
   [junit4]    >   explanation:
   [junit4]    >     target is in leaf _2u(7.0.0):c1 of full reader StandardDirectoryReader(segments:210:nrt _12(7.0.0):c198337/1995:delGen=1 _24(7.0.0):c199685/601:delGen=1 _23(7.0.0):c5413/17:delGen=2 _25(7.0.0):c5413/12:delGen=1 _26(7.0.0):c5413/10:delGen=1 _27(7.0.0):c5413/10:delGen=1 _28(7.0.0):c5413/17:delGen=2 _29(7.0.0):c5413/8:delGen=2 _2a(7.0.0):c5413/12:delGen=1 _2b(7.0.0):c5413/14:delGen=1 _2c(7.0.0):c5413/4:delGen=1 _2d(7.0.0):c5413/13:delGen=2 _2e(7.0.0):c5413/12:delGen=2 _2f(7.0.0):c5413/15:delGen=2 _2g(7.0.0):c5413/10:delGen=2 _2h(7.0.0):c5413/9:delGen=2 _2i(7.0.0):c5413/6:delGen=1 _2j(7.0.0):c5413/9:delGen=1 _2k(7.0.0):c5413/8:delGen=1 _2l(7.0.0):c5413/5:delGen=1 _2m(7.0.0):c5413/3:delGen=1 _2n(7.0.0):c5413/4:delGen=1 _2o(7.0.0):c5413 _2p(7.0.0):c5413/5:delGen=1 _2q(7.0.0):c5413/1:delGen=1 _2r(7.0.0):c5413 _2s(7.0.0):c5413 _2t(7.0.0):c5413 _2u(7.0.0):c1)
   [junit4]    >     full BKD path to target doc:
   [junit4]    >       Cell(x=0.8653002866318559 TO 0.8653002870980383 y=0.5013434245518787 TO 0.5013434250180612 z=0.04620341459651078 TO 0.04620341506269321)
   [junit4]    >     on cell Cell(x=0.8653002866318559 TO 0.8653002870980383 y=0.5013434245518787 TO 0.5013434250180612 z=0.04620341459651078 TO 0.04620341506269321), wrapped visitor returned CELL_OUTSIDE_QUERY
   [junit4]    >    at __randomizedtesting.SeedInfo.seed([3AD1F04EDC75F5CC:BD868DC14D2C894C]:0)
   [junit4]    >    at org.apache.lucene.spatial3d.TestGeo3DPoint.verify(TestGeo3DPoint.java:741)
   [junit4]    >    at org.apache.lucene.spatial3d.TestGeo3DPoint.doTestRandom(TestGeo3DPoint.java:537)
   [junit4]    >    at org.apache.lucene.spatial3d.TestGeo3DPoint.testRandomBig(TestGeo3DPoint.java:469)
   [junit4]    >    at java.lang.Thread.run(Thread.java:745)
   [junit4]   2> NOTE: leaving temporary files on disk at: /l/geo3dquant/lucene/build/spatial3d/test/J0/temp/lucene.spatial3d.TestGeo3DPoint_3AD1F04EDC75F5CC-026
   [junit4]   2> NOTE: test params are: codec=Asserting(Lucene60): {id=TestBloomFilteredLucenePostings(BloomFilteringPostingsFormat(Lucene50(blocksize=128)))}, docValues:{id=DocValuesFormat(name=Asserting)}, maxPointsInLeafNode=357, maxMBSortInHeap=5.860921451607786, sim=ClassicSimilarity, locale=ru-RU, timezone=America/Grand_Turk
   [junit4]   2> NOTE: Linux 3.13.0-71-generic amd64/Oracle Corporation 1.8.0_60 (64-bit)/cpus=8,threads=1,free=410208728,total=495452160
   [junit4]   2> NOTE: All tests run in this JVM: [TestGeo3DPoint]
   [junit4] Completed [1/1 (1!)] in 5.63s, 1 test, 1 failure <<< FAILURES!

See the new "explanation" part. Anyway, I was then able to turn those BKD traversal details into a standalone test, that does fail:

  public void testCuriousFailure() throws Exception {
    GeoShape shape = GeoCircleFactory.makeGeoCircle(PlanetModel.WGS84, -0.8971654677124566, -0.3398482030102755, 1.4775317506492547);
    GeoPoint point = new GeoPoint(0.8653002868649471, 0.50134342478497, 0.046203414829601996);

    // point is inside our circle shape:
    assertTrue(shape.isWithin(point));

    double xMin = 0.8653002866318559;
    double xMax = 0.8653002870980383;
    double yMin = 0.5013434245518787;
    double yMax = 0.5013434250180612;
    double zMin = 0.04620341459651078;
    double zMax = 0.04620341506269321;
    GeoArea xyzSolid = GeoAreaFactory.makeGeoArea(PlanetModel.WGS84, xMin, xMax, yMin, yMax, zMin, zMax);

    // point is also inside our wee tiny box:
    assertTrue(xyzSolid.isWithin(point));

    assertTrue(xyzSolid.getRelationship(shape) != GeoArea.DISJOINT);
  }

It's an interesting failure ... the segment has only one document, so the BKD tree has a single leaf block with just that one value, and so its bbox is really really tiny. Maybe the tiny bbox and the gigantic circle tickle a geo3d bug?

asfimport commented 8 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

OK that test also fails on clean trunk checkout, so I don't think it should block pushing this change ... I'll @Ignore it for now.

asfimport commented 8 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

@mikemccand Yeah, please go ahead and push it, and I'll have a more detailed look.

I like the forensics; this will be very helpful.

I see your logic for the little test and yes it does look to me like there must be a bug of some sort. I'll chase it down.

asfimport commented 8 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

@mikemccand: The problem is that the "wee little box" doesn't actually seem to intersect with the surface of the world at all. So the point can be in the box, and it can be within the shape (because that's described by planes), but since it's off the surface of the world we don't detect these as intersecting anywhere on the surface. We need the surface points of intersection in order to evaluate the kind of intersection that is present.

So if my analysis is correct, this isn't a bug, per se, since there's really no intersection. But it's a problem, no question.

Do you have a concept of a minimum-sized box? That may be the way to go. The minimum would want to be twice the size of the largest possible numerical rounding delta, or something like that. If a box gets to be that size and still overlaps then you'd have to check the remaining individual items against the shape. Let me think it through, though, to be sure that's the right approach.

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit c2289de3c7100619b9476e4ec92ad6d4e89becc7 in lucene-solr's branch refs/heads/master from Mike McCandless https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c2289de

LUCENE-7168: improve encode and quantization testing for geo3d

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit 6868a8cd74d2d21dd679dfbb867d0823253ac2a5 in lucene-solr's branch refs/heads/branch_6x from Mike McCandless https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6868a8c

LUCENE-7168: improve encode and quantization testing for geo3d

asfimport commented 8 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

@mikemccand Thinking it through further, here are some more ways we could proceed.

(1) Since the problem is that the numerically approximated point is not actually on the surface, I can provide a "projection" method, which will generate a point on the surface given an (x,y,z) that is not quite on the surface. This has some cost because it will involve a sqrt operation per point so projected, but it is probably acceptable. The question is, does it help? What would we do differently in BKD?

(2) I can, maybe in conjunction with (1) above, make geo3d shapes no longer find any point that is not on the surface to be "within" any surface shape. This is of trivial cost (but requires changes to a number of classes). That will make the relationships not be confusing since points not on the surface will be excluded from the search, but it might mean that nothing gets returned at all from BKD by design. :-P

(3) I could add a method that would allow you to determine if a BKD cell intersected the surface or not. This, however, is maybe not a good solution, because cells can intersect on two sides and I can foresee issues when one side just doesn't make it etc. So maybe this won't help.

(4) When you stop to think about it, a quantized BKD point actually represents a little cell; it represents a range of possible values in x, y, and z. If we knew what that range was, maybe we could compute intersection with each BKD cell. This is a more robust way of looking at the "minimum BKD cell size" idea. I could provide a construct, e.g. GeoFuzzyPoint, which would express this. But we'd still need to figure out what operations you would need to be able to do with a fuzzy point in order to not have problems arise. Would we need new "isWithin" methods? Or just "getRelationship()" support between such fuzzy points and XYZSolids? Or what, exactly? Depending on the answer, the fuzzy point might be very similar in logic to the XYZSolid, so the cost of creating one would be similar (i.e. not trivially cheap).

Let's discuss possible solutions using these tools – if not this morning then maybe at lunch. :-)

asfimport commented 8 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

@mikemccand: I realized suddenly how to do this properly.

The trick is to line up the BKD tree cells that it descends through with the "fuzzy" cells from the point resolution. So you never would generate an XYZSolid that did not correspond exactly to an integral number of "fuzzy" point cells. This implies that you also could not have a BKD cell that was smaller than a fuzzy point cell, and thus the situation we're trying to address could never arise.

To clarify, imagine the space that you are searching as consisting of a large number of small cubes. Each cube has an integer address, e.g. (xi, yi, zi). You can put an actual GeoPoint on the world surface into an individual cube via quantization, so you then get to describe it by (xi, yi, zi). When we do the BKD descent, the way we must describe the solid areas we are searching is not by ranges of (x,y,z), but by ranges of (xi,yi,zi). Then you can convert those ranges trivially into (x,y,z) ranges and create the XYZSolid objects you use to relate with the shape. This guarantees you will not ever create an XYZSolid that contains a point which does not intersect with the world, since the original point had to have been on the world in the first place.

Sound like it might work?

asfimport commented 8 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Michael McCandless: I realized suddenly how to do this properly.

I like this approach, but, I thought this is what we are doing already!

I.e., BKD fundamentally already operates in this quantized (small cube) space, since it's only ever given quantized points, and it splits at the quantized points.

Then, in PointInShapeIntersectVisitor.compare we do this:

    double xMin = decodeValueMin(NumericUtils.sortableBytesToInt(minPackedValue, 0));
    double xMax = decodeValueMax(NumericUtils.sortableBytesToInt(maxPackedValue, 0));
    double yMin = decodeValueMin(NumericUtils.sortableBytesToInt(minPackedValue, 1 * Integer.BYTES));
    double yMax = decodeValueMax(NumericUtils.sortableBytesToInt(maxPackedValue, 1 * Integer.BYTES));
    double zMin = decodeValueMin(NumericUtils.sortableBytesToInt(minPackedValue, 2 * Integer.BYTES));
    double zMax = decodeValueMax(NumericUtils.sortableBytesToInt(maxPackedValue, 2 * Integer.BYTES));

    GeoArea xyzSolid = GeoAreaFactory.makeGeoArea(PlanetModel.WGS84, xMin, xMax, yMin, yMax, zMin, zMax);

I.e, that xyzSolid should in fact already be spanning the full little-cube space that this BKD cell covers? We then go on and relate that xyzSolid to the query shape.

asfimport commented 8 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Ooooh ... I think I see the bug! I need to fix decodeValueMin/Max to match floor not round that we used to do! I'll test that change, and re-beast.

asfimport commented 8 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

OK how about this? I moved the min/max decode back to Geo3DUtil (package private!), renamed them to floor/ceil.

I'm beasting now...

Sorry for the noise @DaddyWri! Now we can have a relaxed lunch :)

asfimport commented 8 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

Phew.

Yes, I now vaguely recall that we had this discussion early. I'm glad you found the issue before I went nuts repeating last year's logic. :-)

asfimport commented 8 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

Patch looks good! +1 for the latest

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit df07e0c30ff0d90f9052dd411f027c0dfcc3fb88 in lucene-solr's branch refs/heads/master from Mike McCandless https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=df07e0c

LUCENE-7168: fix ceil/floor decode to match encode

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit 16c6dd9a2d67d88e18a43f24ff636a4ddca0f123 in lucene-solr's branch refs/heads/branch_6x from Mike McCandless https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=16c6dd9

LUCENE-7168: fix ceil/floor decode to match encode

asfimport commented 8 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Here's a new patch, simplifying the encode/decode, based on a good idea that @DaddyWri suggested (over lunch!) to change decode to decode to the center value. This gets us our stability without having to pick special double values, and it means we can use the full integer space again, but I did need to add the same if that LatLonPoint has about not being able to encode precisely the max value.

asfimport commented 8 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

+1 for the new patch.

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1848477bd83ca3f45ffda0d15a2eee901adb90b6 in lucene-solr's branch refs/heads/master from Mike McCandless https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1848477

LUCENE-7168: use center value on decode

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit dd9d5d06d2d5aef1f9367ff6459b325a44d0de4c in lucene-solr's branch refs/heads/branch_6x from Mike McCandless https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=dd9d5d0

LUCENE-7168: use center value on decode

asfimport commented 8 years ago

Chris M. Hostetter (@hossman) (migrated from JIRA)

Manually correcting fixVersion per Step #S5 of #8326