apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.67k stars 1.03k forks source link

Include geo3d package, along with Lucene integration to make it useful [LUCENE-6196] #7258

Closed asfimport closed 9 years ago

asfimport commented 9 years ago

I would like to explore contributing a geo3d package to Lucene. This can be used in conjunction with Lucene search, both for generating geohashes (via spatial4j) for complex geographic shapes, as well as limiting results resulting from those queries to those results within the exact shape in highly performant ways.

The package uses 3d planar geometry to do its magic, which basically limits computation necessary to determine membership (once a shape has been initialized, of course) to only multiplications and additions, which makes it feasible to construct a performant BoostSource-based filter for geographic shapes. The math is somewhat more involved when generating geohashes, but is still more than fast enough to do a good job.


Migrated from LUCENE-6196 by Karl Wright (@DaddyWri), 2 votes, resolved May 06 2015 Attachments: geo3d.zip, geo3d-tests.zip, LUCENE-6196_Geo3d.patch, LUCENE-6196-additions.patch, LUCENE-6196-fixes.patch, ShapeImpl.java

asfimport commented 9 years ago

David Smiley (@dsmiley) (migrated from JIRA)

Aha, my hunch was right then. You've already made your contribution even though it hasn't been committed here. I theoretically don't need anything from you for this to have a home in Spatial4j but I'll let you know otherwise – e.g. a simple attestation that you had permission from your employer to license it as-such.

I still want to hook a Lucene spatial test to it so see how it fairs.

asfimport commented 9 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

I theoretically don't need anything from you for this to have a home in Spatial4j but I'll let you know otherwise – e.g. a simple attestation that you had permission from your employer to license it as-such.

Are you asking me yet again whether I had permission to contribute this?

The permission I have from my employer is a long-granted corporate permission to contribute to Lucene and Solr. My employer is aware that any and all code contributed to this project will be licensed with the Apache 2 license. There are, of course, certain restrictions as to exactly what can be contributed, but I do not believe this geospatial library violates any of those restrictions.

I hope this is sufficient; you are unlikely to get more detail.

However, as I have indicated earlier in this thread, unless this code winds up in Lucene or Solr, for the moment I can't contribute to it any further. This may change in the future, but that is the situation at the moment.

asfimport commented 9 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

Minor cleanups, and reduction of duplicated code...

asfimport commented 9 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

Fixed an issue with bounding boxes and longitude slices; updated tests

asfimport commented 9 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

Any news as to what is going to happen with this contribution?

asfimport commented 9 years ago

David Smiley (@dsmiley) (migrated from JIRA)

Hi Karl, I plan to incorporate it into Spatial4j, and the first step is an Eclipse "CQ" (Contribution Questionnaire) which is part of the IP process for code > 1000 lines. I've done that and it needs one more PMC vote there (I think).

p.s. I'm on vacation this week so I may not be so responsive.

asfimport commented 9 years ago

Shalin Shekhar Mangar (@shalinmangar) (migrated from JIRA)

I plan to incorporate it into Spatial4j, and the first step is an Eclipse "CQ" (Contribution Questionnaire) which is part of the IP process for code > 1000 lines. I've done that and it needs one more PMC vote there (I think).

This is a contribution to Lucene/Solr and it has been made quite clear by the contributor that he can only contribute this to Lucene/Solr. Why are you trying to contribute it to some other project and by whose permission (certainly not the original contributor's)? This is a genuine question, not a complaint. Please help me understand.

asfimport commented 9 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

...by whose permission (certainly not the original contributor's)?

Based on the license included with the contribution, anyone can take the code anywhere they want, provided the license remains intact. However, I agree that that still doesn't answer the question as to why it isn't being included in Lucene, as originally proposed.

asfimport commented 9 years ago

Shalin Shekhar Mangar (@shalinmangar) (migrated from JIRA)

Based on the license included with the contribution, anyone can take the code anywhere they want, provided the license remains intact.

Okay, I missed that part. But still, quoting you:

However, as I have indicated earlier in this thread, unless this code winds up in Lucene or Solr, for the moment I can't contribute to it any further. This may change in the future, but that is the situation at the moment.

You (the original author) can neither contribute it directly to another project nor are you able to contribute to it further unless it lands in Lucene or Solr, at least for now. That, for me, is reason enough to have this code in Lucene/Solr land.

asfimport commented 9 years ago

Ramkumar Aiyengar (@andyetitmoves) (migrated from JIRA)

Shalin, see Nick and David's comments above, might help answer your question..

https://issues.apache.org/jira/browse/LUCENE-6196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300866#comment-14300866

https://issues.apache.org/jira/browse/LUCENE-6196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300880#comment-14300880

asfimport commented 9 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

Not sure what happened to this, but if it hasn't landed in either Lucene or Spatial4j, perhaps Apache SIS would be interested in it?

asfimport commented 9 years ago

David Smiley (@dsmiley) (migrated from JIRA)

It's entry into Spatial4j is somewhat entangled with the incubation process for Spatial4j into LocationTech/Eclipse which is slowly advancing along. Since that's been happening slowly, I'm tempted to integrate it before and not after. I think in a week or so I'll give that a shot, esp. if there's no incubation progress.

Wether Geo3d becomes a part of SIS or not is up to that-project. I imagine they would be interested but I don't know.

asfimport commented 9 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

Ok, I will wait to see how the integration progresses before asking the SIS team if they would have an interest. The body of code involved is obviously highly tuned for the search problem; it's not as good a fit for the SIS mission (seems to me). But I'd rather it went there than nowhere.

asfimport commented 9 years ago

Nick Knize (@nknize) (migrated from JIRA)

In the meantime it might not be a bad idea putting this in an org.apache.lucene.spatial.geometry package in the _5x branch to give it some mileage and iterations in the wild until it can find its permanent home. That guarantees Karl the ability to continue to contribute while David works it into S4J.

Also, since Karl can't contribute if it goes into S4J (anything changed here Karl?) I had reached out to SIS a few days ago to make them aware of his contribution and gauge their interest on collaboration. I'm assuming you can contribute to a sister Apache project? They're diligent on being ISO 19107 and OGC compliant which is a plus but also complicates the integration a bit. A nice integration exposes the functionality to the referencing and projection capabilities (something I'll be getting back to with S4J, David). There's an open dialog on the SIS mailing list tossing around ideas if you guys want to jump in the conversation.

asfimport commented 9 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

Hi Nicholas, I'm afraid that the only project I can contribute work-related development to would be Lucene/Solr.

asfimport commented 9 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

In the meantime it might not be a bad idea putting this in an org.apache.lucene.spatial.geometry package in the _5x branch to give it some mileage and iterations in the wild until it can find its permanent home.

+1, let's just add it here: progress not perfection!

I had reached out to SIS a few days ago to make them aware of his contribution

Thanks @nknize! Here's the discussion over in our sister project, Apache SIS: http://markmail.org/message/f2k2ykyqisclrzi3

asfimport commented 9 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

Thanks. If this (temporarily at least) resides in lucene, I'll do what is necessary to extend/maintain it.

It sounds like the SIS people have similar concerns about the search-focus of the package that I did. Still, they're welcome to use it as a basis for the 3d package they intend to develop for the long term.

asfimport commented 9 years ago

David Smiley (@dsmiley) (migrated from JIRA)

I do like the idea of committing it temporarily for the aforementioned reasons. Still I need to spend some hands-on time preparing a patch of it integrated with a SpatialContext implementation and with some existing tests using these Shape implementations instead of the default ones. For example, a subclass of RandomSpatialOpFuzzyPrefixTreeTest that provides random-ish Geo3d shapes. In doing so, it'll test that the bounding box of a shape is not smaller than it needs to be, and that a Geo3d Shape reports the correct intersection relationship with a rectangle. At least, this is what I plan to do when I start.

asfimport commented 9 years ago

David Smiley (@dsmiley) (migrated from JIRA)

Attached is a patch file that includes this code in the spatial module under this package: org.apache.lucene.spatial.spatial4j.geo3d. I added Geo3dShape to org.apache.lucene.spatial.spatial4j. I also added a Geo3dRptTest class which extends RandomSpatialOpStrategyTestClass configured to use the new CompositeSpatialStrategy, and with random indexed rectangles. Right now to keep things simple, the query shapes are always triangles created via Geo3d. I quickly encountered a failure and I minimized it to a simple test; see testTriangleDisjointRect().

I added some toString()'s to a few classes to aid debugging but not all of them. To those same classes I added hashCode – but again, almost all of these classes should have these things.

I put the code up in ReviewBoard: https://reviews.apache.org/r/33114/ I've only used that tool a little so I'm not sure if it's possible for you to upload more diff's, Karl, or wether only the creator (me) can. If that's a problem; you could simply re-create. I couldn't add you as a Reviewer because you don't seem to be in the system.

asfimport commented 9 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

Hi David, I received an invite from the review board. Not sure what to do with it exactly.

I'll look at the test failure this evening and get back to you this evening about that.

asfimport commented 9 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

Hi David,

Your test fails because it is trying to define shapes using lat/lon in degrees. geo3d uses radians for everything.

Also, when I apply the patch to current lucene trunk, I get 10 build errors. Looks like the lucene javadoc settings are blowing up the build:

    [javac] C:\wip\lucene\lucene\spatial\src\java\org\apache\lucene\spatial\spatial4j\geo3d\GeoConvexPolygon.java:40: error: bad use of '>'
    [javac]     * Accepts only values in the following ranges: lat: -PI/2 -> PI/2, lon: -PI -> PI
    [javac]                                                                ^
    [javac] C:\wip\lucene\lucene\spatial\src\java\org\apache\lucene\spatial\spatial4j\geo3d\GeoConvexPolygon.java:40: error: bad use of '>'
    [javac]     * Accepts only values in the following ranges: lat: -PI/2 -> PI/2, lon: -PI -> PI
    [javac]                                                                                  ^
    [javac] C:\wip\lucene\lucene\spatial\src\java\org\apache\lucene\spatial\spatial4j\geo3d\GeoConvexPolygon.java:56: error: bad use of '>'
    [javac]     * Accepts only values in the following ranges: lat: -PI/2 -> PI/2, lon: -PI -> PI
    [javac]                                                                ^
    [javac] C:\wip\lucene\lucene\spatial\src\java\org\apache\lucene\spatial\spatial4j\geo3d\GeoConvexPolygon.java:56: error: bad use of '>'
    [javac]     * Accepts only values in the following ranges: lat: -PI/2 -> PI/2, lon: -PI -> PI
asfimport commented 9 years ago

Martin Desruisseaux (@desruisseaux) (migrated from JIRA)

I had a quick look at the new classes. I would like to mention one possibility in case it may be of interest. I saw various classes related to a latitude/longitude bounding box. If a dependency toward the GeoAPI interfaces is considered acceptable (GeoAPI is a translation of some international standards into Java interfaces, and is published by the Open Geospatial Consortium), then those classes could implement the Envelope interface. The getCoordinateReferenceSystem() method could return null for now, but this would keep the door open for connecting the Geo3D bounding box to a map reprojection engine in a future version if desired. For example if a future Geo3D version returns a non-null value, then the Apache SIS reprojection method could work directly with those Geo3D classes. If a GeoAPI dependency is considered not desirable at this time, just checking that there is no incompatibility between the Geo3D classes and Envelope API may help to keep the door open.

By the way, the Geo3D documentation said that it works with latitude/longitude in radians, but I saw no mention of which geographic system (there is many, with differences up to 3 kilometres between them - ignoring those who do not use Greenwich as their prime meridian). This is the kind of information which is normally provided by Envelope.getCoordinateReferenceSystem(), but otherwise just a note in the documentation may help. I presume that Geo3D implies the WGS84 system?

asfimport commented 9 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

Hi Martin,

The geo3d package is based wholly on the unit sphere centered on an x,y,z coordinate origin. There is no place where the radius of the earth etc factor into the math. In order to cast the problem in terms of any particular standard, therefore, you need to multiply distances computed by the radius of the earth.

Oblateness is also not considered for the purposes of this package. Its best use is in conjunction with maps which also are projected to a sphere. Luckily this happens to be quite common.

asfimport commented 9 years ago

Martin Desruisseaux (@desruisseaux) (migrated from JIRA)

Thanks Karl for the precision.

On the javac error, you can fix them by replacing expression like -PI/2 -> PI/2 by

 {`@literal` -PI/2 -> PI/2}

or alternatively replace > by >

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1673165 from @dsmiley in branch 'dev/branches/lucene6196' https://svn.apache.org/r1673165

LUCENE-6196: Geo3d initial checkin

Deltas from Karl's first upload: change of package, some hashCode() impls, a few toString() impls, some javadoc formatting. New Geo3dRtTest. Geo3dShape throws an exception if not geo.

asfimport commented 9 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

This patch adds support for degenerate cases, and corrects a bug in GeoWideRectangle.

asfimport commented 9 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

New version of additions with problem fixed

asfimport commented 9 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

Fixes for failing tests

asfimport commented 9 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

This revised fixes patch catches IllegalArgumentException and picks a different triangle when it occurs.

asfimport commented 9 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

Fixes that include polygon testing for 4 and 5 sided polygons, with a better "illegal polygon" detector.

asfimport commented 9 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

Yet more fixes, this time for circles

asfimport commented 9 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

Final set of tests working.

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1674049 from @dsmiley in branch 'dev/branches/lucene6196' https://svn.apache.org/r1674049

committing Karl's latest LUCENE-6196-fixes.patch https://issues.apache.org/jira/secure/attachment/12725741/LUCENE-6196-fixes.patch WIP; tests fail

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1674732 from @dsmiley in branch 'dev/branches/lucene6196' https://svn.apache.org/r1674732

committing Karl's latest LUCENE-6196-fixes.patch (plus my test failures) https://reviews.apache.org/r/33268/diff/raw/ WIP; tests fail

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1675374 from @dsmiley in branch 'dev/branches/lucene6196' https://svn.apache.org/r1675374

LUCENE-6196: committing Karl's latest patch https://reviews.apache.org/r/33353/ (diff #9) https://reviews.apache.org/r/33353/diff/raw/

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1675385 from @dsmiley in branch 'dev/branches/lucene6196' https://svn.apache.org/r1675385

LUCENE-6196: RectIntersectionTestHelper: fix to work with testing rectangles, and more clearly test when the shape.getBoundingBox may be faulty

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1675386 from @dsmiley in branch 'dev/branches/lucene6196' https://svn.apache.org/r1675386

LUCENE-6196: Geo3dShapeTest -> Geo3dShapeRectRelationTest and complete testing of the 4 shape types

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1675747 from @dsmiley in branch 'dev/branches/lucene6196' https://svn.apache.org/r1675747

LUCENE-6196: committing Karl's latest patch https://reviews.apache.org/r/33476/ (diff #2) Addresses getBoundingBox issues

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1675777 from @dsmiley in branch 'dev/branches/lucene6196' https://svn.apache.org/r1675777

LUCENE-6196 testRelateWithRectangle shouldn't print last shape relation when we fail due to too few predicate occurrences.

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1677205 from @dsmiley in branch 'dev/branches/lucene6196' https://svn.apache.org/r1677205

LUCENE-6196: committing Karl's latest patch https://reviews.apache.org/r/33509/ (diff #6)

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1677527 from @dsmiley in branch 'dev/branches/lucene6196' https://svn.apache.org/r1677527

LUCENE-6196: committing Karl's latest patch https://reviews.apache.org/r/33780/ (diff #1)

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1677595 from @dsmiley in branch 'dev/branches/lucene6196' https://svn.apache.org/r1677595

LUCENE-6196: Reformat code. Removed System.err & legacy comments in test. Fixed test compile warning.

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1677656 from @dsmiley in branch 'dev/branches/lucene6196' https://svn.apache.org/r1677656

LUCENE-6196: Fix javadoc issues; ant precommit is happy.

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1677658 from @dsmiley in branch 'dev/branches/lucene6196' https://svn.apache.org/r1677658

LUCENE-6196: Mark @lucene.experimental or @lucene.internal

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1677669 from @dsmiley in branch 'dev/branches/lucene6196' https://svn.apache.org/r1677669

LUCENE-6196: Mark @lucene.experimental or @lucene.internal

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1677670 from @dsmiley in branch 'dev/branches/lucene6196' https://svn.apache.org/r1677670

LUCENE-6196: committing Karl's latest patch https://reviews.apache.org/r/33811/ (diff #3)

asfimport commented 9 years ago

David Smiley (@dsmiley) (migrated from JIRA)

I think the Geo3d branch, technically lucene6196, is now ready to merge into trunk, and then the 5x branch. I could generate a patch, but unless there are process reasons (e.g. I "have to"?) or technical reasons I am unaware of, I'll simply merge in the branch. The CHANGES.txt entry I plan to add is as follows:

* LUCENE-6196: New Spatial "Geo3d" API with partial Spatial4j integration.
  It is a set of shapes implemented using 3D planar geometry for calculating
  spatial relations on the surface of a sphere. Shapes include Point, BBox,
  Circle, Path (buffered line string), and Polygon.
  (Karl Wright via David Smiley)

Karl, if you suggest any changes then just let me know. If I don't get another +1 then I'll commit in two days.

asfimport commented 9 years ago

Karl Wright (@DaddyWri) (migrated from JIRA)

+1 for merge to trunk.

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1678005 from @dsmiley in branch 'dev/trunk' https://svn.apache.org/r1678005

LUCENE-6196: Geo3d API, 3d planar geometry for surface of a sphere. This merge commit renames a couple test utilities that appeared to be tests but weren't.

asfimport commented 9 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1678007 from @dsmiley in branch 'dev/branches/branch_5x' https://svn.apache.org/r1678007

LUCENE-6196: Geo3d API, 3d planar geometry for surface of a sphere. This merge commit renames a couple test utilities that appeared to be tests but weren't.