apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.68k stars 1.04k forks source link

Refactor and clean up core geo api [LUCENE-8364] #9411

Open asfimport opened 6 years ago

asfimport commented 6 years ago

The core geo API is quite disorganized and confusing. For example there is Polygon for creating an instance of polygon vertices and holes and Polygon2D for computing relations between points and polygons. There is also a PolygonPredicate and DistancePredicate in GeoUtils for computing point in polygon and point distance relations, respectively, and a GeoRelationUtils utility class which is no longer used for anything. This disorganization is due to the organic improvements of simple LatLonPoint indexing and search features and a little TLC is needed to clean up api to make it more approachable and easy to understand.


Migrated from LUCENE-8364 by Nick Knize (@nknize), updated Jun 21 2018 Attachments: LUCENE-8364.patch

asfimport commented 6 years ago

Nick Knize (@nknize) (migrated from JIRA)

Initial patch provides:

asfimport commented 6 years ago

Nick Knize (@nknize) (migrated from JIRA)

The initial patch was dirty so I removed it... this is the correct one.

asfimport commented 6 years ago

David Smiley (@dsmiley) (migrated from JIRA)

Thanks for the cleanup Nick; the cleanup is needed!

Before moving forward, lets see what becomes of the thread "DISCUSS] Geo/spatial organization in Lucene" I sent to the dev list today.

asfimport commented 6 years ago

Nick Knize (@nknize) (migrated from JIRA)

Thanks @dsmiley No worries. And thanks for opening the discussion. In the meantime I'm hoping this provides the next natural step to making the existing API's more approachable, manageable, and extendable.

asfimport commented 6 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Just looking, I have a few concerns:

Maybe, it would be easier to split up the proposed changes so its easier to review. Especially for proposing any new abstract classes as I want to make sure that we really get value out of any abstractions, due to their high cost.

asfimport commented 6 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Also the relate/relatePoint changes to Polygon are a big performance trap: this class exists solely as a thing to pass to queries. we shouldnt dynamically build large data structures and stuff here, and add complexity such as the caching and stuff it has, I really think this doesn't belong.

asfimport commented 6 years ago

Nick Knize (@nknize) (migrated from JIRA)

Thanks @rmuir for taking the time to review the first iteration patch. And for highlighting potential performance implications.

what is the goal of all the new abstractions?

At this point, its nothing more than organization, approachability, maintainability of the existing code.

I don't think we should be building a geo library here

That's fine. If that's the general consensus I'll update the patch to remove the abstractions and lock as much of this down as possible.

Maybe, it would be easier to split up the proposed changes so its easier to review.

I think removing the abstractions for this first cut will achieve that?

asfimport commented 6 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Also there is another problem with having Polygon create Polygon2D datastructures, there is not a one-to-one relationship between the two.

Anything using this should create Polygon2D explicitly itself because it has many-to-one relationship:

  /** Builds a Polygon2D from multipolygon */
public static Polygon2D create(Polygon... polygons) {

This is really important: since for multipolygons it builds a 2-stage tree. We don't want to encourage users creating these things for individual polygons and using booleanquery or something like that, it will result in stuff that runs in linear time.

asfimport commented 6 years ago

Nick Knize (@nknize) (migrated from JIRA)

I agree with that. That's why I left LatLonPointInPolygonQuery using Polygon2D but refactored the name to GeoEdgeTree. (hoping to make it a little more clear to what the class actually is). I'd like to make Polygon2D package private to the same package as LatLonPointInPolygonQuery (since that's the only place it should be used) but all LatLonPoint* still lives in sandbox. So perhaps the first step should be to graduate LatLonPoint to either core or spatial module as discussed in #8368

asfimport commented 6 years ago

Lucene/Solr QA (migrated from JIRA)

-1 overall
Vote Subsystem Runtime Comment
Prechecks
+1 test4tests 0m 0s The patch appears to include 8 new or modified test files.
master Compile Tests
+1 compile 2m 0s master passed
Patch Compile Tests
+1 compile 0m 43s the patch passed
+1 javac 0m 43s the patch passed
+1 Release audit (RAT) 0m 49s the patch passed
+1 Check forbidden APIs 0m 19s the patch passed
+1 Validate source patterns 0m 19s the patch passed
Other Tests
-1 unit 29m 40s core in the patch failed.
+1 unit 2m 55s sandbox in the patch passed.
+1 unit 0m 26s spatial in the patch passed.
+1 unit 5m 36s test-framework in the patch passed.
46m 19s
Reason Tests
Failed junit tests lucene.geo.TestRectangle
Subsystem Report/Notes
JIRA Issue LUCENE-8364
JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12928500/LUCENE-8364.patch
Optional Tests compile javac unit ratsources checkforbiddenapis validatesourcepatterns
uname Linux lucene2-us-west.apache.org 4.4.0-112-generic #135-Ubuntu SMP Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Build tool ant
Personality /home/jenkins/jenkins-slave/workspace/PreCommit-LUCENE-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh
git revision master / 629081f
ant version: Apache Ant(TM) version 1.9.6 compiled on July 8 2015
Default Java 1.8.0_172
unit https://builds.apache.org/job/PreCommit-LUCENE-Build/36/artifact/out/patch-unit-lucene_core.txt
Test Results https://builds.apache.org/job/PreCommit-LUCENE-Build/36/testReport/
modules C: lucene lucene/core lucene/sandbox lucene/spatial lucene/test-framework U: lucene
Console output https://builds.apache.org/job/PreCommit-LUCENE-Build/36/console
Powered by Apache Yetus 0.7.0 http://yetus.apache.org

This message was automatically generated.