apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.73k stars 1.05k forks source link

Replace spatial contrib module with LSP's spatial-lucene module [LUCENE-3795] #4868

Closed asfimport closed 12 years ago

asfimport commented 12 years ago

I propose that Lucene's spatial contrib module be replaced with the spatial-lucene module within Lucene Spatial Playground (LSP). LSP has been in development for approximately 1 year by David Smiley, Ryan McKinley, and Chris Male and we feel it is ready. LSP is here: http://code.google.com/p/lucene-spatial-playground/ and the spatial-lucene module is intuitively in svn/trunk/spatial-lucene/.

I'll add more comments to prevent the issue description from being too long.


Migrated from LUCENE-3795 by David Smiley (@dsmiley), 8 votes, resolved Mar 15 2012 Linked issues:

asfimport commented 12 years ago

David Smiley (@dsmiley) (migrated from JIRA)

LSP is comprised of several modules:

The spatial-solr module of LSP can be considered in another issue following the conclusion of this one. The other modules aren't being considered for incorporation into Lucene/Solr.

LSP is largely new code although some of it originated using chunks of the existing Lucene spatial contrib module and SOLR-2155 (A recursive PrefixTree/Trie algorithm using geohashes). It's fair to say this is a superset and descendent of SOLR-2155 but with a real framework around it and plenty of refactorings and tests.

I ran Atlassian's Clover code coverage to get some statistics of this spatial-lucene module of LSP:

The code coverage surprises me a little... perhaps the number is higher when the spatial-solr module gets involved which uses more of the classes then the tests do here alone.

asfimport commented 12 years ago

David Smiley (@dsmiley) (migrated from JIRA)

The spatial-lucene module of LSP has 3 main packages: 'base', 'strategies', and 'benchmark'. It also has a fair amount of tests.

Base

Major pieces in 'base':

Strategies

The "strategies" portion of this module contains spatial indexing/search implementations using Lucene. Major interfaces (just one):

Benchmarking

Benchmarking is a TBD; there's the start of some code there but nothing real. 1 year ago I did benchmark SOLR-2155 (with great results) and posted my benchmark code here #3918 in the interest of transparency.

Testing

Testing so far hasn't been aimed directly at increasing code coverage, it's been aimed at finding nasty corner cases in spatial. Spatial code is highly prone to edge cases.

asfimport commented 12 years ago

David Smiley (@dsmiley) (migrated from JIRA)

Features

The main goals of LSP is to be a great framework to plug in spatial search algorithms and shape implementations. It of course includes good implementations of these key abstractions. Here are some key features, most of which related to using RecursivePrefixTreeStrategy with geohashes:

Todo

There are many things I want to improve and add but in my view there isn't anything truly making this non-committable. Chris has raised concerns that the other committers will want to see benchmark results before accepting this. I'll leave that for you (the other committers) to decide.

And I also heard that some committers are unsure wether Lucene should have a spatial module at all. However there is certainly demand for it, at least at the Solr level. Furthermore, there are some non-spatial use cases of the spatial module. One interesting use-case is RecursivePrefixTreeStrategy's (RPTS) unique ability to index shapes with area. If you had a requirement to index a variable number of time durations, then unlike Lucene's trie numeric support in which only discrete numbers are supported, RPTS could be used with x being time and y being unused. Buy the way, PrefixTree and Trie are synonymous words.

asfimport commented 12 years ago

Simon Willnauer (@s1monw) (migrated from JIRA)

wow this is a lot of stuff. we certainly need a code donation for this. without getting into details +1 from my side. I think lucene desperatly needs spatial support... it should be a module IMO. we should drop the stuff we have an get this in shape ie. into a module. I am not sure about the LGPL stuff I guess we should try to integrate everything else and if we really want or if there is a way to integrate the LGPL stuff we can take care of this later!

asfimport commented 12 years ago

David Smiley (@dsmiley) (migrated from JIRA)

What constitutes a "code donation"? By the way, I've gone through the proper channels with my employer with regard to SOLR-2155 and LSP. MITRE has no copyright on this code; I've marked it all as ASF.

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Simon do we really need a code grant here?

Its my understanding (correct me if i am wrong): the developers involved (David, Ryan, Chris) are all committers with iCLA on file, so is it really any different than any other patch from that perspective?

As far as LGPL, according to David's description and the title of this jira issue (possible i did not interpret it correctly, correct me if so), the he wants to replace lucene/contrib/spatial with the spatial-lucene project, and that it has no LGPL ties at all, (only spatial-extras does).

Without looking at any code myself, if thats really the case I'm +1 on principle because it means we basically have an improved spatial module for lucene core with no catch at all. The current code has not seen much maintenance.

(And i agree, we should be shooting for a proper module/ here, not a contrib).

asfimport commented 12 years ago

Mark Miller (@markrmiller) (migrated from JIRA)

... do we really need a code grant here?

I think if you go by the letter, since it was 'developed' outside of the Apache ecosystem, we want a code grant. I wonder where apache-extras fits there though.

Personally, if it was all only developed by committers in a public repo, I don't have a problem with being practical - FWIW though - I'm just one guy INAL.

asfimport commented 12 years ago

Jan Høydahl (@janhoy) (migrated from JIRA)

Impressive piece of work! Given license stuff is ok, here is my +1

asfimport commented 12 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Cool work! I scanned the code quickly and it seems to fit much better than the current spatial!

I have some suggestions regarding performance; BooleanQuery usage and related inconsistency with BQ scoring (with coord) in the different strategies; also found some caching problems (AtomicReader is key to cache not AtomicReader.getCoreCacheKey, so new deleted docs after reopen invalidate the cache), but I would prefer to discuss that here once the patch is provided on Lucene's JIRA.

asfimport commented 12 years ago

Chris Male (migrated from JIRA)

Huge +1

Thanks so much David for opening this issue and getting the code to a point where it can be contributed. I'm really excited to see this brought into the fold and glad to see support from others.

As far as LGPL, according to David's description and the title of this jira issue (possible i did not interpret it correctly, correct me if so), the he wants to replace lucene/contrib/spatial with the spatial-lucene project, and that it has no LGPL ties at all, (only spatial-extras does).

Absolutely. The portion of the codebase which uses LGPL code is entirely optional and decoupled from the rest of the code. From a functional perspective, as David says, its only really related to polygon support which is hugely powerful but can exist somewhere else if needs be.

asfimport commented 12 years ago

Chris Male (migrated from JIRA)

but I would prefer to discuss that here once the patch is provided on Lucene's JIRA.

Is it best to create a patch here and iterate on any problems, or create a branch and work through them there?

asfimport commented 12 years ago

David Smiley (@dsmiley) (migrated from JIRA)

FYI the code coverage figure is erroneous, Clover didn't recognize some inner classes extending other tests as tests. Using IntelliJ IDEA Ultimate's built-in coverage, it's 63% (as counted per line), and I believe its higher once the spatial-solr module is brought into the mix which has a bunch of tests.

Uwe, I'm very interested in your input on anything to make the code better.

Given the volume of code, I believe a feature branch makes the most sense instead of a humungous patch file.

asfimport commented 12 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

From a functional perspective, as David says, its only really related to polygon support which is hugely powerful but can exist somewhere else if needs be.

Can we add polygon support using the java.awt.geom.Area class? It has lots of code vor overlapping polygons and similar stuff... My own gazeteer using Lucene is based on this. In general we could ad another strategy to handle this (java.awt.geom implements lots of different shape types, in contrast to "normal" AWT double not integer based).

asfimport commented 12 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

+1, Lucene badly needs good ootb spatial search.

I think a branch makes complete sense... we've made branches for much smaller things!

asfimport commented 12 years ago

David Smiley (@dsmiley) (migrated from JIRA)

Uwe, Thanks for bringing java.awt.geom.Area to my attention; I haven't noticed it before. It looks workable, and the code for it looks impressive to me. As an aside, JTS has a particularly scalable polygon scaling to many vertexes which get stored in an in-memory RTree – although I don't think this feature is as pertinent for user-input polygons which would have a small number. It is unfortunate that sun.awt.geom.Crossing is not exposed from Area, since computing it is not particularly cheap and LSP will ask Area two things – intersects() and contains() given a lat-lon box, and Area will compute the same Crossing twice. Use of this in LSP would not be a "stategy", it would be a subclass of SpatialContext which acts as a factory for shapes. Speaking of which, I'm thinking of renaming the "simple" package to be something else like "impl" since some of the implementations are decidedly not simple – GeoCircleImpl case in point, and the addition of a polygon would seal that point.

I look forward to working with you more Uwe. It appears we do a lot of similar work – geospatial and trie stuff.

asfimport commented 12 years ago

Bill Bell (migrated from JIRA)

+1 getting polygon support ootb is huge as is geohashing and multivalued lat longs. Next step Solr...

asfimport commented 12 years ago

Ryan McKinley (@ryantxu) (migrated from JIRA)

Thanks for pushing this forward David! (sorry i have been offline recently... just had a baby!)

We should defiantly make a branch for this – getting things integrated with the build system will be non-trivial.

Re code grant? given that all developers of this code are lucene committers and intend for this be contributed to ASF, I don't think it is necessary. But if we need more paperwork, that is OK too.

Re polygons / AWT / JTS? I hope this code lets us use an implementation that is appropriate for the need. In some cases, simple math or java.awt.geom may be fine, in others JTS will be necessary. My fear is that with JTS out of the core build/test system, JTS will be a secondary concern. Through this process, I will continue to make sure any design decisions don't exclude a solid JTS solution.

asfimport commented 12 years ago

Ryan McKinley (@ryantxu) (migrated from JIRA)

David – do you want to go ahead and make a branch and integrate: http://lucene-spatial-playground.googlecode.com/svn/trunk/spatial-lucene/

I still think we want a .jar file for the spatial code/API that does not need lucene. This will be important for non-lucene clients that should be able to deal with real classes rather then strings.

asfimport commented 12 years ago

David Smiley (@dsmiley) (migrated from JIRA)

I created a branch https://svn.apache.org/repos/asf/lucene/dev/branches/lucene3795_lsp_spatial_module from r1291350 (the most recent version with passing tests as indicated by Jenkins)

asfimport commented 12 years ago

David Smiley (@dsmiley) (migrated from JIRA)

Branch Status Update:

asfimport commented 12 years ago

Chris Male (migrated from JIRA)

LSP Lucene spatial is in as a module, actually as two modules, spatial/base & spatial/strategy. That complicates the build but Chris & Ryan insist. The maven build works.

I don't recall insisting that here. I think we should do what's best in this situation. If having two sub modules is causing too much difficulty for no benefit, then +1 to reducing them to one.

asfimport commented 12 years ago

David Smiley (@dsmiley) (migrated from JIRA)

Chris, in an email months ago you declared interest in a client library jar, and Ryan absolutely insisted. The difficulties have been figured out.

There is no benchmarking code. LSP did have about 4 classes in a benchmarking package but AFAIK it was draft in-progress (year old code now) and had probably never been executed before. Consequently, I chose not to copy those classes over. Once there is benchmarking, it will probably exist here: /modules/spatial/benchmarking alongside "base" and "strategy". Admittedly that seems a bit heavyweight (i.e. its own module) but it wouldn't truly belong anywhere else, although there probably wouldn't bee much code to it. I don't feel strongly about where it should go.

At this point, I think the branch is in committable shape. To the best of my ability to know, the Ant and Maven builds work, same for Eclipse & IntelliJ IDEA.

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

What is the advantage of two spatial modules? Can I run spatial queries with just base by itself?

For benchmarking code, why not put it in the benchmarking package and have benchmark depend on it (thats how all other modules, highlighter, analyzers, anything else we benchmark works)

asfimport commented 12 years ago

Ryan McKinley (@ryantxu) (migrated from JIRA)

What is the advantage of two spatial modules? Can I run spatial queries with just base by itself?

The real advantage is that client code (solrj etc) can know about Shapes/Operations without needing to include lucene. From spatial-base, you can build queries and understand the results; but can't run the query.

I need my client code to use a real API rather then needing to build the correct String query representations and parsing results.

asfimport commented 12 years ago

David Smiley (@dsmiley) (migrated from JIRA)

Rob, thanks for your suggestion/opinion on putting the spatial benchmarking code into the benchmarking module. Works for me – one less module.

Ryan, how would you feel about a single combined spatial module that you would use from your remote SolrJ client? Yes, it wouldn't be great that the jar would be half filled with classes you don't need (the spatial strategies coded against the Lucene API) but is that really such a big deal? It would be annoying to configure your maven build to not include the transitive dependencies but it's doable, and perhaps we could mark lucene as an optional dependency in a combined spatial module. In practice, anyone using this spatial module on the server will certainly have lucene already.

asfimport commented 12 years ago

Ryan McKinley (@ryantxu) (migrated from JIRA)

I guess I don't see the problem with having multiple .jar files (aside from the ant setup effort)

I'm fine with one .jar if we guarantee that the 'base' classes don't have compile time access to lucene classes. I'm sure there is some ant crazieness to compile half the project with different classpaths then bundle them together, but that seems more complex (and less clear) then two .jar files

asfimport commented 12 years ago

Chris Male (migrated from JIRA)

Chris, in an email months ago you declared interest in a client library jar, and Ryan absolutely insisted. The difficulties have been figured out.

Indeed I did but that was when we were developing this outside of Lucene, I'm now thinking what's best for Lucene and am open to any ideas.

I need my client code to use a real API rather then needing to build the correct String query representations and parsing results.

Are you able to give us some more information on what your client code needs? Is it just being able to instantiate a Shape and then convert it to a queryable String format?

perhaps we could mark lucene as an optional dependency in a combined spatial module.

+1 This seems like the best compromise

I'm fine with one .jar if we guarantee that the 'base' classes don't have compile time access to lucene classes.

I also agree with this and we should bare it in mind as we progress.

asfimport commented 12 years ago

David Smiley (@dsmiley) (migrated from JIRA)

My preference is one module vs. multiple. Now that Ryan and Chris are cool with this, we can continue with that objective.

Tomorrow night or sooner I'll get on merging them together.

asfimport commented 12 years ago

David Smiley (@dsmiley) (migrated from JIRA)

Ryan and I chatted about this issue more and I didn't take consolidation steps yet. I'm pretty neutral, by the way – I see both sides.

Another option occurred to me and I'm excited about the prospects because I think it's a good balance. To be clear, spatial-base has nothing to do with Lucene. It largely consists of shape interfaces with implementations, has some distance calculators like Haversine and other spatial calculations, and can (or at least should) parse and emit some dialect of WKT – a popular standard, extended where needed to represent shapes that aren't in WKT such as a circle. There's a good deal of testing too. It is certainly useful in its own right just as other spatial libraries are. To defend its existence when there are other spatial libraries, I'll point out a few things that make this more desirable than other 3rd party libraries:

So how about spatial-base remains in the LSP project off-Lucene. LSP and this component will both probably receive a name change and possible re-hosting on Github. LSP (or whatever its eventual name is) will always need to exist any way because there are integration scenarios involving LGPL libraries that the Lucene PMC is uncomfortable with, and there is a nifty demo webapp too. The spatial-extras module could probably be merged with spatial-base, making testing easier and it's one less jar.

If spatial-base becomes a 3rd party library required by a single Lucene spatial module, then that brings a simplicity to the code organization insofar as there is just one spatial module, not 2. It also means that the spatial module will be entirely focused on the intersection of Lucene & spatial, and not have other code unrelated to Lucene. When deployed, it would mean 2 jars, the spatial-base.jar (or whatever its renamed to) and lucene-spatial.jar. FYI Solr, at least for the moment, would only need the base one, not lucene-spatial.

The down side is that both spatial-base and lucene-spatial are in-progress and are largely developed together, and so separating them to live on independent projects will bring about some extra burden in syncing them from time to time. This is reminiscent of the Lucene & Solr projects before they were merged. To mitigate this, our spatial team (me, Ryan, Chris) can initially focus on making changes to the public API of spatial-base to the point we like it even more and are less likely to change it.

asfimport commented 12 years ago

Ryan McKinley (@ryantxu) (migrated from JIRA)

I like the idea that spatial-base would be external to lucene, and included as a .jar file. This was my original proposal when starting to discuss this long ago.

As is, the spatial library is quite useful on its own; I think it has the best chance of long term success outside of lucene. Outside lucene (ASF) it can have compile/test dependencies on JTS that make it more robust but still have strong ASL only runtime.

asfimport commented 12 years ago

Ryan McKinley (@ryantxu) (migrated from JIRA)

OK, I think the branch is ready to go.

The one thing I don't like is that the spatial4j.jar gets included twice, once in the modules 'lib' directory and again in the solr lib directory. I could not figure out how to have the solr build compile and distribute this one

asfimport commented 12 years ago

David Smiley (@dsmiley) (migrated from JIRA)

For those following along here, the former "spatial-base" module portion of this code is now an ASL licensed 3rd party jar dependency: http://spatial4j.com "Spatial4J" Basically half of LSP is there now going by this new name. The other half is here as the new lucene spatial module.

I agree that the branch looks ready to be merged into trunk.

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Can we rethink this structure? In my opinion there is a little bit of dll-hell going on on.

From Lucene's perspective as a library, 3rd party dependencies are extremely expensive. I realize this doesnt matter so much for solr, since its an app, but I think we should minimize this.

We all agreed modules should be treated like lucene core (which has no dependencies), and sure, some modules do have dependencies but they should be minimal and necessary.

Just looking at the lib/ directory in the branch I see:

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Also again about my spatial4j.jar: besides the dll-hell perspective, there is also the community perspective.

I dont think we should create github projects that only a few people can commit to and then link binary jars from them into lucene's source tree.

asfimport commented 12 years ago

Chris Male (migrated from JIRA)

I dont think we should create github projects that only a few people can commit to and then link binary jars from them into lucene's source tree.

+1

asfimport commented 12 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

commons-lang.jar: This is unnecessary and only used for EqualsBuilder/HashCodeBuilder, please remove!

die, die, die :-) To come back to an earlier comment: I don't like projects that add a 15 MB JAR file and use one single method/class out of it (this is of course an extreme example). In my opinion, common-lang.jar should only be used, if you heavily depend on it. And if you really want to use it, use version 3 (o.a.commons.lang3 package), not the Java 1.1 versions. Most of the stuff in commons-lang is useless since java 5 :-) and the old pre-commons-lang3 is not typesafe at all (and has millions of bugs regarding unicode).

asfimport commented 12 years ago

Simon Willnauer (@s1monw) (migrated from JIRA)

I agree with robert, can we try to survive without dependencies? What is the reason to have this stuff on github, its your projects anyway right? also the spatial4j notice file is a copy of the commons lang.

simon

asfimport commented 12 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

What's the reason to have spatial4j outside of Lucene?

asfimport commented 12 years ago

Jan Høydahl (@janhoy) (migrated from JIRA)

First, Ryan and David, I think you've done a great job with all this code and a million thanks for donating.

I think I also see the rationale behind splitting the more general spatial4j core into a separate project, hoping that it will attract far more users than only Lucene. While that may happen one day, perhaps we should take one step at a time, letting spatial4j start its life as part of Lucene-java (as a separate module or contrib?), and after a year or so, when it is more mature, cast a vote for whether it shuold become a Lucene sub-project, join Incubator as a new project, join as a library within Apache commons project (commons-spatial), move to GitHub with all Lucene committers invited as committers) or whatever.

commons-spatial sounds attractive to me :)

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

when it is more mature, cast a vote for whether it shuold become a Lucene sub-project, join Incubator as a new project, join as a library within Apache commons project (commons-spatial), move to GitHub with all Lucene committers invited as committers) or whatever.

In my opinion http://incubator.apache.org/sis/ seems like the correct home.

Of course the code could be donated there in parallel, if it starts picking up steam and graduates from the incubator, then it would be the right thing to depend on it I think?

asfimport commented 12 years ago

David Smiley (@dsmiley) (migrated from JIRA)

Commons-lang is used by both Spatial4J and the new spatial-module. This dependency can easily be severed and will happen shortly.

SLF4j is used by both Spatial4J and the new spatial-module. I really like SLF4J but all this resistance to remove dependencies leads me to compromise, and I'll find a way to removing it or have it as an optional dependency.

Uwe and others, the rationale for a core spatial library off of Lucene is my last (long) comment: https://issues.apache.org/jira/browse/LUCENE-3795?focusedCommentId=13216522&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13216522 For what its worth, Ryan and I absolutely love the plan for all of the points in it. I wish someone had expressed their dissenting opinion on it at that time – From Ryan and I's perspective there basically isn't anything not to like. Can anything be done to warm people up to this?

Rob; you're absolutely right that there needs to be a release tagged in the Spatial4J repo. Ryan has already taken steps to get this library in published Maven repos which is the most meaningful step that could be taken to officially release it. Again, we should certainly tag it because it is both best practice and easy.

The ASF is a bit heavy on process and less permissive on interactions with LGPL dependencies (even optional ones?) and so I don't think ASF/incubator is a good place for Spatial4j as an independent project. As frustrating as I find it, the making of spatial-4j could be reverted, returning back to the 2-module setup that some people here seemed to express resistance to.

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

The ASF is a bit heavy on process and less permissive on interactions with LGPL dependencies (even optional ones?)

Which is a great thing in my opinion if we are going to depend on an external library for spatial support.

asfimport commented 12 years ago

Paul Ramirez (migrated from JIRA)

Why not help Apache SIS grow and contribute spatial base as a module there. Then make the Lucene/Solr parts a contrib part in SIS or here. I'm sure they would love the support and would hope to get the developers of spatial base as key members of their community. This would put spatial stuff in one place and help an Apache spatial community grow. This seems more like the Apache way. That said, I am jumping in late to this discussion but I think spatial stuff really deserves its own community at Apache.

asfimport commented 12 years ago

Yonik Seeley (@yonik) (migrated from JIRA)

While it's fair game to ask if something barely used like commons-lang can be removed, it doesn't seem like the other things should be blockers to getting this committed.

As long as spatial4j.jar is properly licensed, where that project should "live" is a different issue and has nothing to do with this. If we didn't know any of the authors of this jar, we wouldn't be having the discussion at all.

We all agreed modules should be treated like lucene core

I don't recall that - modules are shared code by both Lucene and Solr. There's a desire to not pull in all of solr in a module of course, but dependencies on other jars or other modules should be fine.

asfimport commented 12 years ago

Mark Miller (@markrmiller) (migrated from JIRA)

We all agreed modules should be treated like lucene core

Just to chime in since this caught my eye - I don't necessarily agree with this either - I think it depends on the module - thats what I thought we had agreed - like some modules could be more like lucene core and some modules might be more like benchmark - some modules might have really strong back compat, and others might declare something more experimental. And why not a module that might depend on a couple other modules?

Don't take that as an opinion on this spatial issue though - I'm not up to speed on this discussion - just wanted to weigh in on a comment that caught my eye in email.

asfimport commented 12 years ago

Ryan McKinley (@ryantxu) (migrated from JIRA)

What's the reason to have spatial4j outside of Lucene?

  1. Making JTS a 1st class test object (also why SIS is not an option)
  2. The spatial4j.jar is useful on its own – it has a chance to build its own community.
  3. Lucene is not a great dev community for things that are not primarily lucene focused.

I understand my primary concern (JTS) is a non issue for many people here – The trade off to have compile/test dependencies on JTS isn't an option at ASF.

I like this option because it gives lucene a solid ASL solution to support most things. If people want to add JTS to their runtime, they then get strong polygon support.

The alternative packaging structure gets pretty crazy:

In ASF Lucene Reps:

Elsewhere

If I want to make sure the JtsSpatialContext passes all the same tests that the SimpleSpatialContext passes, the structure gets even crazier because we have to package the test projects too!


We all agreed modules should be treated like lucene core

hymmm – my understanding is that modules have flexibility to have dependencies that are appropriate.

asfimport commented 12 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Commons-lang is used by both Spatial4J and the new spatial-module. This dependency can easily be severed and will happen shortly.

Fine!

SLF4j is used by both Spatial4J and the new spatial-module. I really like SLF4J but all this resistance to remove dependencies leads me to compromise, and I'll find a way to removing it or have it as an optional dependency.

In my opinion, non-end-user components should not log, which affects libraries. E.g. there is nothing in the JDK to enable logging of the JDK itsself, although there are surely parts that could log something. Lucene is the same, it does not need to log anything, the client code should log things like "now executing term query..." and so on. IndexWriter is a little bit special, it has now a simple "log-like" interface for debugging (consisting of abstract InfoStream class). This class can be implemented by a logging framework, but would slowdown indexing immense, as logging frameworks tend to use volatiles on every log request (even when not logging).

So I strongly recommend to remove logging. For debugging we often comment out System.out.println inside Lucene.

Uwe and others, the rationale for a core spatial library off of Lucene is my last (long) comment

OK, OK. I still don't understand the whole rationale to move it outside Lucene or to a separate module. Everybody can use the classes by adding the JAR file, too. The "useless" lucene classes don't hurt. Still I would strongly recommend to use parts of lucene.util package whereever possible, especially when performance and easy Lucene integration is needed. So dont create Strings all the time, use BytesRef and index/search using binary terms like NumericField does in Lucene trunk. If this module outside Lucene uses Strings all the time, but e.g. when indexing searching all those strings are again converted to UTF-8 BytesRefs, thats a laaaaaaaaaaaaaaarge overhead. So I prefer to sometimes duplicate code and add performant impls of e.g. term encoders for indexing/search. Every method in Lucene that is used in tight loops (like scorers or TokenStreams) should never ever use Strings (which are final and unmodifiable).

asfimport commented 12 years ago

David Smiley (@dsmiley) (migrated from JIRA)

Everything depends on something. Lucene depends on Java, and it has no control over Java except to complain when there are bugs. This module isn't the first module to have a dependency and frankly I don't understand the aversion to them – it's a natural thing. I agree you can have too many.

I think Lucene-Spatial's dependency on Spatial4j represents the best type of a dependency that Lucene/Solr could have:

  1. ASL licensed
  2. Has code & tests there that aren't related to Lucene; don't clutter or diffuse scope of Lucene's codebase.
  3. Strong relationship to Lucene/Solr. Put another way, if Spatial4j were a product, it's only customer right now is Lucene/Solr. Consequently:
    1. When Lucene/Solr decides to release a version, Spatial4J committers will do the same. No SNAPSHOT dependency from a Lucene release.
    2. When a bug or feature request comes up via Lucene that requires changes in Spatial4J, you (a Lucene committer) can coordinate these changes with great efficiency given that you know Spatial4J committers.
    3. You can become a committer on Spatial4j with far less time than it took me to become a committer here, I swear ;) - especially the spatial-minded folks: Chris (if we had your GitHub username, you'd be grandfathered in), Uwe, Grant, Yonik?

And because it's a dependency and not 1st-party code, it has a greater opportunity to receive improvements from outside parties since it's a smaller project with a more focused committer pool. This stuff has nothing to do with Lucene, remember.

asfimport commented 12 years ago

David Smiley (@dsmiley) (migrated from JIRA)

Uwe, Regarding byte / character performance stuff, I understand. Last night in fact, I successfully argued with Ryan that the SpatialPrefixTree (a trie) belongs with the Lucene spatial module because it is tightly coupled to the algorithms there. One of the arguments was that it could/should probably use ByteRef since it works with raw indexed data. Now that this moved over, I don't think there's code in Spatial4j that is or should be byte oriented. There's some geospatial (WKT-like) string parsing code, and code that generates such strings from shapes, and they are String oriented and that makes sense. As a general statement about performance, you should know that performance is an important goal of Spatial4j. So if for example the API needs to be a bit uglier to make performance compromises, this spatial library more than others is willing to bend. I think benchmarks need to prove this out first on a case-by-case basis though. Other libraries, like one where I work seems to take another extreme in which Latitude and Longitude are each separate classes!

asfimport commented 12 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Why not help Apache SIS grow and contribute spatial base as a module there.

+1