apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.69k stars 1.04k forks source link

join module should not depend on grouping module [LUCENE-3997] #5070

Open asfimport opened 12 years ago

asfimport commented 12 years ago

I think TopGroups/GroupDocs should simply be in core?

Both grouping and join modules use these trivial classes, but join depends on grouping just for them.

I think its better that we try to minimize these inter-module dependencies. Of course, another option is to combine grouping and join into one module, but last time i brought that up nobody could agree on a name.

Anyway I think the change is pretty clean: its similar to having basic stuff like Analyzer.java in core, so other things can work with Analyzer without depending on any specific implementing modules.


Migrated from LUCENE-3997 by Robert Muir (@rmuir), updated May 09 2016 Attachments: LUCENE-3997.patch (versions: 2)

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Patch, after:

svn mv lucene/grouping/src/java/org/apache/lucene/search/grouping/TopGroups.java lucene/core/src/java/org/apache/lucene/search

svn mv lucene/grouping/src/java/org/apache/lucene/search/grouping/GroupDocs.java lucene/core/src/java/org/apache/lucene/search
asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Patch, after:

svn mv lucene/grouping/src/java/org/apache/lucene/search/grouping/TopGroups.java lucene/core/src/java/org/apache/lucene/search

svn mv lucene/grouping/src/java/org/apache/lucene/search/grouping/GroupDocs.java lucene/core/src/java/org/apache/lucene/search
asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Sorry for the duplicate upload... jira was going nutso on me

asfimport commented 12 years ago

Martijn van Groningen (@martijnvg) (migrated from JIRA)

+1! Good idea. Maybe we can move FunctionValues and ValueSource from the queries modules to core? Then grouping module doesn't have to depend on the queries module.

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Maybe we can move FunctionValues and ValueSource from the queries modules to core? Then grouping module doesn't have to depend on the queries module.

+1 (I didn't even think of that or investigate it yet though, but at a glance it looks like the right thing to do).

asfimport commented 12 years ago

Yonik Seeley (@yonik) (migrated from JIRA)

I think that if you try to make no modules depend on other modules, you'll end up just pulling pretty much everything into core.

Also, the function query stuff is supposed to be marked as experimental - the notice only got added to FunctionQuery (I think?), so it should be applied to FunctionValues and ValueSource if they are moved to core.

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I think that if you try to make no modules depend on other modules, you'll end up just pulling pretty much everything into core.

I don't think we should pull everything into core, but if we pull in the simple abstract APIs we can have a more pluggable API: just like the abstract Analyzer api is in core, which Highlighter uses, but you can highlight UIMA or Japanese or ICU or whatever analyzers this way...

asfimport commented 12 years ago

Martijn van Groningen (@martijnvg) (migrated from JIRA)

I also think we can move these classes to core. These are small classes and we can mark these classes as experimental.

Maybe we can even make this classes 'lighter' by only moving the public methods to core (maybe as interface?). E.g. ValueSource would have all the public methods in core and a BaseValueSource (Or AbstractValueSource) in the queries module that contains ValueSourceComparatorSource and ValueSourceComparator. Just an idea.

I'll create a new issue to not make grouping module depend on the queries module.

asfimport commented 12 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

+1

asfimport commented 12 years ago

Chris Male (migrated from JIRA)

I do echo Yonik's concern here, I don't think we should prevent inter-module dependencies. I think if we move something to lucene/core, it should be because we think its a core API/concept, not just because its used by multiple modules. Analyzer fits into that category, it belongs in core because it is a core concept.

Do we feel the same about TopGroups and GroupDocs? I kind of think we do. But we should move them for that reason, not just to remove the dependency.

asfimport commented 12 years ago

Chris Male (migrated from JIRA)

Of course, another option is to combine grouping and join into one module, but last time i brought that up nobody could agree on a name.

If that is the better option, lets do that. The name seems less important at this stage, we can call it grouping-join if needs be.

asfimport commented 12 years ago

Martijn van Groningen (@martijnvg) (migrated from JIRA)

The reason that joining and grouping are different modules is that these are different functionalities. However these functionalities do overlap a bit with each other. Both joining and grouping can be used for a parent child like search. I'm not sure what would be a good option. Joining does use grouping's TopGroups and GroupDocs... If we are going to have a combined module maybe we should name it relational module or parent child module?

asfimport commented 12 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I don't think we should combine the two modules.

While they do share a couple classes (to represent a 'grouped' result), the two functions (joining and grouping) are really orthogonal: you can join w/o doing grouping, and you can group w/o doing joining.

I think we should move TopGroups/GroupDocs to core.

asfimport commented 12 years ago

Chris Male (migrated from JIRA)

To me they seem to share a lot of similarities and the fact they both use the 'grouped' result notion is an illustration of that.

While a group could consist of Documents with any kind of relationship, that kind of a relationship could be parent-child. The nature of the relationship and what the result should consist of (if its a parent-child relationship, should the 'grouped' result be parent and children, just children or just the parent) seem to be what dictates the implementations used.

I feel that having them as a single module would allow us to build some APIs which focus on user land concepts and perhaps hide some of the implementation details and differences in the joining and grouping algorithms.

asfimport commented 12 years ago

Steven Rowe (@sarowe) (migrated from JIRA)

I propose, instead of using lucene-core as the location for code used by multiple modules, that we create a (single) new module that serves this purpose, something like lucene-shared or lucene-common (though common analyzers already use this name...).

That way the number of inter-module dependencies is limited, and lucene-core doesn't get roped into the act.

asfimport commented 12 years ago

Martijn van Groningen (@martijnvg) (migrated from JIRA)

Steven, I think this is a good idea for the reasons you mentioned. I think the new shared module should be named 'parent-child'. This name describes the overlapping functionality both existing modules have.

Directory layout:

-- lucene
|
|--- parent-child
|         |
|         |--- grouping
|         |
|         |--- join
|
|--- ...
asfimport commented 12 years ago

Chris Male (migrated from JIRA)

I propose, instead of using lucene-core as the location for code used by multiple modules, that we create a (single) new module that serves this purpose, something like lucene-shared or lucene-common (though common analyzers already use this name...)

I actually created lucene-common that when I first refactored out the FunctionQuery codebase. After some time it was decided (in an issue I can't remember) that the code would go into lucene-core. I agree with your assessment that we shouldn't use lucene-core as a dumping ground, but we should get a discussion about this going.

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Can we please not add more modules here. I'm against that, this is crazy: its only a few classes in question already. it doesnt need 3 modules...

the purpose of this issue was to help simplify modules and dependencies, not make it worse.

asfimport commented 12 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I think moving TopGroups/GroupDocs to core is fine. Pragmatism over purity.

asfimport commented 11 years ago

Steven Rowe (@sarowe) (migrated from JIRA)

Bulk move 4.4 issues to 4.5 and 5.0

asfimport commented 10 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Move issue to Lucene 4.9.