apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.61k stars 1.02k forks source link

Improve GroupingSearch API and extensibility [LUCENE-7617] #8668

Closed asfimport closed 7 years ago

asfimport commented 7 years ago

While looking at how to make grouping work with the new XValuesSource API in core, I thought I'd try and clean up GroupingSearch a bit. We have three different ways of grouping at the moment: by doc block, using a single-pass collector; by field; and by ValueSource. The latter two both use essentially the same two-pass mechanism, with different Collector implementations.

I can see a number of possible improvements here:


Migrated from LUCENE-7617 by Alan Woodward (@romseygeek), resolved Jan 07 2017 Attachments: LUCENE-7617.patch (versions: 3)

asfimport commented 7 years ago

Alan Woodward (@romseygeek) (migrated from JIRA)

Here's a patch for the first point above, creating a new Grouper class that you can pass to GroupingSearch. Needs javadocs, but all tests pass.

asfimport commented 7 years ago

Martijn van Groningen (@martijnvg) (migrated from JIRA)

+1 to this change. This should make using these collectors easier.

There are a couple of places where I saw if statements without curly brackets. Maybe add these curly brackets. I find it easier to read.

clean up the generics on the two-pass collectors - maybe look into removing them entirely?

As far as I can see the bases classes use these generics, so that subclasses don't have to do manual casts. Which parts you like to cleanup?

think about moving the document block method into the join module instead, alongside all the other block-indexing code

I would prefer if the BlockGroupingCollector stayed in the grouping module. The block indexing is a feature provided by core and the way I see it modules can have features that use that. Also the the join and grouping modules provide each a different functionality. Although from a higher level the functionality is a bit overlapping, in a sense that some use cases could be implemented with both the join or the grouping module.

rename the various Collector base classes so that they don't have 'Abstract' in them anymore

agreed, a lot 'Abstract' in the names :)

asfimport commented 7 years ago

Alan Woodward (@romseygeek) (migrated from JIRA)

Thanks for the review Martijn! Here's an updated patch:

Given that everything here is marked as experimental, I think we're OK to just backwards-break? Most people will be using GroupingSearch, I think, which stays the same.

asfimport commented 7 years ago

Martijn van Groningen (@martijnvg) (migrated from JIRA)

+1 Thanks for cleaning this up!

I found a few places still using GROUP_VALUE_TYPE, in SecondPassGroupingCollector.SearchGroupDocs, GroupDocs, TopGroups, AllGroupHeadsCollector.GroupHead and Grouping.Command (in Solr).

Given that everything here is marked as experimental, I think we're OK to just backwards-break?

Yes, that is OK.

asfimport commented 7 years ago

Alan Woodward (@romseygeek) (migrated from JIRA)

Final patch. I ended up removing the no-op group head collectors, as Solr was relying on the AllGroupHeadCollector returning a FixedBitSet - this should probably be just a Bits instance instead, but that can be dealt with in a later issue. Will commit later on today.

asfimport commented 7 years ago

Alan Woodward (@romseygeek) (migrated from JIRA)

The ASF bot didn't pick up the commits, for some reason:

branch_6x: d4d3ede51cc114ad98fb05e19fd6c6e15e724f34 master: da30f21f5d2c90a4e3d4fae87a297adfd4bbb3cb

Thanks for the reviews Martijn!