apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

ToParentBlockJoinCollector provides no way to access computed scores and the maxScore [LUCENE-4077] #5149

Closed by asfimport 12 years ago

asfimport commented 12 years ago

The constructor of ToParentBlockJoinCollector lets you turn on tracking of parent scores and of the maximum parent score, but there is no way to access those scores because:


Migrated from LUCENE-4077 by Christoph Kaser, resolved May 31 2012
Attachments: LUCENE-4077.patch (versions: 4)

asfimport commented 12 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Patch.

I added getMaxScore to ToParentBlockJoinCollector, and GroupDocs.score to access the aggregated score for the group. I also added a ScoreMergeMode enum to TopGroups.merge to control how the scores from the same group across multiple shards should be merged.
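To illustrate what a score merge mode might control, here is a minimal standalone sketch. It does not use Lucene: the `MergeMode` enum, its constants, and `mergeGroupScore` are hypothetical names for illustration only, and the actual constants and semantics of `TopGroups.ScoreMergeMode` should be checked against the Lucene Javadoc.

```java
class GroupScoreMergeSketch {
    // Hypothetical merge modes; not Lucene's actual enum.
    enum MergeMode { NONE, TOTAL, AVG }

    // Combines the score that each shard reported for the same group.
    static float mergeGroupScore(float[] shardScores, MergeMode mode) {
        if (mode == MergeMode.NONE || shardScores.length == 0) {
            return Float.NaN; // assumption: no merged score is reported
        }
        float sum = 0f;
        for (float s : shardScores) {
            sum += s;
        }
        // TOTAL returns the sum, AVG divides it by the number of shards.
        return mode == MergeMode.TOTAL ? sum : sum / shardScores.length;
    }

    public static void main(String[] args) {
        float[] shardScores = {1.0f, 2.0f, 3.0f};
        System.out.println(mergeGroupScore(shardScores, MergeMode.TOTAL)); // 6.0
        System.out.println(mergeGroupScore(shardScores, MergeMode.AVG));   // 2.0
        System.out.println(mergeGroupScore(shardScores, MergeMode.NONE));  // NaN
    }
}
```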

asfimport commented 12 years ago

Christoph Kaser (migrated from JIRA)

Hello Mike,

thank you for the patch. There is one small problem: ToParentBlockJoinCollector.getMaxScore() always returns NaN. This happens because maxScore is initialized as

private float maxScore = Float.NaN;

and then updated as

maxScore = Math.max(score, maxScore);

which is always NaN.
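The behavior is easy to reproduce outside Lucene. The sketch below is standalone: `buggyMax` mirrors the two lines quoted above, while `fixedMax` is only one possible remedy (an assumption; the thread does not say how the patch actually fixed it).

```java
class MaxScoreNaN {
    // Same pattern as quoted above: NaN initial value combined with Math.max.
    static float buggyMax(float[] scores) {
        float maxScore = Float.NaN;
        for (float score : scores) {
            // Math.max returns NaN whenever either argument is NaN,
            // so maxScore can never leave its initial value.
            maxScore = Math.max(score, maxScore);
        }
        return maxScore;
    }

    // One possible fix (assumption): start from negative infinity and
    // report NaN only when no score was seen at all.
    static float fixedMax(float[] scores) {
        float maxScore = Float.NEGATIVE_INFINITY;
        for (float score : scores) {
            maxScore = Math.max(score, maxScore);
        }
        return scores.length == 0 ? Float.NaN : maxScore;
    }

    public static void main(String[] args) {
        float[] scores = {0.5f, 2.0f, 1.25f};
        System.out.println(buggyMax(scores)); // NaN
        System.out.println(fixedMax(scores)); // 2.0
    }
}
```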

I hope I applied the patch to the correct revision and this is not caused by a version conflict.

asfimport commented 12 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Ugh, sorry. Sneaky NaN!

I added an assert in TestBJQ that shows the failure, then fixed it...

asfimport commented 12 years ago

Christoph Kaser (migrated from JIRA)

This patch works perfectly for my application. Thank you!

asfimport commented 12 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Super, thanks for testing Christoph. I'll commit shortly...

asfimport commented 12 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I decided to add the maxScore to TopGroups so it's consistent w/ TopDocs; this way you don't have to ask the collector for the maxScore....

asfimport commented 12 years ago

Christoph Kaser (migrated from JIRA)

Hi Mike,

shouldn't TopGroups.maxScore contain the maximum parent score? If I am not mistaken, the way it is built now, it contains the maximum child score over all children.

This is due to this line in ToParentBlockJoinCollector.getTopGroups():

maxScore = Math.max(maxScore, topDocs.getMaxScore());

I think it should read:

totalMaxScore = Math.max(totalMaxScore, og.score);

Otherwise, topGroups.maxScore is different to ToParentBlockJoinCollector.getMaxScore()
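A small standalone sketch of why the two maxima can differ. The `Group` class is a hypothetical stand-in for a parent hit with its children, and the sketch assumes the parent score is the average of its child scores; the score mode actually in use may differ.

```java
import java.util.Arrays;
import java.util.List;

class ParentVsChildMax {
    // Hypothetical stand-in for one parent group and its child hits.
    static final class Group {
        final float parentScore;   // aggregated score of the parent
        final float[] childScores; // scores of the individual children

        Group(float parentScore, float[] childScores) {
            this.parentScore = parentScore;
            this.childScores = childScores;
        }
    }

    // Maximum over every child in every group.
    static float maxChildScore(List<Group> groups) {
        float max = Float.NEGATIVE_INFINITY;
        for (Group g : groups) {
            for (float c : g.childScores) {
                max = Math.max(max, c);
            }
        }
        return max;
    }

    // Maximum over the aggregated parent scores.
    static float maxParentScore(List<Group> groups) {
        float max = Float.NEGATIVE_INFINITY;
        for (Group g : groups) {
            max = Math.max(max, g.parentScore);
        }
        return max;
    }

    public static void main(String[] args) {
        // Assumption: each parent score is the average of its children.
        List<Group> groups = Arrays.asList(
            new Group(1.5f, new float[] {1.0f, 2.0f}),
            new Group(2.5f, new float[] {2.0f, 3.0f}));
        System.out.println(maxChildScore(groups));  // 3.0
        System.out.println(maxParentScore(groups)); // 2.5
    }
}
```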

asfimport commented 12 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

> Otherwise, topGroups.maxScore is different to ToParentBlockJoinCollector.getMaxScore()

Woops, you're right, thanks. In fact I should be passing the maxScore that the collector already computed, not recomputing it in ToParentBJC.getTopGroups...

asfimport commented 12 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

New patch, using the collector's maxScore in TPBJC.getTopGroups.

asfimport commented 12 years ago

Christoph Kaser (migrated from JIRA)

Thank you, now it works perfectly!

asfimport commented 12 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Super, thanks Christoph, I'll commit shortly...