Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect results with TopScoreDocCollector [LUCENE-2271]

asfimport commented 14 years ago

This is a foolowup to #3346, where a part of this problem was fixed (boost = 0 leading to NaN scores, which is also un-intuitive), but in general, function queries in Solr can create these invalid scores easily. In previous version of Lucene these scores ordered correct (except NaN, which mixes up results), but never invalid document ids are returned (like Integer.MAX_VALUE).

The problem is: TopScoreDocCollector pre-fills the HitQueue with sentinel ScoreDocs with a score of -inf and a doc id of Integer.MAX_VALUE. For the HQ to work, this sentinel must be smaller than all posible values, which is not the case:

-inf is equal and the document is not inserted into the HQ, as not competitive, but the HQ is not yet full, so the sentinel values keep in the HQ and result is the Integer.MAX_VALUE docs. This problem is solveable (and only affects the Ordered collector) by chaning the exit condition to:

if (score <= pqTop.score && pqTop.doc != Integer.MAX_VALUE) {
  // Since docs are returned in-order (i.e., increasing doc Id), a document
  // with equal score to pqTop.score cannot compete since HitQueue favors
  // documents with lower doc Ids. Therefore reject those docs too.
  return;
}

The NaN case can be fixed in the same way, but then has another problem: all comparisons with NaN result in false (none of these is true): x <NaN, x > NaN, NaN == NaN. This leads to the fact that HQ's lessThan always returns false, leading to unexspected ordering in the PQ and sometimes the sentinel values do not stay at the top of the queue. A later hit then overrides the top of the queue but leaves the incorrect sentinels unchanged -> invalid results. This can be fixed in two ways in HQ: Force all sentinels to the top:
```
protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
  if (hitA.doc == Integer.MAX_VALUE)
    return true;
  if (hitB.doc == Integer.MAX_VALUE)
    return false;
  if (hitA.score == hitB.score)
    return hitA.doc > hitB.doc; 
  else
    return hitA.score < hitB.score;
}
```
or alternatively have a defined order for NaN (Float.compare sorts them after +inf):
```
protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
  if (hitA.score == hitB.score)
    return hitA.doc > hitB.doc; 
  else
    return Float.compare(hitA.score, hitB.score) < 0;
}
```

The problem with both solutions is, that we have now more comparisons per hit and the use of sentinels is questionable. I would like to remove the sentinels and use the old pre 2.9 code for comparing and using PQ.add() when a competitive hit arrives. The order of NaN would be unspecified.

To fix the order of NaN, it would be better to replace all score comparisons by Float.compare() [also in FieldComparator].

I would like to delay 2.9.2 and 3.0.1 until this problem is discussed and solved.

Migrated from LUCENE-2271 by Uwe Schindler (@uschindler), updated May 09 2016 Attachments: LUCENE-2271.patch (versions: 5), LUCENE-2271-bench.patch, LUCENE-2271-maybe-as-separate-collector.patch, TSDC.patch Linked issues:

SOLR-1785

asfimport commented 14 years ago

Robert Muir (@rmuir) (migrated from JIRA)

its not a bug, as its doc'ed to work this way.

 * <p><b>NOTE</b>: The values Float.Nan,
 * Float.NEGATIVE_INFINITY and Float.POSITIVE_INFINITY are
 * not valid scores.  This collector will not properly
 * collect hits with such scores.

asfimport commented 14 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I don't think we should do anything to fix NaN, such as using more expensive comparisons (Float.compareTo) and stuff. as it is not a number, its an invalid score.

i think function queries shoudl throw and exception instead of producing NaN, this problem is only limited to function queries.

I think fixing scores of negative infinity make more sense, as these are unpreventable (again only a problem with function queries!) and at least negative infinity is actually a number.

i think "fixing" a top-N collector, or "fixing" anything that sorts NaN is wrong. NaN doesnt have a properly defined sort order. NaN has an order hacked into Float.compareTo, but this is different. sorting the primitive type makes no sense, and the documentation should stay that it doesnt work with TSDC.

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

This is patch that supports NaN and -inf.

The cost of the additional checks in HitQueue.lessThan are neglectible, as they only occur when a competitive hit is really inserted into the queue. The check enforces all sentinels to the top of the queue, regardless what their score is (because always NaN != NaN).

asfimport commented 14 years ago

Robert Muir (@rmuir) (migrated from JIRA)

attached is a patch, written by Uwe. as far as a "bugfix" i prefer this patch, as the more complicated, performance-intrusive NaN fixes I think should be something we do in 3.1

e.g., "fixing" NaN to work will likely slow down people getting large numbers of results, and i don't think we should do that in bugfix releases.

but in 3.1, we could change it, include some large results-oriented collectors for these people, and the whole thing would make sense.

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Sorry reverted a comment remove.

asfimport commented 14 years ago

Robert Muir (@rmuir) (migrated from JIRA)

bq.The cost of the additional checks in HitQueue.lessThan are neglectible, as they only occur when a competitive hit is really inserted into the queue.

This should be benchmarked for MultiSearcher and ParallelMultiSearcher, too, as they also use HitQueue.

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Patch with testcases for trunk, but should work on branches, too (after removing @Override). Without the fixes in HitQueue or TSDC the tests fail.

asfimport commented 14 years ago

Yonik Seeley (@yonik) (migrated from JIRA)

its not a bug, as its doc'ed to work this way.

OK, so it was a design bug too.

asfimport commented 14 years ago

Robert Muir (@rmuir) (migrated from JIRA)

OK, so it was a design bug too.

A design bug that function queries score docs with an invalid score (NaN) instead of throwing an exception?

asfimport commented 14 years ago

Yonik Seeley (@yonik) (migrated from JIRA)

A design bug that function queries score docs with an invalid score (NaN) instead of throwing an exception?

No, a design bug that -Inf scores were disallowed, esp since they were handled just fine in the past.

NaN is different - it's more of a judgement call depending on the cost to handle it.

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

The cost to handle NaN is the modified lessThan() in HitQueue.

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Improved test, that also checks for increasing doc ids when score identical

asfimport commented 14 years ago

Yonik Seeley (@yonik) (migrated from JIRA)

I'm starting to think that handling NaN is as important as handling the infinities. This is because once you allow infinities, it's easy to get a NaN with a simple multiplication by 0.

asfimport commented 14 years ago

Robert Muir (@rmuir) (migrated from JIRA)

This is because once you allow infinities, it's easy to get a NaN with a simple multiplication by 0.

maybe we should keep it as is then, and only allow finite results

asfimport commented 14 years ago

Yonik Seeley (@yonik) (migrated from JIRA)

maybe we should keep it as is then, and only allow finite results

But how? Finite results can combine and overflow into infinite results with things like boolean query, so we can't give a definite range of values that a query can safely produce.

asfimport commented 14 years ago

Robert Muir (@rmuir) (migrated from JIRA)

But how? Finite results can combine and overflow into infinite results with things like boolean query, so we can't give a definite range of values that a query can safely produce.

right, but this really only happens with function queries. this is my problem, with normal queries how can this happen? its a very very very very special case, and it seems overkill to add all these checks to performance-sensitive top-docs collection just for function queries, to handle NaN and infinites

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

In my opinion we should fix it using the attached patch and in the future 3.1 do some refactoring:

no sentinels
define a order for NaN, as NaN breaks the complete order of results (because PQ cannot handle the case that lessThan(a,b) returns false and also lessThan(b,a) when NaN is involved)

asfimport commented 14 years ago

Robert Muir (@rmuir) (migrated from JIRA)

In my opinion we should fix it using the attached patch and in the future 3.1 do some refactoring:

in general agree, on one condition that it doesnt slow down normal queries, MultiSearcher, or ParallelMultiSearcher significantly (benchmarks)

in my opinion the thing is documented not to work for these values, so its not a bug at all. if we want to support these values we should redesign the collector (in 3.1)

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Here a simplier patch with sentinels removed. You can maybe think about a better if-check in the out of order collector

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Sorry, insertWithOverflow is correct!

asfimport commented 14 years ago

Marvin Humphrey (migrated from JIRA)

An awful lot of thought went into optimizing those collection algorithms. I disagree with many of the design decisions that were made, but it seems rushed to blithely revert those optimizations.

FWIW, the SortCollector in KS (designed on the Lucy list last spring, would be in Lucy but some prereqs haven't gone in yet) doesn't have the problem with -Inf sentinels. It uses an array of "actions" representing sort rules to determine whether a hit is "competitive" and should be inserted into the queue; the first action is set to AUTO_ACCEPT (meaning try inserting the hit into the queue) until the queue fills up, and then again to AUTO_ACCEPT at the start of each segment. It's not necessary to fill up the queue with dummy hits beforehand.

static INLINE bool_t
SI_competitive(SortCollector *self, i32_t doc_id)
{
    u8_t *const actions = self->actions;
    u32_t i = 0;

    /* Iterate through our array of actions, returning as quickly as
     * possible. */
    do {
        switch (actions[i] & ACTIONS_MASK) {
            case AUTO_ACCEPT:
                return true;
            case AUTO_REJECT:
                return false;
            case AUTO_TIE:
                break;
            case COMPARE_BY_SCORE: {
                    float score = Matcher_Score(self->matcher);
                    if  (score > self->bubble_score) {
                        self->bumped->score = score;
                        return true;
                    }
                    else if (score < self->bubble_score) {
                        return false;
                    }
                }
                break;
            case COMPARE_BY_SCORE_REV: {
                // ...
            case COMPARE_BY_DOC_ID:
                // ...
            case COMPARE_BY_ORD1: {
                // ...
        }
    } while (++i < self->num_actions);

    /* If we've made it this far and we're still tied, reject the doc so that
     * we prefer items already in the queue.  This has the effect of
     * implicitly breaking ties by doc num, since docs are collected in order.
     */
    return false;
}

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Attached is the patch with the testcase from the previous one (with sentinels). All tests pass.

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I would rather not change the core default collector here, when sorting by score.

All of the patches being considered would add an extra comparison/and (or maybe if) per collect call, which while presumably small, is still an added cost.

Likely this added cost is in the noise of a benchmark so we wouldn't be able to conclude much by benching... but enough of these added costs will eventually add up. At the end of the day you are adding non-zero work for the CPU to do, for every collect.

This is Lucene's searching hotspot, so we should only be adding extra code on this critical path when it's really needed.

It's ashame to make everyone pay that price when in practice very few users need to collect -Inf/NaN scoring docs. This hasn't been hit in a "real" use case yet.

In fact, before 2.9 (ie up to and including 2.4), we by default filtered out all docs scoring <= 0.0, so -Inf and NaN were always filtered out, anyway.

Users who somehow do hit this in a real use case have a good workaround: use TopFieldCollector, sorting by Relevance. This will properly handle -Inf, and will at least collect NaN scoring docs (but their sort order will be random, as it always has). Or, use PositiveScoresOnlyCollector (to get back to Lucene 2.4 behavior). Or, create a custom collector.

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Here is a new impl that only has exactly one additional check in the initial collection (when th pq is not yet full). After the PQ is full, the collector is replaced by the short-cutting one.

This impl should even be faster than before, if the additional method call does not count and is removed by the JVM (which it should, because its clearly predictable)

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

more improved

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

More optimized version with more local variables. This is the version for the benchmark-try.

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Here a benchmark task made by grant. Run collector.alg and wait long enough.

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

More improved version, now equal to prefilled queue case, as the collector reuses overflowed ScoreDoc instances.

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I did some benchmarks (Java 1.5.0_22, 64bit, Win7, Core2Duo P8700) will do more tomorrow when i set up a large testing environment with 3 separate checkouts containing the three collector versions):

The latest approach (https://issues.apache.org/jira/secure/attachment/12436458/LUCENE-2271.patch) with no sentinels using the delegation and exchanging the inner collector was as fast as the original unpatched version
The approach with sentinels but fixed HitQueue ordering and extra checks (https://issues.apache.org/jira/secure/attachment/12436329/LUCENE-2271.patch), showed (as exspected) a little overhead: The ordered collector was as fast as the unpatched unordered collector (because one check more) - so i would not use this patch

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Fix an issue when numDocs==0.

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

This is a delightfully clever solution, delegating to a contained collector and then swapping in one collector for the startup (when the queue is not yet full) and then a new collector once the queue is full.

It's the closest equivalent we in javaland can reach, to using function pointers in C.

But, I don't think we should commit this – this makes the code more complex, and doesn't really gain enough (performance is the same) in return. It also relies more heavily on the JVM being able to optimize away the method call.

Yes it handles -inf/nan, but I don't think Lucene's default sort-by-relevance collector needs to (prior to 2.9 we silently filtered out such hits, as well as all hits with score <= 0.0).

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I think we should fix a few doc issues, and add asserts (attached patch for trunk).

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

After applying Mike's patch (with modified asserts to correctly detect NaN), updated my patch of the delegating and -inf/NaN aware TopScoreDocCollector.

Maybe we should add it as a separate collector for function queries in 3.1. Maybe with correct NaN ordering?

asfimport commented 11 years ago

Steven Rowe (@sarowe) (migrated from JIRA)

Bulk move 4.4 issues to 4.5 and 5.0

asfimport commented 10 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Move issue to Lucene 4.9.

apache / lucene

Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect results with TopScoreDocCollector [LUCENE-2271] #3347