BooleanQuery scores can be diff for same docs+sim when using coord (disagree with Explanation which doesn't change) [LUCENE-7132]

asfimport commented 8 years ago

Some of the folks reported that sometimes explain's score can be different than the score requested by fields parameter. Interestingly, Explain's scores would create a different ranking than the original result list. This is something users experience, but it cannot be re-produced deterministically.

Migrated from LUCENE-7132 by Ahmet Arslan (@iorixxx), resolved Feb 14 2017 Attachments: debug.xml, LUCENE-7132.patch (versions: 7), SOLR-8884.patch (versions: 2) Linked issues:

8193
- 8406

asfimport commented 8 years ago

Ahmet Arslan (@iorixxx) (migrated from JIRA)

There is the Rajesh's response file that demonstrates the problem.

asfimport commented 8 years ago

Alessandro Benedetti (@alessandrobenedetti) (migrated from JIRA)

In my case it happened when testing a re-ranking capability. The explain debug ranking was the correct one, with correct and expected scores. The results in the response were wrongly scored and ranked. I've never gone back to that and while I was testing, starting and restarting, the problem disappeared quite suddenly and I was not able to reproduce it. I know the information I added is almost null, hopefully we can get more evidence from other people !

Cheers

asfimport commented 8 years ago

Ahmet Arslan (@iorixxx) (migrated from JIRA)

Randomized test case for Lucene in hopes that it will trigger sometime. Will try to write Solr counterpart.

asfimport commented 8 years ago

Ahmet Arslan (@iorixxx) (migrated from JIRA)

This is truly a Lucene level bug. Attached path includes a failing test case. It can be reproduced with: ant test -Dtestcase=TestExplain -Dtests.method=testRajeshData -Dtests.seed=D5E55A7E84F4C82C -Dtests.slow=true -Dtests.locale=es-HN -Dtests.timezone=Asia/Samarkand -Dtests.asserts=true -Dtests.file.encoding=UTF-8

asfimport commented 8 years ago

Ahmet Arslan (@iorixxx) (migrated from JIRA)

Can someone who have the appropriate permissions please move SOLR-8884 to LUCENE-XXXX?

asfimport commented 8 years ago

Steven Rowe (@sarowe) (migrated from JIRA)

Done. Please adjust the issue title accordingly.

asfimport commented 8 years ago

Ahmet Arslan (@iorixxx) (migrated from JIRA)

Lucene only patch. Interestingly, testExplainScoreEquality method also failed once for me. Which can be reproduced with : ant test -Dtestcase=TestExplain -Dtests.method=testExplainScoreEquality -Dtests.seed=B90C674F754D524 -Dtests.locale=de -Dtests.timezone=Etc/GMT-12 -Dtests.asserts=true -Dtests.file.encoding=UTF-8 However, testRajeshData method fails more frequently.

asfimport commented 8 years ago

Ahmet Arslan (@iorixxx) (migrated from JIRA)

Thanks Steve for taking care of this!

asfimport commented 8 years ago

Ahmet Arslan (@iorixxx) (migrated from JIRA)

It is really hard to decipher what is going on inside the good old TFIDFSimilarity.

    `@Override`
    public void normalize(float queryNorm, float boost) {
      this.boost = boost;
      this.queryNorm = queryNorm;
      queryWeight = queryNorm * boost * idf.getValue();
      value = queryWeight * idf.getValue();         // idf for document
    }

Why query weight has a IDF multiplicand?
Why TFIDFSimilarity.IDFStats#value is set to IDF square?
Why TFIDFSimilarity.IDFStats#value is need even though we have TFIDFSimilarity.IDFStats.idf.getValue();
TFIDFSimilarity.TFIDFSimScorer#score returns tf(freq) * IDFStats.value which looks tfxIDFxIDF to me.

asfimport commented 8 years ago

Andreas Kostyrka (migrated from JIRA)

Just wanted to mention, it seems to affect SOLR 5.4.1 and 5.3.1 at least too.

asfimport commented 8 years ago

Chris M. Hostetter (@hossman) (migrated from JIRA)

W/o really having firm idea what's going on here i spent some time reviewing the tests that Ahmet posted, and then started poking at things with a stick – mainly to see if I could figure out the missing piece of the puzzle that caused things to fail in "testRajeshData()" but not in any of our existing randomized tests.

I feel down the rabit hole a bit looking into this, and I still have no concrete idea what the underlying problem is, but i have a few uneducated guesses...

the problem seems to only affect BooleanQueries where a "coord" factor is involved
- adding setDisableCoord(true) on the queries in testRajeshData seems to make all seeds pass
- so obviously reproducing this bug requires a Similarity that has a non-trivila coord function (like ClassicSimilaity in Ahmet's TestExplain)
the score values are inconsistent between randomized runs of testRajeshData, while the explain values stay the same
- even though the test data & similarity are fixed, testRajeshData pass fine with some seeds, while others fail
- when testRajeshData does fail, it's because the scores have changed compared to the values produced in previous passing runs
- best guess: something about the MergePolicy and the way docs are co-located in diff segments is triggering a pathological code path in some BooleanWeight/Scorer optimization code?
- perhaps because a segment may not even contain one of the optional terms, so it's Scorer is null?
- if it's not the MergePolicy, then perhaps something about the codec used and the term stats?

Things i've added/revised in this new version of the patch...

added some more asserts to TestExplain.testRajeshData to help demonstrate that it's the score that has changed between successful vs failing runs
fixed a bug in TestExplain.testExplainScoreEquality that was causing false failures
- randomSimpleString(random()) can produce the empty string, which was causing the test to not match the numDocs it was expecting since a whitespace based analyzer is being used
- NOTE: these false failures where the only case i've ever seen TestExplain.testExplainScoreEquality fail
added a TestBaseExplanationTestCase to the test-framework
- I had some initial concerns that maybe some old changes/refactorings to BaseExplanationTestCase had completely eliminated the checks that were suppose to be done in that test, so i added this class to ensure it would fail as expected when an Explanation didn't match scores
added more coord randomization in TestSimpleExplanations
- even w/these changes, i've never seen these tests fail
added a new TestSimpleExplanationsWithHeavyIDF
- this was one of the first things i tried, based on Ahmet's concerns about IDFStats.normalize
- since TestExplain.testRajeshData was failing so consistently, but we've never seen failures like this from any of of the existing BaseExplanationTestCase, i speculated that maybe IDF stats or index size were a big factor, and tried to bang out a subclass of TestSimpleExplanations that added a lot of (ignored) docs with the same terms to make the doc freq stats more interesting for the terms being searched on
- i've never seen this test fail
- some refactoring in BaseExplanationTestCase was needed to write this test

As things stand now with the current patch, here are some test seeds for TestExplain that pass for me...

ant test  -Dtestcase=TestExplain -Dtests.method=testRajeshData -Dtests.seed=11679FC4F397A674 -Dtests.slow=true -Dtests.locale=th -Dtests.timezone=Etc/GMT+1 -Dtests.asserts=true -Dtests.file.encoding=UTF-8 -Dtests.verbose=true
...
   [junit4]   2> NOTE: test params are: codec=Asserting(Lucene60): {id=PostingsFormat(name=SimpleText), body=PostingsFormat(name=Memory doPackFST= false)}, docValues:{}, maxPointsInLeafNode=604, maxMBSortInHeap=6.961830329642295, sim=ClassicSimilarity, locale=th, timezone=Etc/GMT+1

ant test -Dtestcase=TestExplain -Dtests.verbose=true -Dtests.seed=610EF558241D1898 -Dtests.slow=true -Dtests.locale=th -Dtests.timezone=Etc/GMT+1 -Dtests.asserts=true -Dtests.file.encoding=UTF-8 -Dtests.verbose=true
...
   [junit4]   2> NOTE: test params are: codec=Asserting(Lucene60): {id=PostingsFormat(name=Memory doPackFST= true), body=Lucene50(blocksize=128)}, docValues:{}, maxPointsInLeafNode=738, maxMBSortInHeap=5.910307356558968, sim=RandomSimilarity(queryNorm=true,coord=no): {}, locale=th, timezone=Etc/GMT+1

And here are some examples of failures...

fails...

ant test  -Dtestcase=TestExplain -Dtests.method=testRajeshData -Dtests.seed=A2E2DCFC7459241E -Dtests.slow=true -Dtests.locale=ar-EG -Dtests.timezone=Africa/Asmara -Dtests.asserts=true -Dtests.file.encoding=US-ASCII -Dtests.verbose=true

   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestExplain -Dtests.method=testRajeshData -Dtests.seed=A2E2DCFC7459241E -Dtests.slow=true -Dtests.locale=ar-EG -Dtests.timezone=Africa/Asmara -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
   [junit4] FAILURE 1.09s | TestExplain.testRajeshData <<<
   [junit4]    > Throwable #1: java.lang.AssertionError: o-365 score expected:<0.7475659251213074> but was:<0.2491886466741562>
   [junit4]    >    at __randomizedtesting.SeedInfo.seed([A2E2DCFC7459241E:C3C4FAF539C031CF]:0)
   [junit4]    >    at org.apache.lucene.search.TestExplain.testRajeshData(TestExplain.java:210)
   [junit4]    >    at java.lang.Thread.run(Thread.java:745)
   [junit4]   2> NOTE: test params are: codec=CheapBastard, sim=RandomSimilarity(queryNorm=false,coord=no): {}, locale=ar-EG, timezone=Africa/Asmara
   [junit4]   2> NOTE: Linux 3.19.0-51-generic amd64/Oracle Corporation 1.8.0_74 (64-bit)/cpus=4,threads=1,free=236148720,total=248512512

ant test  -Dtestcase=TestExplain -Dtests.method=testRajeshData -Dtests.seed=92DC100B28B70C2C -Dtests.slow=true -Dtests.locale=he -Dtests.timezone=America/Guyana -Dtests.asserts=true -Dtests.file.encoding=UTF-8  -Dtests.verbose=true

   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestExplain -Dtests.method=testRajeshData -Dtests.seed=92DC100B28B70C2C -Dtests.slow=true -Dtests.locale=he -Dtests.timezone=America/Guyana -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] FAILURE 1.03s | TestExplain.testRajeshData <<<
   [junit4]    > Throwable #1: java.lang.AssertionError: m-o-365 score expected:<1.2357021570205688> but was:<0.41190072894096375>
   [junit4]    >    at __randomizedtesting.SeedInfo.seed([92DC100B28B70C2C:F3FA3602652E19FD]:0)
   [junit4]    >    at org.apache.lucene.search.TestExplain.testRajeshData(TestExplain.java:207)
   [junit4]    >    at java.lang.Thread.run(Thread.java:745)
   [junit4]   2> NOTE: test params are: codec=Lucene60, sim=ClassicSimilarity, locale=he, timezone=America/Guyana
   [junit4]   2> NOTE: Linux 3.19.0-51-generic amd64/Oracle Corporation 1.8.0_74 (64-bit)/cpus=4,threads=1,free=173898752,total=249561088
   [junit4]   2> NOTE: All tests run in this JVM: [TestExplain]

My refactoring of BaseExplanationTestCase also seems to have somehow introduced/tickled a stranage bug in the test framework that trips our SecurityManager settings – note that the seed below causes an odd AccessControlException at the suite level after all TestSimpleExplanations test methods have finished successfully...

ant test  -Dtestcase=TestSimpleExplanations -Dtests.seed=74B719CE50C8168A -Dtests.slow=true -Dtests.locale=sr-Latn -Dtests.timezone=America/St_Vincent -Dtests.asserts=true -Dtests.file.encoding=UTF-8

   ...
   [junit4] OK      0.01s | TestSimpleExplanations.testBQ14
   [junit4] OK      0.01s | TestSimpleExplanations.testBQ1
   [junit4] OK      0.03s | TestSimpleExplanations.testBQ20
   [junit4] OK      0.03s | TestSimpleExplanations.testBQ4
   [junit4] OK      0.01s | TestSimpleExplanations.testP2
   [junit4]   2> NOTE: test params are: codec=Asserting(Lucene60), sim=RandomSimilarity(queryNorm=false,coord=no): {field=DFR I(n)B2, alt=DFR I(ne)B1}, locale=sr-Latn, timezone=America/St_Vincent
   [junit4]   2> NOTE: Linux 3.19.0-51-generic amd64/Oracle Corporation 1.8.0_74 (64-bit)/cpus=4,threads=1,free=303553832,total=329777152
   [junit4]   2> NOTE: All tests run in this JVM: [TestSimpleExplanations]
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestSimpleExplanations -Dtests.seed=74B719CE50C8168A -Dtests.slow=true -Dtests.locale=sr-Latn -Dtests.timezone=America/St_Vincent -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] ERROR   0.00s | TestSimpleExplanations (suite) <<<
   [junit4]    > Throwable #1: java.security.AccessControlException: access denied ("java.lang.RuntimePermission" "accessClassInPackage.sun.nio.fs")
   [junit4]    >    at __randomizedtesting.SeedInfo.seed([74B719CE50C8168A]:0)
   [junit4]    >    at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
   [junit4]    >    at java.security.AccessController.checkPermission(AccessController.java:884)
   [junit4]    >    at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
   [junit4]    >    at java.lang.SecurityManager.checkPackageAccess(SecurityManager.java:1564)
   [junit4]    >    at java.lang.Class.checkPackageAccess(Class.java:2372)
   [junit4]    >    at java.lang.Class.checkMemberAccess(Class.java:2351)
   [junit4]    >    at java.lang.Class.getDeclaredFields(Class.java:1915)
   [junit4]    >    at java.security.AccessController.doPrivileged(Native Method)
   [junit4]    >    at java.lang.Thread.run(Thread.java:745)
   [junit4] Completed [1/1 (1!)] in 2.50s, 68 tests, 1 error <<< FAILURES!

asfimport commented 8 years ago

Chris M. Hostetter (@hossman) (migrated from JIRA)

@mikemccand, @jpountz - any theory as to what could cause diff scores to be produced between diff test runs even though the Query, Documents & Similarity are fixed in Ahmet's TestExplain.testRajeshData?

@dweiss, @uschindler - any idea what i did to causes this weird AccessControlException in TestSimpleExplanations?

asfimport commented 8 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Dawid Weiss, Uwe Schindler - any idea what i did to causes this weird AccessControlException in TestSimpleExplanations?

The problem is caused by the static leak detector (same issue like the fix I posted to Solr). The leak detetctor measures static fields and their heap usage. Unfortunately this fails for non-public classes (e.g. implementations of Path). So be sure to null all static fields on afterClass().

In Java 9 this fails more often, so be sure to always null out static non-final fields that point to "more complex" and internal Java objects.

asfimport commented 8 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Here is what's wrong in the test: You have to null the writer field in BaseExplanationTestCase like the searcher in afterClass().

asfimport commented 8 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I added https://github.com/randomizedtesting/randomizedtesting/issues/227 to make this easier to debug. The stack trace is not very helpful to figure out that the "bad" guy was the non-nulled writer field.

asfimport commented 8 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

With #8229 committed (update of randomizedtesting framework), your test produces the following failure:

   [junit4]   2> NOTE: test params are: codec=Asserting(Lucene60), sim=RandomSimilarity(queryNorm=false,coord=no): {field=DFR I(n)B2, alt=DFR I(ne)B1}, locale=sr-Latn, timezone=America/St_Vincent
   [junit4]   2> NOTE: Windows 7 6.1 amd64/Oracle Corporation 1.8.0_72 (64-bit)/cpus=8,threads=1,free=166881504,total=261095424
   [junit4]   2> NOTE: All tests run in this JVM: [TestSimpleExplanations]
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestSimpleExplanations -Dtests.seed=74B719CE50C8168A -Dtests.slow=true -Dtests.locale=sr-Latn -Dtests.timezone=America/St_Vincent -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] ERROR   0.00s | TestSimpleExplanations (suite) <<<
   [junit4]    > Throwable #1: junit.framework.AssertionFailedError: Clean up static fields (in `@AfterClass`?) and null them, your test still has references to classes of which the sizes cannot be measured due to security restrictions or Java 9 module encapsulation:
   [junit4]    >   - protected static org.apache.lucene.index.RandomIndexWriter org.apache.lucene.search.BaseExplanationTestCase.writer
   [junit4]    >    at __randomizedtesting.SeedInfo.seed([74B719CE50C8168A]:0)
   [junit4]    >    at java.lang.Thread.run(Thread.java:745)
   [junit4]    > Caused by: java.security.AccessControlException: access denied ("java.lang.RuntimePermission" "accessClassInPackage.sun.nio.fs")
   [junit4]    >    at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
   [junit4]    >    at java.security.AccessController.checkPermission(AccessController.java:884)
   [junit4]    >    at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
   [junit4]    >    at java.lang.SecurityManager.checkPackageAccess(SecurityManager.java:1564)
   [junit4]    >    at java.lang.Class.checkPackageAccess(Class.java:2372)
   [junit4]    >    at java.lang.Class.checkMemberAccess(Class.java:2351)
   [junit4]    >    at java.lang.Class.getDeclaredFields(Class.java:1915)
   [junit4]    >    at java.security.AccessController.doPrivileged(Native Method)
   [junit4]    >    ... 10 more
   [junit4] Completed [1/1 (1!)] in 2.90s, 69 tests, 1 failure <<< FAILURES!

asfimport commented 8 years ago

Chris M. Hostetter (@hossman) (migrated from JIRA)

I was reminded of this issue today and pinged @mikemccand about it on IRC.

In reviewing the issue myself, to try and describe it to him succinctly, I realized the key bullet points of this bug is:

It is possible to build an index, with no deleted documents such that a particular BooleanQuery returns drastically different scores for some documents depending on if/how the segments in the index have been merged.
The Explanations for every document matching this BooleanQuery do not change based on if/how the segments in the index have been merged – such that the Explanations can be drastically different then the scores

Couched that way, and with a request from mike to try and confirm if the bug was dependent on using LuceneTestCase.newIndexWriterConfig (i gather he suspected it might be a test only bug) I revamped the previous test class to attempt to more straight forwardly demonstrate the crux of the matter.

There are now two test methods: testScoresWithDefaultIWC & testScoresWithRandomIWC – both delegating to the same helpe method for the meat of the test, just using different IndexWriterConfigs.

"checkScores" is the meat of both test methods. It builds up an index using the same "RajeshData.txt" and then executes a fixed query against the index, recording the scores of every mathcing document. It then does a forceMerge(1,true) on the IndexWriter, and re-executes the query comparing the scores of every matching document against the scores from the previous query execution.

If the tests make it this far (and they rarely do) then does the query again, this time checking the "Explanation" for every matching document against the score.

With this seed, both tests demonstrate an identical inconsistency between the scores of a document containing body="Microsoft Office 365" between the pre-merge and single-segment indexes....

   [junit4] <JUnit4> says ahoj! Master seed: B6DEF72C383813DE
   [junit4] Executing 1 suite with 1 JVM.
   [junit4] 
   [junit4] Started J0 PID(29777@localhost).
   [junit4] Suite: org.apache.lucene.search.TestBooleanScoreConsistency
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestBooleanScoreConsistency -Dtests.method=testScoresWithRandomIWC -Dtests.seed=B6DEF72C383813DE -Dtests.slow=true -Dtests.locale=ar-IQ -Dtests.timezone=Asia/Jayapura -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] FAILURE 2.80s | TestBooleanScoreConsistency.testScoresWithRandomIWC <<<
   [junit4]    > Throwable #1: java.lang.AssertionError: 7614: score doesn't match (previously returned) expected score for Microsoft Office 365 expected:<1.2357021570205688> but was:<0.41190072894096375>
   [junit4]    >    at __randomizedtesting.SeedInfo.seed([B6DEF72C383813DE:2C10E99E7CDA36F8]:0)
   [junit4]    >    at org.apache.lucene.search.TestBooleanScoreConsistency.checkScoresAgainstExpected(TestBooleanScoreConsistency.java:178)
   [junit4]    >    at org.apache.lucene.search.TestBooleanScoreConsistency.checkScores(TestBooleanScoreConsistency.java:137)
   [junit4]    >    at org.apache.lucene.search.TestBooleanScoreConsistency.testScoresWithRandomIWC(TestBooleanScoreConsistency.java:85)
   [junit4]    >    at java.lang.Thread.run(Thread.java:745)
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestBooleanScoreConsistency -Dtests.method=testScoresWithDefaultIWC -Dtests.seed=B6DEF72C383813DE -Dtests.slow=true -Dtests.locale=ar-IQ -Dtests.timezone=Asia/Jayapura -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] FAILURE 1.63s | TestBooleanScoreConsistency.testScoresWithDefaultIWC <<<
   [junit4]    > Throwable #1: java.lang.AssertionError: 7614: score doesn't match (previously returned) expected score for Microsoft Office 365 expected:<1.2357021570205688> but was:<0.41190072894096375>
   [junit4]    >    at __randomizedtesting.SeedInfo.seed([B6DEF72C383813DE:1CF87A1A7D4B3415]:0)
   [junit4]    >    at org.apache.lucene.search.TestBooleanScoreConsistency.checkScoresAgainstExpected(TestBooleanScoreConsistency.java:178)
   [junit4]    >    at org.apache.lucene.search.TestBooleanScoreConsistency.checkScores(TestBooleanScoreConsistency.java:137)
   [junit4]    >    at org.apache.lucene.search.TestBooleanScoreConsistency.testScoresWithDefaultIWC(TestBooleanScoreConsistency.java:79)
   [junit4]    >    at java.lang.Thread.run(Thread.java:745)
   [junit4]   2> NOTE: test params are: codec=Asserting(Lucene62): {id=PostingsFormat(name=Direct), body=Lucene50(blocksize=128)}, docValues:{}, maxPointsInLeafNode=825, maxMBSortInHeap=5.5977062021908015, sim=RandomSimilarity(queryNorm=false,coord=crazy): {}, locale=ar-IQ, timezone=Asia/Jayapura
   [junit4]   2> NOTE: Linux 3.19.0-51-generic amd64/Oracle Corporation 1.8.0_74 (64-bit)/cpus=4,threads=1,free=266511232,total=336068608
   [junit4]   2> NOTE: All tests run in this JVM: [TestBooleanScoreConsistency]
   [junit4] Completed [1/1 (1!)] in 4.72s, 2 tests, 2 failures <<< FAILURES!
   [junit4] 
   [junit4] 
   [junit4] Tests with failures [seed: B6DEF72C383813DE]:
   [junit4]   - org.apache.lucene.search.TestBooleanScoreConsistency.testScoresWithRandomIWC
   [junit4]   - org.apache.lucene.search.TestBooleanScoreConsistency.testScoresWithDefaultIWC
   [junit4] 
   [junit4] 
   [junit4] JVM J0:     0.40 ..     5.70 =     5.30s
   [junit4] Execution time total: 5.77 sec.
   [junit4] Tests summary: 1 suite, 2 tests, 2 failures

With this seed, the pre-merge and single-segment indexes in both tests return identical scores for all matching docs, but the Explanations (which seem to be correct AFAICT) don't match the scores...

   [junit4] <JUnit4> says salut! Master seed: AEB2F91E0E8938CC
   [junit4] Executing 1 suite with 1 JVM.
   [junit4] 
   [junit4] Started J0 PID(29716@localhost).
   [junit4] Suite: org.apache.lucene.search.TestBooleanScoreConsistency
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestBooleanScoreConsistency -Dtests.method=testScoresWithDefaultIWC -Dtests.seed=AEB2F91E0E8938CC -Dtests.slow=true -Dtests.locale=zh -Dtests.timezone=Australia/Sydney -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
   [junit4] FAILURE 4.08s | TestBooleanScoreConsistency.testScoresWithDefaultIWC <<<
   [junit4]    > Throwable #1: java.lang.AssertionError: 15581: score() and explain() return different values for: Office 365
   [junit4]    > 0.7475659 = product of:
   [junit4]    >   2.2426977 = sum of:
   [junit4]    >     2.2426977 = weight(body:365 in 15581) [ClassicSimilarity], result of:
   [junit4]    >       2.2426977 = score(doc=15581,freq=1.0), product of:
   [junit4]    >         0.5300819 = queryWeight, product of:
   [junit4]    >           6.769362 = idf(docFreq=53, docCount=17297)
   [junit4]    >           0.07830604 = queryNorm
   [junit4]    >         4.230851 = fieldWeight in 15581, product of:
   [junit4]    >           1.0 = tf(freq=1.0), with freq of:
   [junit4]    >             1.0 = termFreq=1.0
   [junit4]    >           6.769362 = idf(docFreq=53, docCount=17297)
   [junit4]    >           0.625 = fieldNorm(doc=15581)
   [junit4]    >   0.33333334 = coord(1/3)
   [junit4]    >  expected:<0.2491886466741562> but was:<0.7475659251213074>
   [junit4]    >    at __randomizedtesting.SeedInfo.seed([AEB2F91E0E8938CC:49474284BFA1F07]:0)
   [junit4]    >    at org.apache.lucene.search.TestBooleanScoreConsistency.checkScoresAgainstExplanations(TestBooleanScoreConsistency.java:214)
   [junit4]    >    at org.apache.lucene.search.TestBooleanScoreConsistency.checkScores(TestBooleanScoreConsistency.java:138)
   [junit4]    >    at org.apache.lucene.search.TestBooleanScoreConsistency.testScoresWithDefaultIWC(TestBooleanScoreConsistency.java:79)
   [junit4]    >    at java.lang.Thread.run(Thread.java:745)
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestBooleanScoreConsistency -Dtests.method=testScoresWithRandomIWC -Dtests.seed=AEB2F91E0E8938CC -Dtests.slow=true -Dtests.locale=zh -Dtests.timezone=Australia/Sydney -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
   [junit4] FAILURE 1.90s | TestBooleanScoreConsistency.testScoresWithRandomIWC <<<
   [junit4]    > Throwable #1: java.lang.AssertionError: 7614: score() and explain() return different values for: Microsoft Office 365
   [junit4]    > 1.2357022 = product of:
   [junit4]    >   1.8535532 = sum of:
   [junit4]    >     0.05939492 = weight(body:microsoft in 7614) [ClassicSimilarity], result of:
   [junit4]    >       0.05939492 = score(doc=7614,freq=1.0), product of:
   [junit4]    >         0.09644668 = queryWeight, product of:
   [junit4]    >           1.2316633 = idf(docFreq=13720, docCount=17297)
   [junit4]    >           0.07830604 = queryNorm
   [junit4]    >         0.6158317 = fieldWeight in 7614, product of:
   [junit4]    >           1.0 = tf(freq=1.0), with freq of:
   [junit4]    >             1.0 = termFreq=1.0
   [junit4]    >           1.2316633 = idf(docFreq=13720, docCount=17297)
   [junit4]    >           0.5 = fieldNorm(doc=7614)
   [junit4]    >     1.7941582 = weight(body:365 in 7614) [ClassicSimilarity], result of:
   [junit4]    >       1.7941582 = score(doc=7614,freq=1.0), product of:
   [junit4]    >         0.5300819 = queryWeight, product of:
   [junit4]    >           6.769362 = idf(docFreq=53, docCount=17297)
   [junit4]    >           0.07830604 = queryNorm
   [junit4]    >         3.384681 = fieldWeight in 7614, product of:
   [junit4]    >           1.0 = tf(freq=1.0), with freq of:
   [junit4]    >             1.0 = termFreq=1.0
   [junit4]    >           6.769362 = idf(docFreq=53, docCount=17297)
   [junit4]    >           0.5 = fieldNorm(doc=7614)
   [junit4]    >   0.6666667 = coord(2/3)
   [junit4]    >  expected:<0.41190072894096375> but was:<1.2357021570205688>
   [junit4]    >    at __randomizedtesting.SeedInfo.seed([AEB2F91E0E8938CC:347CE7AC4A6B1DEA]:0)
   [junit4]    >    at org.apache.lucene.search.TestBooleanScoreConsistency.checkScoresAgainstExplanations(TestBooleanScoreConsistency.java:214)
   [junit4]    >    at org.apache.lucene.search.TestBooleanScoreConsistency.checkScores(TestBooleanScoreConsistency.java:138)
   [junit4]    >    at org.apache.lucene.search.TestBooleanScoreConsistency.testScoresWithRandomIWC(TestBooleanScoreConsistency.java:85)
   [junit4]    >    at java.lang.Thread.run(Thread.java:745)
   [junit4]   2> NOTE: test params are: codec=Asserting(Lucene62): {id=PostingsFormat(name=MockRandom), body=PostingsFormat(name=LuceneFixedGap)}, docValues:{}, maxPointsInLeafNode=127, maxMBSortInHeap=5.70855157410318, sim=RandomSimilarity(queryNorm=true,coord=crazy): {}, locale=zh, timezone=Australia/Sydney
   [junit4]   2> NOTE: Linux 3.19.0-51-generic amd64/Oracle Corporation 1.8.0_74 (64-bit)/cpus=4,threads=1,free=181104832,total=342360064
   [junit4]   2> NOTE: All tests run in this JVM: [TestBooleanScoreConsistency]
   [junit4] Completed [1/1 (1!)] in 6.29s, 2 tests, 2 failures <<< FAILURES!
   [junit4] 
   [junit4] 
   [junit4] Tests with failures [seed: AEB2F91E0E8938CC]:
   [junit4]   - org.apache.lucene.search.TestBooleanScoreConsistency.testScoresWithDefaultIWC
   [junit4]   - org.apache.lucene.search.TestBooleanScoreConsistency.testScoresWithRandomIWC
   [junit4] 
   [junit4] 
   [junit4] JVM J0:     0.41 ..     7.13 =     6.73s
   [junit4] Execution time total: 7.18 sec.
   [junit4] Tests summary: 1 suite, 2 tests, 2 failures

With this perplexing seed (found using tests.iters) the Randomized IndexWriterConfig manages to pass all checks in the test – getting scores consistent with the Explanation both before and after the forceMerge – but the default IndexWriterConfig still demonstrates the problem...

   [junit4] <JUnit4> says hallo! Master seed: 29D39C39B6047864:5BA9313A359A1831
   [junit4] Executing 1 suite with 1 JVM.
   [junit4] 
   [junit4] Started J0 PID(29867@localhost).
   [junit4] Suite: org.apache.lucene.search.TestBooleanScoreConsistency
   [junit4] OK      6.16s | TestBooleanScoreConsistency.testScoresWithRandomIWC
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestBooleanScoreConsistency -Dtests.method=testScoresWithDefaultIWC -Dtests.seed=29D39C39B6047864 -Dtests.slow=true -Dtests.locale=uk-UA -Dtests.timezone=Indian/Comoro -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
   [junit4] FAILURE 1.26s | TestBooleanScoreConsistency.testScoresWithDefaultIWC <<<
   [junit4]    > Throwable #1: java.lang.AssertionError: 7614: score doesn't match (previously returned) expected score for Microsoft Office 365 expected:<1.2357021570205688> but was:<0.41190072894096375>
   [junit4]    >    at __randomizedtesting.SeedInfo.seed([29D39C39B6047864:5BA9313A359A1831]:0)
   [junit4]    >    at org.apache.lucene.search.TestBooleanScoreConsistency.checkScoresAgainstExpected(TestBooleanScoreConsistency.java:178)
   [junit4]    >    at org.apache.lucene.search.TestBooleanScoreConsistency.checkScores(TestBooleanScoreConsistency.java:137)
   [junit4]    >    at org.apache.lucene.search.TestBooleanScoreConsistency.testScoresWithDefaultIWC(TestBooleanScoreConsistency.java:79)
   [junit4]    >    at java.lang.Thread.run(Thread.java:745)
   [junit4]   2> NOTE: test params are: codec=Asserting(Lucene62), sim=RandomSimilarity(queryNorm=false,coord=yes): {}, locale=uk-UA, timezone=Indian/Comoro
   [junit4]   2> NOTE: Linux 3.19.0-51-generic amd64/Oracle Corporation 1.8.0_74 (64-bit)/cpus=4,threads=1,free=285815104,total=350748672
   [junit4]   2> NOTE: All tests run in this JVM: [TestBooleanScoreConsistency]
   [junit4] Completed [1/1 (1!)] in 7.73s, 2 tests, 1 failure <<< FAILURES!
   [junit4] 
   [junit4] 
   [junit4] Tests with failures [seed: 29D39C39B6047864:5BA9313A359A1831]:
   [junit4]   - org.apache.lucene.search.TestBooleanScoreConsistency.testScoresWithDefaultIWC
   [junit4] 
   [junit4] 
   [junit4] JVM J0:     0.37 ..     8.58 =     8.21s
   [junit4] Execution time total: 8.62 sec.
   [junit4] Tests summary: 1 suite, 2 tests, 1 failure

asfimport commented 8 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Thanks @hossman I'll dig.

asfimport commented 8 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Hmm seems like something is wrong w/ BQ's bulk scorer, because if I take it out and just use the default:

`@@` -361,6 +363,9 `@@` final class BooleanWeight extends Weight {

   `@Override`
   public BulkScorer bulkScorer(LeafReaderContext context) throws IOException {
+    // nocommit
+    return super.bulkScorer(context);
+    /*
     final BulkScorer bulkScorer = booleanScorer(context);
     if (bulkScorer != null) {
       // bulk scoring is applicable, use it
`@@` -369,6 +374,7 `@@` final class BooleanWeight extends Weight {
       // use a Scorer-based impl (BS2)
       return super.bulkScorer(context);
     }
+    */
   }

   `@Override`

Then at least the first seed above passes...

asfimport commented 8 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

OK if I put back BQ's bulkScorer, but then use new IndexSearcher instead of newSearcher the first seed above gets further along: the scores agree before/after forceMerge, but the explanation is still incorrect...

asfimport commented 8 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

A bit more progress ... it looks like somehow BooleanScorer gets confused and is collecting a window with more than one sub-scorer matching, yet failed to tell the top collector to reset back to the fake scorer (it's still on a boosted scorer, I think from a prior window that had only one matching sub, incorrectly applying a 0.3333 boost).

asfimport commented 8 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

OK I found the bug ... it's sneaky :)

In BooleanScorer.scoreWindowSingleScorer, we try to "reset the scorer that should be used for the general case" by calling collector.setScorer(fakeScorer).

The problem is collector at that point is the singleClauseCollector not the original collector ... I think this attached patch fixes it.

asfimport commented 8 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

@rmuir pointed out that #8193 is likely the same bug ... I'll confirm.

asfimport commented 8 years ago

Chris M. Hostetter (@hossman) (migrated from JIRA)

thanks for digging mike!

the lsat patch you attached only included "modified" files, not the "added" files from previous patches, so it was missing the new tests. At first i though maybe that was intentional on your part, that once you had diagnosed the root cause you had modified an existing test to include some checks for this specific situation so we wouldn't need my the crufty TestBooleanScoreConsistency with it's hardcoded data that we probably can't commit as is anyway – but i couldn't find any test modifications you had made to reliably reproduce this problem.

I'm attaching a unified patch that inlcudes all the previous test code along with mike's fixes ... but obviously we still need to improve this test to not include this user data (see nocommit) ... i'll work on that.

asfimport commented 8 years ago

Chris M. Hostetter (@hossman) (migrated from JIRA)

10:21 <@hoss:#lucene-dev> mikemccand: ping?
...
10:44 <@mikemccand:#lucene-dev> hoss: here
10:45 <@hoss:#lucene-dev> oh yeah ... just writting up a jira response ... i think you generated your patch just using "git diff" so it missed 
                          the "new" test files?
10:45 <@hoss:#lucene-dev> i've got a unified patch i'm about to post, so we have both the fix andthe tests that reliably demonstrate the 
                          problem
10:45 <@mikemccand:#lucene-dev> oh yeah sorry i did!
10:45 <@mikemccand:#lucene-dev> ++ thanks
10:46 <@hoss:#lucene-dev> no worries ... what i really wanted to ping you about was writting a better test
10:46 <@hoss:#lucene-dev> right now that test shouldn't be committed as is -- data fro ma user i'm certain we don't have rights to
10:46 <@mikemccand:#lucene-dev> ahh yeah that should be fun :)
10:46 <@mikemccand:#lucene-dev> yeah i saw the comment about that ...
10:46 <@hoss:#lucene-dev> i'm wondering if you could give me some pointers on the hueristics that lead to this optimizatio, so i can try to 
                          write a tighter test case that hits it?
10:46 <@hoss:#lucene-dev> (to prevent a regression)
10:46 <@mikemccand:#lucene-dev> ok lemme try
10:47 <@mikemccand:#lucene-dev> right, we need a test
10:47 <@mikemccand:#lucene-dev> you need a 2 clause BQ
10:47 <@mikemccand:#lucene-dev> where a document with docID 0 -- 2047 matches only one term
10:47 <@mikemccand:#lucene-dev> and then another docID > 2047 matches two terms
10:47 <@mikemccand:#lucene-dev> in that case the 2nd document should get the wrong (disagrees w/ explain) score i think
10:48 <@hoss:#lucene-dev> docID 0 .. as in literally docID 0 in teh index? ... it was that magical?
10:48 <@mikemccand:#lucene-dev> yes!
10:48 <@mikemccand:#lucene-dev> BS1 scores in windows of 2048 documents
10:48 <@hoss:#lucene-dev> holy fuck that's an edge case
10:48 <@mikemccand:#lucene-dev> the bug is that if 1 window uses an "optimization" because only 1 clauses matches ...
10:48 <@mikemccand:#lucene-dev> then that optimization messes up the state
10:48 <@mikemccand:#lucene-dev> and subsequent windows get the wrong scores
10:48 <@mikemccand:#lucene-dev> yeah serious edge case :)
10:48 <@mikemccand:#lucene-dev> i'm glad you pushed on this :)
10:48 <@mikemccand:#lucene-dev> thanks
10:49 <@hoss:#lucene-dev> oh ... so like, doc ID 2049 matching, but no other doc matches until after 2048 * 2 would also hit this bug?
10:49 <@hoss:#lucene-dev> actually ... it sounds like any doc matching as long as it's the only doc in it's window, and then another doc in a 
                          alater window?
10:49 <@mikemccand:#lucene-dev> right!
10:50 <@mikemccand:#lucene-dev> (where that later window's doc had more than 1 clause matching)
10:50 <@hoss:#lucene-dev> ok .. so really, we just need more tests with lots of docs, so that we force matches across the windows ... because 
                          2048 is hardcoded, not somethign we can randomize to small values via LTC
10:50 <@mikemccand:#lucene-dev> yeah ..
10:51 <@hoss:#lucene-dev> hmmm... why did forceMerge change things then?
10:51 <@hoss:#lucene-dev> with no deletions wy did the windows change?
10:51 <@mikemccand:#lucene-dev> hmmm i'm not sure?
10:51 <@mikemccand:#lucene-dev> the forceMerge is crazy: the index already had one segment
10:51 <@mikemccand:#lucene-dev> at least for your first seed
10:52 <@mikemccand:#lucene-dev> yet forceMerge DID run, because CFS wanted to change
10:52 <@mikemccand:#lucene-dev> but this should not have altered the docID order
10:52 <@mikemccand:#lucene-dev> so yeah i can't explain why forceMerge "matters" here
10:53 <@hoss:#lucene-dev> and yet - if it wasn't for the forceMerge, the only indication of the bug would be that the Explanations don't match 
                          -- unless we hardcoded scores in a test, which is hard for randomized data
10:53 <@hoss:#lucene-dev> nee impossible
10:54 <@mikemccand:#lucene-dev> yes
10:57 <@hoss:#lucene-dev> weird.... maybe there's another factor to the optimization we need to consider? ... i'll let you ponder while i try 
                          to figure out a test bsaed on what we know :)
10:57 <@mikemccand:#lucene-dev> LOL ok
...
11:04 <@hoss:#lucene-dev> mikemccand: BTW, you mind if i transcribe this conv to jira so i don't lose it?
11:04 <@mikemccand:#lucene-dev> ++ great

asfimport commented 8 years ago

Chris M. Hostetter (@hossman) (migrated from JIRA)

Ok, based on mike's clues, here are changes made in this new patch that (w/o mikes fix) help repro the bug more often...

TestBoolean2

randomize usage of coord in all the queries tested

randomly injects (empty) filler docs into the index to force matches to exist in diff buckets

this required some changes to queriesTest to know how to compute the correct expDocNrs to pass to CheckHits

updated the "bigSearcher" assertions to also compare the results from multiple Scorers

i never saw this actaully make a diff in terms of test fail/success, likely because of how bigSearcher is really only tacking stuff on to the end of the index, but since test was already comparing multiple scorers for the main index to try and find situations where BulkScorer would return diff results from the default Scorer it seemed like a good idea to check here as well.

added a "singleSegmentSearcher" that has a copy of the main index that has been forceMerged to a single segment

updated queriesTest to also use CheckHits.checkHitsQuery to compare results from the main index with the singleSegmentSearcher

this was inspired by the fact that in previous testing forceMerge(1) was changing the scores of documents even though there were no deletions, and Mike couldn't explain that.

This currently causes failures in some cases, even w/mikes fix applied - see below

replaced TestSimpleExplanationsWithHeavyIDF with TestSimpleExplanationsWithFillerDocs

now the filler docs are injected in betwen existing docs

dpeending on a random boolean, all filler docs are either:

empty – so the queries from the super class tests aren't modified, just the expected docids

fill of terms used in the queries (to muck with IDF like in the previous patch) – so the queries from the super class are also wrapped to exclude these filler docs

BaseExplanationTestCase

removed some no-longer needed refactoring that was in early patch

added some randomization to wrap any query tested in a BooleanQuery with a "SHOULD" clause that matches nothing (to force more interesting coord cases)

removed TestBooleanScoreConsistency and RajeshData.txt since they are no longer needed to reproduce he underlying bug easily

(NOTE: i experimented with injecting empty filler docs in TestMinShouldMatch2 as well, but it slowd the test WAAAAY down – ie: close to 5 minutes on my machine. I'm guessing because of how it uses "SlowMinShouldMatchScorer" ... so i abandoned those changes. Might be worth considering adding those back if someone sees a way to prevent it from being so dang slow)

Except for the above note in red (about TestBoolean2 failing in some cases when comparing the hits+scores between the original index and a copy of that index merged down to a single segment) mike's fix resolved every failure I was able to generate with these test changes.

I suspect that either:

I'm making some mistaken assumption in the validity of comparing scores between indexes like that (i don't think i am based on mikes comments from IRC yesterday)
There is a bug in my changes related to creating a "singleSegmentSearcher" that can be compared to "searcher" (extremely possible)
There is still another, as yet undiagnosed, bug somwhere in BooleanQuery/BooleanScorer that causes discrepencies between otherwise equivilent indexes based on segment boundaries (seems plausible given mikes comments on IRC this morning that he couldn't think of any reason why the bug he identified would be affected by forceMerge)

Here's an example of a failing seed with the current patch – note that based on the stack trace, both searchers produced a list of results that match the expected docids in the expected order, but the scores (for at least the first docid matched) are not equivilent...

   [junit4] <JUnit4> says Привет! Master seed: 1205E6391E501C49
   [junit4] Executing 1 suite with 1 JVM.
   [junit4] 
   [junit4] Started J0 PID(28313@localhost).
   [junit4] Suite: org.apache.lucene.search.TestBoolean2
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestBoolean2 -Dtests.method=testQueries10 -Dtests.seed=1205E6391E501C49 -Dtests.slow=true -Dtests.locale=zh-HK -Dtests.timezone=Australia/ACT -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
   [junit4] FAILURE 0.04s | TestBoolean2.testQueries10 <<<
   [junit4]    > Throwable #1: junit.framework.AssertionFailedError: Hit 0, doc nrs 2 and 2
   [junit4]    > unequal       : 0.5625806
   [junit4]    >            and: 0.42193544
   [junit4]    > for query:+field:w3 +field:xx +field:w2 field:zz
   [junit4]    >    at __randomizedtesting.SeedInfo.seed([1205E6391E501C49:E53B440260B039AE]:0)
   [junit4]    >    at junit.framework.Assert.fail(Assert.java:50)
   [junit4]    >    at org.apache.lucene.search.CheckHits.checkEqual(CheckHits.java:223)
   [junit4]    >    at org.apache.lucene.search.CheckHits.checkHitsQuery(CheckHits.java:205)
   [junit4]    >    at org.apache.lucene.search.TestBoolean2.queriesTest(TestBoolean2.java:196)
   [junit4]    >    at org.apache.lucene.search.TestBoolean2.testQueries10(TestBoolean2.java:329)
   [junit4]    >    at java.lang.Thread.run(Thread.java:745)
   [junit4]   2> NOTE: test params are: codec=CheapBastard, sim=RandomSimilarity(queryNorm=true,coord=no): {field=DFR I(ne)LZ(0.3), field2=DFR I(n)2}, locale=zh-HK, timezone=Australia/ACT
   [junit4]   2> NOTE: Linux 3.19.0-51-generic amd64/Oracle Corporation 1.8.0_74 (64-bit)/cpus=4,threads=1,free=281959656,total=315097088
   [junit4]   2> NOTE: All tests run in this JVM: [TestBoolean2]
   [junit4] Completed [1/1 (1!)] in 1.36s, 1 test, 1 failure <<< FAILURES!

But not every seed fails, example...

   [junit4] <JUnit4> says שלום! Master seed: 153FC5D5F31DB0CE
   [junit4] Executing 1 suite with 1 JVM.
   [junit4] 
   [junit4] Started J0 PID(28077@localhost).
   [junit4] Suite: org.apache.lucene.search.TestBoolean2
   [junit4] OK      0.10s | TestBoolean2.testQueries09
   [junit4] OK      0.04s | TestBoolean2.testQueries05
   [junit4] OK      0.03s | TestBoolean2.testQueries08
   [junit4] OK      0.04s | TestBoolean2.testQueries06
   [junit4] OK      0.02s | TestBoolean2.testQueries02
   [junit4] OK      0.02s | TestBoolean2.testQueries10
   [junit4] OK      0.04s | TestBoolean2.testQueries03
   [junit4] OK      0.01s | TestBoolean2.testQueries01
   [junit4] OK      0.01s | TestBoolean2.testQueries04
   [junit4] OK      0.01s | TestBoolean2.testQueries07
   [junit4] OK      3.68s | TestBoolean2.testRandomQueries
   [junit4] Completed [1/1] in 29.83s, 11 tests

asfimport commented 8 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Thanks @hossman, I can reproduce that failure with your patch ... I'll dig. It is baffling :)

asfimport commented 8 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Hmm, testQueries10 changes the similarity of searcher temporarily, but fails to also change singleSegmentSearcher's similarity ... when I fix that, then this seed passes.

But then on beasting I see other new failures ... digging.

asfimport commented 8 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

OK I found another test bug: you have to use a LogMergePolicy when building the single segment searcher else the docIDs can be jumbled since otherwise merge policies are allowed (and, do!) to merge segments out of order.

Maybe this also explains the original "force merge changed results" issue? Except, on the one seed I was debugging before, the original index was a single segment, so no matter the merge policy, the docID order would not change ...

Here's Hoss's last patch with those two test bugs fixed ... seems to survive beasting of TestBoolean2 for a while, and Lucene's core tests passed once.

asfimport commented 8 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I think this bug is serious enough that we should be sure to get it into 6.1.0 ... I marked blocker.

asfimport commented 8 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I also confirmed that if I revert the fix in BooleanScorer.java that TestBoolean2 does fail on some seeds ... thanks @hossman!

asfimport commented 8 years ago

Adrien Grand (@jpountz) (migrated from JIRA)

Thanks @mikemccand and Hoss for digging this sneaky bug! +1 to the patch

I think this bug is serious enough that we should be sure to get it into 6.1.0 ... I marked blocker.

+1

asfimport commented 8 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Thanks @jpountz and @hossman, I'll commit the last patch.

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit c8570ed821654cdce5f92ae17d06a21f242524e2 in lucene-solr's branch refs/heads/master from Mike McCandless https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c8570ed

LUCENE-7132: BooleanQuery sometimes assigned the wrong score when ranges of documents had only one clause matching while other ranges had more than one clause matchng

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit 5dfaf0392fcd3b7e4b529dce0cd1035b766880a7 in lucene-solr's branch refs/heads/branch_6x from Mike McCandless https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5dfaf03

LUCENE-7132: BooleanQuery sometimes assigned the wrong score when ranges of documents had only one clause matching while other ranges had more than one clause matchng

asfimport commented 8 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Thanks everyone.

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit 77844e2591235bfc1944e901922f876c1d43c264 in lucene-solr's branch refs/heads/branch_5_5 from @sarowe https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=77844e2

LUCENE-7132: BooleanQuery sometimes assigned the wrong score when ranges of documents had only one clause matching while other ranges had more than one clause matchng

(Cherry-picked from commit 5dfaf0392fcd3b7e4b529dce0cd1035b766880a7)

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit 4f6bddefe3310e0361c9b57fd522781d82c89bb8 in lucene-solr's branch refs/heads/branch_5_5 from @sarowe https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4f6bdde

LUCENE-7132: Add 5.5.2 CHANGES entry

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit 707bcc9b3bdae7b2bb2b9a7d9e30e1aa348587cb in lucene-solr's branch refs/heads/branch_5x from @sarowe https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=707bcc9b

LUCENE-7132: BooleanQuery sometimes assigned the wrong score when ranges of documents had only one clause matching while other ranges had more than one clause matchng

(Cherry-picked from commit 5dfaf0392fcd3b7e4b529dce0cd1035b766880a7)

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit 9f513d5569db42fe10b6580e69a754b7aa05f596 in lucene-solr's branch refs/heads/branch_6_0 from Mike McCandless https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9f513d5

LUCENE-7132: BooleanQuery sometimes assigned the wrong score when ranges of documents had only one clause matching while other ranges had more than one clause matchng

asfimport commented 7 years ago

Adrien Grand (@jpountz) (migrated from JIRA)

Reopening to fix the fixVersion.

apache / lucene

BooleanQuery scores can be diff for same docs+sim when using coord (disagree with Explanation which doesn't change) [LUCENE-7132] #8187

8193

8406