apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

Speed up computation of BM25 scores [LUCENE-9071] #10113

Closed: asfimport closed this issue 4 years ago

asfimport commented 4 years ago

We changed the way BM25 scores are computed in #9045 in order to guarantee monotonicity of scores, but this caused a small decrease in throughput; see annotation CC (October 2017) on Mike's nightly benchmarks. Even though the total number of score computations has decreased since we introduced block-max WAND, the relative cost of scoring is not negligible: we compute scores not only on collected documents, but also when decoding skip lists in order to compute the maximum score per block, or group of blocks.
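For readers less familiar with the formula, here is a minimal sketch of the term-frequency part of BM25 written two ways: the textbook form `weight * freq / (freq + norm)`, and an algebraically equivalent form that works from a precomputed `1 / norm`. The class and method names are hypothetical, `weight` is assumed to already fold in IDF and any boost, and this only illustrates the kind of rewrite being discussed; it is not necessarily the code that was committed for this issue.

```java
public final class Bm25Sketch {

    // Length normalization term: k1 * (1 - b + b * docLen / avgDocLen).
    static float norm(float k1, float b, float docLen, float avgDocLen) {
        return k1 * (1f - b + b * docLen / avgDocLen);
    }

    // Textbook form of the saturating term-frequency component.
    static float scoreDirect(float weight, float freq, float norm) {
        return weight * freq / (freq + norm);
    }

    // Equivalent form: freq / (freq + norm) == 1 - 1 / (1 + freq / norm),
    // so weight * freq / (freq + norm) == weight - weight / (1 + freq * (1 / norm)).
    // normInverse (1 / norm) is assumed to be precomputed by the caller.
    static float scoreRewritten(float weight, float freq, float normInverse) {
        return weight - weight / (1f + freq * normInverse);
    }

    public static void main(String[] args) {
        float n = norm(1.2f, 0.75f, 120f, 100f); // hypothetical k1, b and lengths
        float normInverse = 1f / n;
        float weight = 2.0f;                      // hypothetical idf * boost
        for (int freq = 1; freq <= 5; freq++) {
            System.out.printf("freq=%d direct=%.6f rewritten=%.6f%n",
                freq, scoreDirect(weight, freq, n), scoreRewritten(weight, freq, normInverse));
        }
    }
}
```

In the second form, every step (`freq * normInverse`, `1 + x`, `weight / x`, `weight - x`) moves in one direction as `freq` grows, which is one way to reason about keeping scores monotonic without promoting to doubles, and the reciprocal of the norm can be computed once per norm value rather than once per scored document.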


Migrated from LUCENE-9071 by Adrien Grand (@jpountz), resolved Dec 09 2019

asfimport commented 4 years ago

Adrien Grand (@jpountz) (migrated from JIRA)

I'm getting a small but consistently reproducible speedup for boolean/term queries, so I believe it is not noise. Here is the output of one run on wikibigall for instance:

                   Task QPS baseline      StdDev   QPS patch      StdDev                Pct diff
                SpanNear        1.37     (20.9%)        1.36     (20.7%)   -0.5% ( -34% -   51%)
                PKLookup      195.89      (3.5%)      195.36      (3.8%)   -0.3% (  -7% -    7%)
                 Prefix3       59.17      (6.7%)       59.13      (6.6%)   -0.1% ( -12% -   14%)
   HighTermDayOfYearSort       46.46      (5.8%)       46.44      (6.1%)   -0.1% ( -11% -   12%)
       HighTermMonthSort       65.80     (12.4%)       65.80     (12.3%)   -0.0% ( -22% -   28%)
                  Fuzzy1      160.52     (12.7%)      160.58     (13.2%)    0.0% ( -22% -   29%)
        IntervalsOrdered       10.76      (3.3%)       10.76      (3.2%)    0.1% (  -6% -    6%)
                Wildcard      101.27      (3.7%)      101.33      (4.2%)    0.1% (  -7% -    8%)
            SloppyPhrase        6.32      (7.2%)        6.33      (7.2%)    0.1% ( -13% -   15%)
                  Fuzzy2       80.13      (8.3%)       80.42      (9.2%)    0.4% ( -15% -   19%)
         AndHighOrMedMed       37.69      (2.2%)       37.90      (1.9%)    0.6% (  -3% -    4%)
                  Phrase       10.95      (2.5%)       11.03      (2.4%)    0.8% (  -4% -    5%)
        AndMedOrHighHigh       28.77      (2.7%)       29.14      (3.1%)    1.3% (  -4% -    7%)
                  IntNRQ       92.24      (3.1%)       94.13      (3.4%)    2.1% (  -4% -    8%)
              AndHighMed       51.84      (3.0%)       52.92      (3.3%)    2.1% (  -4% -    8%)
                    Term     1375.94      (2.3%)     1405.17      (3.1%)    2.1% (  -3% -    7%)
               OrHighMed       71.51      (2.3%)       73.32      (2.9%)    2.5% (  -2% -    7%)
              OrHighHigh       86.33      (2.2%)       89.26      (3.0%)    3.4% (  -1% -    8%)
             AndHighHigh       36.11      (2.9%)       37.34      (3.9%)    3.4% (  -3% -   10%)

asfimport commented 4 years ago

ASF subversion and git services (migrated from JIRA)

Commit c413656b627160d49eb9e9f1f84ec4945db80f0e in lucene-solr's branch refs/heads/master from Adrien Grand https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c413656

LUCENE-9071: Speed up BM25 scores. (#1043)

asfimport commented 4 years ago

ASF subversion and git services (migrated from JIRA)

Commit 6385e63851898f8fecc30f3f6fea614796ef6c8a in lucene-solr's branch refs/heads/branch_8x from Adrien Grand https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6385e63

LUCENE-9071: Speed up BM25 scores. (#1043)

asfimport commented 4 years ago

ASF subversion and git services (migrated from JIRA)

Commit c413656b627160d49eb9e9f1f84ec4945db80f0e in lucene-solr's branch refs/heads/gradle-master from Adrien Grand https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c413656

LUCENE-9071: Speed up BM25 scores. (#1043)

asfimport commented 4 years ago

Adrien Grand (@jpountz) (migrated from JIRA)

Closing after the 8.4.0 release.