Specialize 2-clauses disjunctions [LUCENE-10480]

asfimport commented 2 years ago

WANDScorer is nice, but it also has lots of overhead to maintain its invariants: one linked list for the current candidates, one priority queue of scorers that are behind, another one for scorers that are ahead. All this could be simplified in the 2-clauses case, which feels worth specializing for as it's very common that end users enter queries that only have two terms?
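To illustrate why the two-clause case can be simpler, here is a minimal sketch in Python. It is not Lucene code: the function name and the list-of-pairs postings format are assumptions made for the example. It merges two doc-ID-sorted postings lists with one comparison per step and needs neither a priority queue nor a linked list. It does no dynamic pruning, so it only shows the bookkeeping that goes away, not what a real scorer would do.

```python
NO_MORE_DOCS = 2**31 - 1  # sentinel doc ID for an exhausted postings list


def two_clause_disjunction(postings_a, postings_b):
    """Yield (doc_id, score) for docs matching either clause.

    Each postings list is a doc-ID-sorted list of (doc_id, score) pairs
    (a hypothetical format chosen for this sketch). With exactly two
    clauses, a single comparison per step is enough to find the next doc.
    """
    i = j = 0
    while i < len(postings_a) or j < len(postings_b):
        doc_a = postings_a[i][0] if i < len(postings_a) else NO_MORE_DOCS
        doc_b = postings_b[j][0] if j < len(postings_b) else NO_MORE_DOCS
        if doc_a < doc_b:
            yield doc_a, postings_a[i][1]
            i += 1
        elif doc_b < doc_a:
            yield doc_b, postings_b[j][1]
            j += 1
        else:  # both clauses match this doc: sum their scores
            yield doc_a, postings_a[i][1] + postings_b[j][1]
            i += 1
            j += 1
```

For example, `list(two_clause_disjunction([(1, 1.0), (3, 2.0)], [(3, 0.5), (4, 1.5)]))` walks docs 1, 3 and 4, summing the two scores on doc 3.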

Migrated from LUCENE-10480 by Adrien Grand (@jpountz), resolved Jul 20 2022. Pull requests: https://github.com/apache/lucene/pull/101

asfimport commented 2 years ago

Zach Chen (@zacharymorn) (migrated from JIRA)

Hi @jpountz, this issue reminded me of our experiments last year implementing a BMM scorer for pure disjunctions, which showed about a 20% ~ 40% improvement for OrHighHigh and OrHighMed queries. Do you think we should continue to explore in that direction, or might there be better / simpler algorithms we could look into?

asfimport commented 2 years ago

Adrien Grand (@jpountz) (migrated from JIRA)

Good question. Looking at your BlockMaxMaxScoreScorer, it looks like it also has potential for being specialized in the 2-clauses case by having two sub scorers and tracking during document collection whether the scorer that produces lower scores is optional or required. I didn't have concrete plans in mind when opening the issue; I was just observing that we pay significant overhead for supporting arbitrary numbers of clauses when disjunctions often have only two clauses.
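As a rough illustration of that idea, here is a sketch in Python of top-k collection over two clauses where the role of each clause changes as the minimum competitive score rises. This is a simplified model under stated assumptions, not the BMM scorer itself: the function name and the dict-based postings are hypothetical, it uses global rather than per-block maximum scores, and it still visits every doc ID and merely skips scoring, whereas a real scorer would advance only the leading iterator.

```python
import heapq


def top_k_two_clauses(high, low, k):
    """Top-k docs for `high OR low`, skipping docs that cannot compete.

    `high` and `low` map doc_id -> score (hypothetical format); `high` is
    the clause with the larger maximum score. Returns the hits as a list
    of (score, doc_id), best first, plus the number of docs scored.
    """
    max_high = max(high.values(), default=0.0)
    max_low = max(low.values(), default=0.0)
    heap = []   # min-heap holding the k best (score, doc_id) so far
    scored = 0  # docs that were fully scored, for illustration only

    for doc in sorted(high.keys() | low.keys()):
        threshold = heap[0][0] if len(heap) == k else float("-inf")
        in_high, in_low = doc in high, doc in low
        if threshold >= max_low and not in_high:
            continue  # low alone cannot compete: low is optional, high leads
        if threshold >= max_high and not in_low:
            continue  # high alone cannot compete: low has become required
        score = high.get(doc, 0.0) + low.get(doc, 0.0)
        scored += 1
        if len(heap) < k:
            heapq.heappush(heap, (score, doc))
        elif score > threshold:
            heapq.heapreplace(heap, (score, doc))
    return sorted(heap, reverse=True), scored
```

With `high = {1: 3.0, 4: 2.5, 7: 3.0}`, `low = {2: 1.0, 4: 0.5, 5: 1.0, 9: 0.8}` and `k = 2`, only three of the six docs are scored: once the threshold reaches 3.0, docs matching a single clause can no longer enter the top 2.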

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit 503ec5597331454bfbbbb8b6af79b9701cfdccf5 in lucene's branch refs/heads/main from zacharymorn https://gitbox.apache.org/repos/asf?p=lucene.git;h=503ec559733

LUCENE-10480: Use BMM scorer for 2 clauses disjunction (#972)

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit a5c99aca1abc9b73a0c68d4f23533311382b718c in lucene's branch refs/heads/branch_9x from zacharymorn https://gitbox.apache.org/repos/asf?p=lucene.git;h=a5c99aca1ab

LUCENE-10480: Use BMM scorer for 2 clauses disjunction (#972) (#1002)

(cherry picked from commit 503ec5597331454bfbbbb8b6af79b9701cfdccf5)

asfimport commented 2 years ago

Adrien Grand (@jpountz) (migrated from JIRA)

Nightly benchmarks picked up the change and top-level disjunctions are seeing massive speedups, see OrHighHigh or OrHighMed. However disjunctions within conjunctions got a slowdown, see AndHighOrMedMed or AndMedOrHighHigh.

asfimport commented 2 years ago

Adrien Grand (@jpountz) (migrated from JIRA)

Looking at this new scorer from the perspective of disjunctions within conjunctions, maybe there are bits from advance() that we could move to matches() so that we would hand it over to the other clause before we start doing expensive operations like computing scores. What do you think @zacharymorn?
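A toy model of that suggestion, sketched in Python: the class and function names below are hypothetical and only loosely mirror `advance()` and `matches()`; this is not Lucene's `TwoPhaseIterator` API. The point it tries to show is that when the expensive check lives in `matches()`, a conjunction can align all clauses on a candidate doc cheaply first, and only then pay for the costly work.

```python
class TwoPhaseClause:
    """Toy clause: a cheap sorted doc-ID list plus a costly per-doc check."""

    def __init__(self, docs, confirm):
        self.docs = docs        # sorted candidate doc IDs (the approximation)
        self.confirm = confirm  # stand-in for expensive work such as scoring
        self.pos = 0
        self.confirm_calls = 0

    def advance(self, target):
        """Cheap: move to the first candidate >= target, without scoring."""
        while self.pos < len(self.docs) and self.docs[self.pos] < target:
            self.pos += 1
        return self.docs[self.pos] if self.pos < len(self.docs) else None

    def matches(self, doc):
        """Expensive: called only once every clause agrees on `doc`."""
        self.confirm_calls += 1
        return self.confirm(doc)


def conjunction(lead, other):
    """Doc IDs matching both clauses; costly checks run after alignment."""
    hits = []
    doc = lead.advance(0)
    while doc is not None:
        other_doc = other.advance(doc)
        if other_doc is None:
            break
        if other_doc == doc:
            if lead.matches(doc) and other.matches(doc):
                hits.append(doc)
            doc = lead.advance(doc + 1)
        else:
            doc = lead.advance(other_doc)
    return hits
```

In this model, a clause with eight candidates that is intersected with a clause of four candidates has its expensive check invoked at most four times, once per aligned doc, rather than once per candidate it steps over.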

asfimport commented 2 years ago

Zach Chen (@zacharymorn) (migrated from JIRA)

Nightly benchmarks picked up the change and top-level disjunctions are seeing massive speedups, see OrHighHigh or OrHighMed. However disjunctions within conjunctions got a slowdown, see AndHighOrMedMed or AndMedOrHighHigh.

The results look encouraging and interesting! I copied and pasted the boolean queries from wikinightly.tasks into wikimedium.10M.nostopwords.tasks and ran the benchmark, and was able to reproduce the slowdown:

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
                 AndHighOrMedMed      108.16      (6.5%)      100.44      (5.4%)   -7.1% ( -17% -    5%) 0.000
                AndMedOrHighHigh       68.37      (4.5%)       63.92      (5.0%)   -6.5% ( -15% -    3%) 0.000
                     AndHighHigh      122.90      (5.5%)      122.77      (5.5%)   -0.1% ( -10% -   11%) 0.952
                      AndHighMed      113.27      (6.4%)      114.63      (6.2%)    1.2% ( -10% -   14%) 0.546
                        PKLookup      228.08     (14.4%)      232.90     (14.7%)    2.1% ( -23% -   36%) 0.646
                      OrHighHigh       26.89      (5.7%)       48.62     (12.2%)   80.8% (  59% -  104%) 0.000
                       OrHighMed       81.18      (5.9%)      187.05     (12.2%)  130.4% ( 105% -  157%) 0.000

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
                AndMedOrHighHigh       85.67      (5.3%)       73.23      (5.7%)  -14.5% ( -24% -   -3%) 0.000
                        PKLookup      260.08     (13.4%)      253.74     (14.9%)   -2.4% ( -27% -   29%) 0.586
                     AndHighHigh       73.68      (4.7%)       72.70      (4.1%)   -1.3% (  -9% -    7%) 0.339
                      AndHighMed       89.52      (5.1%)       88.55      (4.4%)   -1.1% ( -10% -    8%) 0.470
                 AndHighOrMedMed       63.27      (6.5%)       70.48      (5.7%)   11.4% (   0% -   25%) 0.000
                      OrHighHigh       19.60      (5.3%)       25.62      (7.6%)   30.8% (  16% -   46%) 0.000
                       OrHighMed      121.08      (5.7%)      236.34     (10.2%)   95.2% (  74% -  117%) 0.000

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
                AndMedOrHighHigh       86.88      (3.4%)       76.60      (3.1%)  -11.8% ( -17% -   -5%) 0.000
                     AndHighHigh       30.49      (3.5%)       30.36      (3.5%)   -0.4% (  -7% -    6%) 0.697
                      AndHighMed      192.76      (3.4%)      193.72      (3.9%)    0.5% (  -6% -    8%) 0.671
                        PKLookup      262.59      (5.5%)      264.52      (7.9%)    0.7% ( -11% -   14%) 0.731
                 AndHighOrMedMed       65.47      (3.8%)       73.43      (3.0%)   12.2% (   5% -   19%) 0.000
                      OrHighHigh       21.47      (4.1%)       36.94      (8.3%)   72.1% (  57% -   88%) 0.000
                       OrHighMed       99.91      (4.3%)      292.05     (12.9%)  192.3% ( 167% -  218%) 0.000
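For readers unfamiliar with these tables, the following Python sketch shows one reading of the "Pct diff" column that reproduces the printed numbers. It is inferred from the values in the tables, not taken from the benchmark tool's source, so treat it as an assumption: both StdDev percentages are applied to the baseline QPS, the bracket is the pessimistic and optimistic one-standard-deviation bound, and the bounds are truncated toward zero. The function name is hypothetical.

```python
def pct_diff(base_qps, base_sd_pct, cand_qps, cand_sd_pct):
    """Return (mean, worst, best) percentage change, candidate vs baseline.

    Assumption inferred from the printed tables: both StdDev percentages
    are relative to the baseline QPS.
    """
    base_sd = base_qps * base_sd_pct / 100
    cand_sd = base_qps * cand_sd_pct / 100
    mean = 100 * (cand_qps - base_qps) / base_qps
    # Worst case: slow candidate run compared against a fast baseline run.
    worst = 100 * ((cand_qps - cand_sd) - (base_qps + base_sd)) / (base_qps + base_sd)
    # Best case: fast candidate run compared against a slow baseline run.
    best = 100 * ((cand_qps + cand_sd) - (base_qps - base_sd)) / (base_qps - base_sd)
    return round(mean, 1), int(worst), int(best)
```

Applied to the first AndHighOrMedMed row above, `pct_diff(108.16, 6.5, 100.44, 5.4)` gives `(-7.1, -17, 5)`, matching the printed `-7.1% ( -17% -    5%)`.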

However, when I narrowed the tasks down to just conjunction + disjunction (with the default number of search threads), the results actually turned positive and were similar to what I saw earlier in https://github.com/apache/lucene/pull/972#issuecomment-1166188875

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
                 AndHighOrMedMed       58.65     (37.3%)       71.63     (28.9%)   22.1% ( -32% -  140%) 0.036
                AndMedOrHighHigh       36.43     (39.3%)       44.61     (30.7%)   22.4% ( -34% -  152%) 0.044
                        PKLookup      163.58     (34.4%)      211.88     (32.7%)   29.5% ( -27% -  147%) 0.005

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
                        PKLookup      146.51     (22.0%)      188.92     (30.1%)   28.9% ( -18% -  103%) 0.001
                AndMedOrHighHigh       35.59     (27.1%)       49.99     (37.5%)   40.4% ( -18% -  144%) 0.000
                 AndHighOrMedMed       44.47     (26.6%)       63.37     (35.8%)   42.5% ( -15% -  142%) 0.000

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
                AndMedOrHighHigh       35.29     (25.0%)       52.22     (33.5%)   47.9% (  -8% -  141%) 0.000
                        PKLookup      134.13     (23.6%)      204.43     (25.6%)   52.4% (   2% -  132%) 0.000
                 AndHighOrMedMed       45.96     (25.1%)       74.16     (34.8%)   61.4% (   1% -  161%) 0.000

If I run one task and one query per benchmark run (there are only 5 queries for AndMedOrHighHigh in the nightly tasks), the results are also positive:

AndMedOrHighHigh: +mostly +(are last) # freq=89401 freq=1921211 freq=830278

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
                        PKLookup      149.26     (26.2%)      165.25     (31.6%)   10.7% ( -37% -   92%) 0.243
                AndMedOrHighHigh       25.53     (25.7%)       37.18     (42.5%)   45.6% ( -17% -  152%) 0.000

------
AndMedOrHighHigh: +interview +(at united) # freq=94736 freq=2834104 freq=1185528

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
                        PKLookup      241.27     (14.6%)      266.37     (10.6%)   10.4% ( -12% -   41%) 0.010
                AndMedOrHighHigh       27.52     (32.7%)       51.02     (46.2%)   85.4% (   4% -  244%) 0.000
------
AndMedOrHighHigh: +hard +(but year) # freq=92045 freq=1484398 freq=1098425

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
                        PKLookup      152.20     (15.0%)      161.92     (15.4%)    6.4% ( -20% -   43%) 0.185
                AndMedOrHighHigh       26.02     (35.0%)       38.02     (38.5%)   46.1% ( -20% -  184%) 0.000
-------
AndMedOrHighHigh: +9 +(name its) # freq=541405 freq=2577591 freq=1160703

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
                        PKLookup      184.54     (32.6%)      208.37     (22.2%)   12.9% ( -31% -  100%) 0.143
                AndMedOrHighHigh       18.05     (31.2%)       24.33     (20.0%)   34.8% ( -12% -  125%) 0.000
-------
AndMedOrHighHigh: +bay +(to but) # freq=117167 freq=6105155 freq=1484398 

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
                        PKLookup      164.67     (15.7%)      167.94     (22.2%)    2.0% ( -31% -   47%) 0.744
                AndMedOrHighHigh       25.20     (35.3%)       28.75     (43.6%)   14.1% ( -47% -  143%) 0.262

Maybe the caching effect is worth looking into as well?

maybe there are bits from advance() that we could move to matches() so that we would hand it over to the other clause before we start doing expensive operations like computing scores.

Yup let me give it a try and see if it changes the results.

asfimport commented 2 years ago

Zach Chen (@zacharymorn) (migrated from JIRA)

maybe there are bits from advance() that we could move to matches() so that we would hand it over to the other clause before we start doing expensive operations like computing scores.

This approach does help stabilize performance for disjunction-within-conjunction queries (and also provides some small gains)! I have opened a PR for it: https://github.com/apache/lucene/pull/1006

asfimport commented 2 years ago

Adrien Grand (@jpountz) (migrated from JIRA)

I still suspect that one issue when only running queries that are very good at dynamic pruning is that the JVM doesn't have time to warm up. These queries can figure out the top 10 hits by only evaluating a few thousand hits, so parts of the logic probably still run in interpreted mode. The fact that queries run slower when you run them in isolation further suggests that this is the problematic scenario, not the case when the benchmark includes multiple types of queries?

asfimport commented 2 years ago

Zach Chen (@zacharymorn) (migrated from JIRA)

Ok I see. Maybe I can also try to run some benchmark experiments with different JVM compilation / code cache parameters to further test things out. Will report back if I find something interesting!

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit da8143bfa38cd5fadae4b4712b9e639e79016021 in lucene's branch refs/heads/main from zacharymorn https://gitbox.apache.org/repos/asf?p=lucene.git;h=da8143bfa38

LUCENE-10480: Move scoring from advance to TwoPhaseIterator#matches to improve disjunction within conjunction (#1006)

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit 090cbc50dd7e5659494149f470378ab7f6a90cf1 in lucene's branch refs/heads/branch_9x from zacharymorn https://gitbox.apache.org/repos/asf?p=lucene.git;h=090cbc50dd7

LUCENE-10480: Move scoring from advance to TwoPhaseIterator#matches to improve disjunction within conjunction (#1006) (#1008)

(cherry picked from commit da8143bfa38cd5fadae4b4712b9e639e79016021)

asfimport commented 2 years ago

Adrien Grand (@jpountz) (migrated from JIRA)

AndMedOrHighHigh recovered fully but AndHighOrMedMed only a bit. I'm unsure what explains why there is still a slowdown compared to BMW.

asfimport commented 2 years ago

Zach Chen (@zacharymorn) (migrated from JIRA)

AndMedOrHighHigh recovered fully but AndHighOrMedMed only a bit. I'm unsure what explains why there is still a slowdown compared to BMW.

Hmm, this is quite strange. It looks like AndHighOrMedMed still had about a -13% (5 / 38) impact. I just ran the full suite of wikinightly tasks a few times (by copying wikinightly.tasks into wikimedium.10M.nostopwords.tasks, running localrun.py with source wikimedium10m, and removing the VectorSearch queries, as they were causing an NPE failure for me) but couldn't reproduce the slowdown (the baseline is head before all BMM changes):

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
     BrowseRandomLabelSSDVFacets       20.83      (3.8%)       20.09      (6.5%)   -3.6% ( -13% -    6%) 0.034
           BrowseMonthSSDVFacets       30.36     (10.6%)       29.56     (12.7%)   -2.7% ( -23% -   23%) 0.473
                         Prefix3      402.70      (9.3%)      397.59      (9.9%)   -1.3% ( -18% -   19%) 0.674
               TermDayOfYearSort      183.55      (6.5%)      181.61      (6.9%)   -1.1% ( -13% -   13%) 0.617
                   TermTitleSort      195.99      (7.2%)      194.25      (8.1%)   -0.9% ( -15% -   15%) 0.713
                        PKLookup      293.80      (3.7%)      291.47      (4.8%)   -0.8% (  -8% -    7%) 0.555
                   TermMonthSort      283.86      (7.1%)      281.74      (8.0%)   -0.7% ( -14% -   15%) 0.755
                        Wildcard      227.26      (6.2%)      225.87      (6.4%)   -0.6% ( -12% -   12%) 0.759
                            Term     2227.50      (3.7%)     2219.57      (3.3%)   -0.4% (  -7% -    6%) 0.748
                          Fuzzy1      134.77      (2.8%)      134.37      (2.3%)   -0.3% (  -5% -    4%) 0.712
                    TermGroup100       53.61      (3.7%)       53.47      (4.6%)   -0.3% (  -8% -    8%) 0.846
                      TermDTSort      143.16      (3.2%)      142.89      (3.3%)   -0.2% (  -6% -    6%) 0.857
                  TermBGroup1M1P       79.44      (5.5%)       79.29      (5.5%)   -0.2% ( -10% -   11%) 0.917
        AndHighHighDayTaxoFacets       45.01      (2.3%)       44.94      (2.1%)   -0.1% (  -4% -    4%) 0.833
     BrowseRandomLabelTaxoFacets       30.94     (50.0%)       30.92     (46.8%)   -0.0% ( -64% -  193%) 0.998
         AndHighMedDayTaxoFacets       78.11      (3.2%)       78.11      (3.0%)   -0.0% (  -6% -    6%) 0.998
                          Phrase      202.17      (2.7%)      202.18      (2.0%)    0.0% (  -4% -    4%) 0.996
                          Fuzzy2       76.10      (2.6%)       76.15      (2.0%)    0.1% (  -4% -    4%) 0.933
                     TermGroup1M       22.65      (3.8%)       22.67      (3.2%)    0.1% (  -6% -    7%) 0.919
                  TermDateFacets       32.50      (5.3%)       32.60      (5.5%)    0.3% (  -9% -   11%) 0.861
       BrowseDayOfYearSSDVFacets       26.31      (5.9%)       26.39      (8.5%)    0.3% ( -13% -   15%) 0.897
                         Respell       88.21      (2.2%)       88.49      (2.1%)    0.3% (  -3% -    4%) 0.642
                        SpanNear       16.14      (4.0%)       16.22      (4.2%)    0.5% (  -7% -    9%) 0.706
            MedTermDayTaxoFacets       73.42      (4.8%)       73.85      (4.9%)    0.6% (  -8% -   10%) 0.708
                    TermBGroup1M       48.92      (4.2%)       49.23      (2.8%)    0.6% (  -6% -    8%) 0.581
                IntervalsOrdered       22.42      (5.8%)       22.59      (4.2%)    0.7% (  -8% -   11%) 0.651
          OrHighMedDayTaxoFacets       25.27      (6.1%)       25.46      (6.6%)    0.7% ( -11% -   14%) 0.711
                    TermGroup10K       30.26      (4.2%)       30.50      (2.9%)    0.8% (  -6% -    8%) 0.494
                    SloppyPhrase       91.40      (5.6%)       92.16      (6.3%)    0.8% ( -10% -   13%) 0.662
                          IntNRQ      152.74     (20.3%)      154.86     (17.1%)    1.4% ( -29% -   48%) 0.815
                      AndHighMed       88.55      (2.6%)       89.98      (3.1%)    1.6% (  -3% -    7%) 0.073
                     AndHighHigh       29.10      (2.7%)       29.68      (3.1%)    2.0% (  -3% -    8%) 0.032
       BrowseDayOfYearTaxoFacets       31.29     (40.0%)       31.93     (38.0%)    2.0% ( -54% -  133%) 0.869
            BrowseDateTaxoFacets       31.18     (40.3%)       31.87     (38.5%)    2.2% ( -54% -  135%) 0.859
            BrowseDateSSDVFacets        3.79     (28.4%)        3.92     (27.9%)    3.4% ( -41% -   83%) 0.700
                 AndHighOrMedMed       63.04      (6.1%)       65.68      (5.5%)    4.2% (  -7% -   16%) 0.023
                AndMedOrHighHigh       92.29      (4.6%)       99.20      (5.5%)    7.5% (  -2% -   18%) 0.000
           BrowseMonthTaxoFacets       30.93     (39.4%)       34.36     (43.4%)   11.1% ( -51% -  154%) 0.397
                      OrHighHigh       20.09      (6.5%)       33.58      (8.7%)   67.2% (  48% -   88%) 0.000
                       OrHighMed       78.61      (5.4%)      186.58     (10.7%)  137.4% ( 115% -  162%) 0.000

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
                  TermBGroup1M1P       81.36      (7.8%)       78.61      (5.5%)   -3.4% ( -15% -   10%) 0.114
           BrowseMonthSSDVFacets       29.39     (13.1%)       28.50     (13.3%)   -3.0% ( -26% -   26%) 0.469
            BrowseDateSSDVFacets        3.91     (27.0%)        3.81     (27.5%)   -2.5% ( -44% -   71%) 0.768
                 AndHighOrMedMed      108.53      (6.6%)      106.50      (5.7%)   -1.9% ( -13% -   11%) 0.336
          OrHighMedDayTaxoFacets       23.14      (4.4%)       22.93      (6.5%)   -0.9% ( -11% -   10%) 0.596
                    TermGroup100       64.69      (4.4%)       64.13      (3.4%)   -0.9% (  -8% -    7%) 0.492
               TermDayOfYearSort      142.82      (5.3%)      141.72      (2.7%)   -0.8% (  -8% -    7%) 0.562
                    SloppyPhrase        3.10      (4.2%)        3.08      (4.6%)   -0.7% (  -9% -    8%) 0.629
                          Phrase       35.56      (2.5%)       35.36      (2.4%)   -0.6% (  -5% -    4%) 0.467
                        SpanNear       13.52      (3.7%)       13.45      (3.3%)   -0.5% (  -7% -    6%) 0.667
                         Prefix3      395.12      (9.1%)      393.74     (10.6%)   -0.3% ( -18% -   21%) 0.911
                   TermMonthSort      192.42      (9.5%)      191.95      (7.1%)   -0.2% ( -15% -   18%) 0.926
                            Term     3216.34      (3.6%)     3208.51      (3.8%)   -0.2% (  -7% -    7%) 0.833
                   TermTitleSort      278.44      (9.5%)      277.85      (7.1%)   -0.2% ( -15% -   18%) 0.936
                         Respell       89.07      (2.1%)       88.98      (2.6%)   -0.1% (  -4% -    4%) 0.885
                          Fuzzy1      127.07      (1.9%)      127.23      (2.8%)    0.1% (  -4% -    4%) 0.874
     BrowseRandomLabelSSDVFacets       20.41      (9.5%)       20.44      (8.6%)    0.2% ( -16% -   20%) 0.954
                        Wildcard      366.66      (6.1%)      367.33      (6.1%)    0.2% ( -11% -   13%) 0.925
                        PKLookup      291.94      (4.4%)      292.59      (2.9%)    0.2% (  -6% -    7%) 0.849
                          IntNRQ      351.10      (1.2%)      351.89      (1.1%)    0.2% (  -2% -    2%) 0.540
                    TermGroup10K       22.73      (3.5%)       22.81      (3.5%)    0.4% (  -6% -    7%) 0.731
                     AndHighHigh       49.25      (4.1%)       49.45      (4.6%)    0.4% (  -7% -    9%) 0.770
                          Fuzzy2      136.67      (2.0%)      137.33      (2.5%)    0.5% (  -3% -    5%) 0.497
            MedTermDayTaxoFacets       75.39      (3.4%)       75.79      (2.8%)    0.5% (  -5% -    7%) 0.591
         AndHighMedDayTaxoFacets      135.26      (2.6%)      136.01      (2.1%)    0.6% (  -3% -    5%) 0.449
        AndHighHighDayTaxoFacets       11.44      (2.4%)       11.50      (1.9%)    0.6% (  -3% -    4%) 0.386
                IntervalsOrdered       13.19      (2.6%)       13.27      (2.8%)    0.6% (  -4% -    6%) 0.456
                  TermDateFacets       32.59      (3.8%)       32.81      (3.1%)    0.7% (  -5% -    7%) 0.526
                      AndHighMed      109.27      (4.6%)      110.09      (5.6%)    0.7% (  -9% -   11%) 0.648
                AndMedOrHighHigh       67.43      (6.2%)       68.02      (6.2%)    0.9% ( -10% -   14%) 0.654
                     TermGroup1M       26.00      (2.9%)       26.26      (3.4%)    1.0% (  -5% -    7%) 0.310
       BrowseDayOfYearSSDVFacets       27.08     (10.0%)       27.36     (14.4%)    1.0% ( -21% -   28%) 0.792
                    TermBGroup1M       37.57      (3.2%)       38.00      (4.0%)    1.1% (  -5% -    8%) 0.318
                      TermDTSort      141.09      (2.6%)      143.58      (6.5%)    1.8% (  -7% -   11%) 0.259
           BrowseMonthTaxoFacets       28.19     (37.9%)       29.70     (40.8%)    5.3% ( -53% -  135%) 0.669
       BrowseDayOfYearTaxoFacets       29.32     (37.6%)       31.00     (43.4%)    5.7% ( -54% -  138%) 0.656
            BrowseDateTaxoFacets       29.18     (37.9%)       30.94     (43.8%)    6.0% ( -54% -  141%) 0.641
     BrowseRandomLabelTaxoFacets       28.43     (47.1%)       30.67     (55.1%)    7.9% ( -64% -  207%) 0.627
                      OrHighHigh       19.75      (5.8%)       28.41      (6.0%)   43.9% (  30% -   59%) 0.000
                       OrHighMed       78.52      (6.7%)      181.93     (10.9%)  131.7% ( 106% -  159%) 0.000

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
           BrowseMonthSSDVFacets       28.38     (11.8%)       27.82     (10.5%)   -2.0% ( -21% -   22%) 0.573
                        PKLookup      296.46      (2.0%)      290.88      (2.9%)   -1.9% (  -6% -    3%) 0.016
               TermDayOfYearSort      214.70      (7.4%)      210.70      (4.1%)   -1.9% ( -12% -   10%) 0.323
                    TermBGroup1M       29.42      (4.0%)       28.91      (5.3%)   -1.7% ( -10% -    7%) 0.236
                     TermGroup1M       23.08      (3.4%)       22.70      (4.3%)   -1.7% (  -9% -    6%) 0.170
                      TermDTSort      345.43      (6.4%)      339.68      (4.9%)   -1.7% ( -12% -   10%) 0.354
                    TermGroup10K       30.91      (3.3%)       30.44      (4.5%)   -1.5% (  -9% -    6%) 0.220
                  TermDateFacets       47.40      (4.8%)       46.81      (3.8%)   -1.3% (  -9% -    7%) 0.362
                  TermBGroup1M1P       81.44      (6.7%)       80.48      (7.5%)   -1.2% ( -14% -   13%) 0.601
     BrowseRandomLabelSSDVFacets       20.26      (7.9%)       20.02      (7.8%)   -1.2% ( -15% -   15%) 0.637
            MedTermDayTaxoFacets       75.68      (4.2%)       74.84      (3.4%)   -1.1% (  -8% -    6%) 0.357
     BrowseRandomLabelTaxoFacets       38.85     (44.7%)       38.42     (46.1%)   -1.1% ( -63% -  162%) 0.940
                    TermGroup100       41.49      (3.7%)       41.05      (4.5%)   -1.0% (  -8% -    7%) 0.419
            BrowseDateTaxoFacets       37.84     (36.3%)       37.51     (38.4%)   -0.9% ( -55% -  115%) 0.941
       BrowseDayOfYearTaxoFacets       37.88     (36.2%)       37.60     (38.1%)   -0.7% ( -55% -  115%) 0.950
        AndHighHighDayTaxoFacets        7.05      (3.3%)        7.00      (3.8%)   -0.7% (  -7% -    6%) 0.533
                    SloppyPhrase       93.42      (7.8%)       93.27      (6.9%)   -0.2% ( -13% -   15%) 0.942
            BrowseDateSSDVFacets        3.81     (28.9%)        3.80     (28.5%)   -0.1% ( -44% -   80%) 0.993
                          Phrase       44.60      (2.9%)       44.68      (2.8%)    0.2% (  -5% -    6%) 0.840
                        SpanNear       27.76      (3.1%)       27.81      (2.8%)    0.2% (  -5% -    6%) 0.830
                   TermTitleSort      224.37      (7.2%)      225.08      (7.8%)    0.3% ( -13% -   16%) 0.895
                   TermMonthSort      277.86      (7.2%)      279.21      (7.9%)    0.5% ( -13% -   16%) 0.838
                          IntNRQ     1286.28      (3.0%)     1292.89      (2.0%)    0.5% (  -4% -    5%) 0.525
                            Term     2602.76      (3.0%)     2616.13      (3.7%)    0.5% (  -6% -    7%) 0.630
         AndHighMedDayTaxoFacets       78.64      (3.2%)       79.12      (3.0%)    0.6% (  -5% -    7%) 0.540
                        Wildcard      375.54      (5.9%)      378.24      (3.9%)    0.7% (  -8% -   11%) 0.649
          OrHighMedDayTaxoFacets       25.37      (7.9%)       25.56      (5.4%)    0.7% ( -11% -   15%) 0.728
                 AndHighOrMedMed      107.73      (5.2%)      108.60      (3.9%)    0.8% (  -7% -   10%) 0.572
                         Respell      108.71      (1.1%)      109.74      (2.2%)    0.9% (  -2% -    4%) 0.087
       BrowseDayOfYearSSDVFacets       27.55     (10.9%)       27.82     (13.0%)    1.0% ( -20% -   27%) 0.797
                      AndHighMed      110.51      (4.3%)      111.60      (3.7%)    1.0% (  -6% -    9%) 0.441
                          Fuzzy1      133.81      (1.2%)      135.34      (1.9%)    1.1% (  -1% -    4%) 0.025
                     AndHighHigh      119.20      (3.7%)      120.59      (3.4%)    1.2% (  -5% -    8%) 0.302
                          Fuzzy2       78.92      (1.4%)       80.08      (2.0%)    1.5% (  -1% -    4%) 0.008
                IntervalsOrdered       22.54      (4.5%)       22.90      (3.8%)    1.6% (  -6% -   10%) 0.226
           BrowseMonthTaxoFacets       33.99     (38.5%)       35.09     (37.9%)    3.2% ( -52% -  129%) 0.788
                         Prefix3      410.98      (8.6%)      425.40      (5.9%)    3.5% ( -10% -   19%) 0.131
                AndMedOrHighHigh       67.29      (3.7%)       69.77      (4.3%)    3.7% (  -4% -   12%) 0.003
                      OrHighHigh       19.57      (5.3%)       28.41      (5.6%)   45.2% (  32% -   59%) 0.000
                       OrHighMed       95.08      (4.9%)      271.09     (10.1%)  185.1% ( 162% -  210%) 0.000

Also, my localconstants file and Java version, for reference:

BASE_DIR = '/Users/xichen/IdeaProjects/benchmarks'
BENCH_BASE_DIR = '/Users/xichen/IdeaProjects/benchmarks/util'
WIKI_BIG_DOCS_LINE_FILE = '%s/data/enwiki-20130102-lines.txt' % BASE_DIR
WIKI_BIG_DOCS_COUNT = 6647577
INDEX_NUM_THREADS = 10
# SEARCH_NUM_THREADS = 6
topN=100

xichen@MacBook-Pro util % java --version
java 17.0.2 2022-01-18 LTS
Java(TM) SE Runtime Environment (build 17.0.2+8-LTS-86)
Java HotSpot(TM) 64-Bit Server VM (build 17.0.2+8-LTS-86, mixed mode, sharing)

Maybe the nightly benchmark is using another suite of tests or the JVM setting matters? I'll see if I can run the original nightly benchmark code / tests from my machine to see if there's any difference.

asfimport commented 2 years ago

Zach Chen (@zacharymorn) (migrated from JIRA)

I'll see if I can run the original nightly benchmark code / tests from my machine to see if there's any difference.

I tried to run nightlyBench.py locally on my machine over the weekend, but that turns out to require some changes to the script itself, and I haven't been able to run it fully so far.

On the other hand, I tried a few more run configurations with localrun.py, including running it in a virtual Ubuntu box (as the nightly benchmark runs on a Linux box), but have had no luck so far reproducing the AndHighOrMedMed slowdown.

@jpountz, just curious, are you able to reproduce the slowdown locally on your end as well?

asfimport commented 2 years ago

Adrien Grand (@jpountz) (migrated from JIRA)

I haven't tried to reproduce it but the steps you took by running on wikibigall with the nightly tasks file sound good to me. Another thing that changes performance sometimes is the doc ID order, were you using multiple indexing threads maybe?

Ignoring the fact that we cannot reproduce the slowdown, if I try to think of the main differences between WANDScorer and BlockMaxMaxscoreScorer for AndHighOrMedMed, I think the main one is the way that advanceShallow is computed. Conjunctions use block boundaries of the clause that has the lowest cost, so this could explain why we are seeing a slowdown with AndHighOrMedMed (since the conjunction uses block boundaries of OrMedMed) and not AndMedOrHighHigh (since the conjunction uses block boundaries of Med). Maybe we could explore other approaches for advanceShallow such as taking the minimum block boundary across essential clauses only instead of all clauses.
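To make the two options concrete, here is a small Python sketch of how the boundary could be chosen. It is only a model of the idea in this comment: the function name, the dict fields `block_end` and `essential`, and the fallback when no clause is essential are all assumptions for illustration, not how either scorer is implemented.

```python
def block_boundary(clauses, essential_only=False):
    """Return the doc ID up to which the current score bound is computed.

    Each clause is a dict with `block_end` (last doc ID of its current
    block) and `essential` (whether it still leads iteration). Both field
    names are hypothetical.
    """
    candidates = clauses
    if essential_only:
        essential = [c for c in clauses if c["essential"]]
        # Fall back to all clauses if none is essential (an assumption).
        candidates = essential or clauses
    return min(c["block_end"] for c in candidates)
```

With `clauses = [{"block_end": 127, "essential": False}, {"block_end": 511, "essential": True}]`, taking the minimum across all clauses gives 127, while restricting to essential clauses gives 511. The larger window means fewer boundary recomputations, at the cost of having to bound the non-essential clause's score over a range that may span several of its blocks.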

asfimport commented 2 years ago

Zach Chen (@zacharymorn) (migrated from JIRA)

Another thing that changes performance sometimes is the doc ID order, were you using multiple indexing threads maybe?

Ok, this is actually the case for me. I was previously using 10 threads to index (INDEX_NUM_THREADS = 10), and after I commented that out and reindexed with the default setting, I was able to reproduce the slowdown:

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
                 AndHighOrMedMed       91.27      (4.3%)       85.52      (4.3%)   -6.3% ( -14% -    2%) 0.000
                        PKLookup      333.25      (4.3%)      329.48      (3.8%)   -1.1% (  -8% -    7%) 0.380
                     AndHighHigh      104.25      (2.9%)      103.11      (3.0%)   -1.1% (  -6% -    5%) 0.247
                        SpanNear       16.52      (3.8%)       16.36      (3.1%)   -0.9% (  -7% -    6%) 0.396
                    TermGroup10K       23.99      (3.3%)       23.78      (3.0%)   -0.9% (  -6% -    5%) 0.384
                          Phrase      234.74      (2.7%)      232.71      (1.8%)   -0.9% (  -5% -    3%) 0.235
                      AndHighMed      163.80      (3.5%)      162.42      (4.3%)   -0.8% (  -8% -    7%) 0.496
                    TermBGroup1M       48.02      (3.5%)       47.65      (3.7%)   -0.8% (  -7% -    6%) 0.496
                    SloppyPhrase        4.82      (3.4%)        4.78      (2.7%)   -0.7% (  -6% -    5%) 0.460
                    TermGroup100       41.90      (3.9%)       41.63      (3.3%)   -0.7% (  -7% -    6%) 0.569
                            Term     2680.42      (4.7%)     2664.05      (3.3%)   -0.6% (  -8% -    7%) 0.632
                     TermGroup1M       39.95      (2.9%)       39.71      (3.2%)   -0.6% (  -6% -    5%) 0.531
                  TermBGroup1M1P       84.21      (6.1%)       83.82      (5.7%)   -0.5% ( -11% -   12%) 0.801
                         Respell      113.78      (1.9%)      113.44      (1.7%)   -0.3% (  -3% -    3%) 0.603
     BrowseRandomLabelSSDVFacets       20.75      (8.2%)       20.74     (10.3%)   -0.0% ( -17% -   20%) 0.989
                          Fuzzy2       83.12      (1.8%)       83.11      (1.1%)   -0.0% (  -2% -    2%) 0.976
       BrowseDayOfYearSSDVFacets       26.69     (12.0%)       26.70     (11.6%)    0.0% ( -21% -   26%) 0.995
                        Wildcard      115.84      (5.1%)      115.96      (5.8%)    0.1% ( -10% -   11%) 0.951
               TermDayOfYearSort      260.70      (5.4%)      260.99      (2.8%)    0.1% (  -7% -    8%) 0.937
         AndHighMedDayTaxoFacets      136.32      (2.6%)      136.63      (2.3%)    0.2% (  -4% -    5%) 0.773
                IntervalsOrdered      128.13      (7.5%)      128.45      (7.7%)    0.3% ( -13% -   16%) 0.916
        AndHighHighDayTaxoFacets       13.82      (2.8%)       13.87      (2.6%)    0.4% (  -4% -    5%) 0.657
                          Fuzzy1       79.16      (2.7%)       79.60      (1.8%)    0.6% (  -3% -    5%) 0.433
                   TermMonthSort      360.17      (6.4%)      362.83      (7.1%)    0.7% ( -11% -   15%) 0.728
                   TermTitleSort      191.21      (6.8%)      192.70      (7.1%)    0.8% ( -12% -   15%) 0.723
                      TermDTSort      208.40      (2.9%)      210.39      (2.9%)    1.0% (  -4% -    7%) 0.301
            MedTermDayTaxoFacets       78.66      (5.2%)       79.59      (4.4%)    1.2% (  -7% -   11%) 0.436
                  TermDateFacets       41.04      (5.4%)       41.61      (4.7%)    1.4% (  -8% -   12%) 0.385
                          IntNRQ      122.00      (8.1%)      124.08      (8.3%)    1.7% ( -13% -   19%) 0.513
          OrHighMedDayTaxoFacets       23.16      (8.4%)       23.71      (4.9%)    2.4% ( -10% -   17%) 0.272
           BrowseMonthSSDVFacets       28.68     (13.8%)       29.55     (16.8%)    3.0% ( -24% -   39%) 0.531
       BrowseDayOfYearTaxoFacets       30.40     (32.2%)       31.67     (34.2%)    4.2% ( -47% -  103%) 0.690
            BrowseDateTaxoFacets       30.26     (32.2%)       31.57     (34.4%)    4.3% ( -47% -  104%) 0.680
                         Prefix3      402.14      (8.6%)      419.96      (8.9%)    4.4% ( -12% -   23%) 0.109
                AndMedOrHighHigh       94.79      (4.0%)       99.03      (4.5%)    4.5% (  -3% -   13%) 0.001
     BrowseRandomLabelTaxoFacets       32.45     (49.2%)       35.05     (53.4%)    8.0% ( -63% -  217%) 0.622
           BrowseMonthTaxoFacets       28.68     (35.3%)       31.37     (39.1%)    9.4% ( -48% -  129%) 0.425
            BrowseDateSSDVFacets        3.96     (28.1%)        4.54     (26.3%)   14.7% ( -31% -   96%) 0.089
                      OrHighHigh      116.10      (3.5%)      156.34      (7.4%)   34.7% (  22% -   47%) 0.000
                       OrHighMed      120.07      (3.8%)      238.81      (5.3%)   98.9% (  86% -  112%) 0.000

                            Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
           BrowseMonthTaxoFacets       28.92     (36.4%)       27.02     (32.8%)   -6.6% ( -55% -   98%) 0.548
          OrHighMedDayTaxoFacets        4.46      (4.4%)        4.30      (7.7%)   -3.6% ( -14% -    8%) 0.072
                 AndHighOrMedMed      113.85      (5.3%)      110.94      (4.6%)   -2.5% ( -11% -    7%) 0.102
                      AndHighMed      126.02      (3.4%)      123.47      (3.7%)   -2.0% (  -8% -    5%) 0.072
                  TermBGroup1M1P       62.98      (6.2%)       61.72      (5.8%)   -2.0% ( -13% -   10%) 0.293
     BrowseRandomLabelSSDVFacets       20.94      (5.6%)       20.60      (6.7%)   -1.6% ( -13% -   11%) 0.402
                    TermGroup100       41.54      (3.8%)       41.00      (3.1%)   -1.3% (  -7% -    5%) 0.237
            MedTermDayTaxoFacets       78.99      (3.3%)       78.06      (4.4%)   -1.2% (  -8% -    6%) 0.342
        AndHighHighDayTaxoFacets        7.08      (3.4%)        7.00      (3.3%)   -1.1% (  -7% -    5%) 0.295
                  TermDateFacets       57.17      (3.6%)       56.57      (4.6%)   -1.0% (  -8% -    7%) 0.426
                      TermDTSort      340.16      (4.3%)      336.88      (2.7%)   -1.0% (  -7% -    6%) 0.396
                          Phrase      116.48      (4.5%)      115.36      (4.4%)   -1.0% (  -9% -    8%) 0.497
           BrowseMonthSSDVFacets       29.80     (10.9%)       29.51     (11.8%)   -0.9% ( -21% -   24%) 0.792
                    TermBGroup1M       30.20      (3.9%)       29.94      (4.2%)   -0.9% (  -8% -    7%) 0.490
                     AndHighHigh      132.26      (3.2%)      131.10      (3.3%)   -0.9% (  -7% -    5%) 0.394
                     TermGroup1M       39.70      (2.9%)       39.38      (3.9%)   -0.8% (  -7% -    6%) 0.445
                        SpanNear      168.65      (3.2%)      167.49      (2.3%)   -0.7% (  -6% -    5%) 0.438
                    TermGroup10K       43.11      (3.5%)       43.01      (4.3%)   -0.2% (  -7% -    7%) 0.853
                            Term     3172.83      (2.7%)     3168.67      (3.1%)   -0.1% (  -5% -    5%) 0.887
                   TermTitleSort      218.63      (3.1%)      218.36      (2.7%)   -0.1% (  -5% -    5%) 0.892
                   TermMonthSort      353.25      (3.0%)      353.58      (2.6%)    0.1% (  -5% -    5%) 0.917
                          IntNRQ     1208.96      (2.0%)     1210.20      (2.5%)    0.1% (  -4% -    4%) 0.887
            BrowseDateTaxoFacets       27.09     (26.8%)       27.15     (29.3%)    0.2% ( -44% -   76%) 0.981
         AndHighMedDayTaxoFacets       95.98      (3.0%)       96.25      (2.9%)    0.3% (  -5% -    6%) 0.771
       BrowseDayOfYearTaxoFacets       27.16     (26.8%)       27.26     (29.4%)    0.3% ( -44% -   77%) 0.969
       BrowseDayOfYearSSDVFacets       26.55      (5.3%)       26.70      (9.2%)    0.6% ( -13% -   15%) 0.811
                        PKLookup      326.57      (5.1%)      328.96      (4.4%)    0.7% (  -8% -   10%) 0.627
                IntervalsOrdered       10.66      (3.3%)       10.75      (3.9%)    0.9% (  -6% -    8%) 0.457
                          Fuzzy2      145.01      (2.0%)      146.28      (2.6%)    0.9% (  -3% -    5%) 0.225
                         Respell      112.65      (2.1%)      113.64      (3.1%)    0.9% (  -4% -    6%) 0.299
                          Fuzzy1      134.04      (1.8%)      135.48      (3.0%)    1.1% (  -3% -    5%) 0.171
                    SloppyPhrase       13.24      (3.9%)       13.43      (4.0%)    1.4% (  -6% -    9%) 0.263
                        Wildcard      235.26      (5.1%)      239.03      (4.7%)    1.6% (  -7% -   11%) 0.299
               TermDayOfYearSort      142.00      (3.3%)      145.14      (6.9%)    2.2% (  -7% -   12%) 0.198
                         Prefix3       86.50      (7.2%)       88.51      (6.4%)    2.3% ( -10% -   17%) 0.281
                AndMedOrHighHigh       96.75      (3.9%)       99.91      (4.2%)    3.3% (  -4% -   11%) 0.011
     BrowseRandomLabelTaxoFacets       27.01     (42.1%)       27.91     (49.0%)    3.3% ( -61% -  163%) 0.819
                      OrHighHigh       21.33      (6.6%)       23.52      (3.1%)   10.3% (   0% -   21%) 0.000
            BrowseDateSSDVFacets        3.74     (27.2%)        4.43     (29.9%)   18.4% ( -30% -  103%) 0.042
                       OrHighMed      105.91      (4.8%)      178.83      (9.8%)   68.9% (  51% -   87%) 0.000

                            Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
            BrowseDateSSDVFacets        4.14     (28.1%)        3.60     (28.4%)  -13.1% ( -54% -   60%) 0.143
     BrowseRandomLabelSSDVFacets       21.06      (9.7%)       20.33     (11.1%)   -3.4% ( -22% -   19%) 0.295
                  TermBGroup1M1P       55.35      (6.7%)       53.62      (6.2%)   -3.1% ( -15% -   10%) 0.124
                      TermDTSort      212.36      (6.4%)      207.44      (3.0%)   -2.3% ( -11% -    7%) 0.145
                 AndHighOrMedMed      124.19      (5.3%)      121.49      (4.5%)   -2.2% ( -11% -    8%) 0.161
                  TermDateFacets       73.81      (4.0%)       72.37      (4.0%)   -2.0% (  -9% -    6%) 0.124
            MedTermDayTaxoFacets       82.87      (4.0%)       81.30      (4.1%)   -1.9% (  -9% -    6%) 0.140
                   TermMonthSort      225.69      (9.2%)      222.07      (7.1%)   -1.6% ( -16% -   16%) 0.538
                   TermTitleSort      225.68      (9.0%)      222.16      (7.3%)   -1.6% ( -16% -   16%) 0.548
                    TermGroup100       41.75      (3.0%)       41.16      (2.8%)   -1.4% (  -7% -    4%) 0.130
                          IntNRQ       89.84      (7.1%)       88.65      (9.7%)   -1.3% ( -16% -   16%) 0.621
                    TermBGroup1M       39.21      (3.7%)       38.75      (3.3%)   -1.2% (  -7% -    6%) 0.289
                          Phrase      115.01      (5.9%)      113.72      (6.2%)   -1.1% ( -12% -   11%) 0.558
     BrowseRandomLabelTaxoFacets       31.99     (48.4%)       31.68     (46.7%)   -1.0% ( -64% -  182%) 0.950
                    TermGroup10K       23.54      (3.2%)       23.33      (2.7%)   -0.9% (  -6% -    5%) 0.347
                            Term     2742.88      (3.5%)     2723.92      (3.3%)   -0.7% (  -7% -    6%) 0.521
                    SloppyPhrase       13.33      (1.9%)       13.25      (2.6%)   -0.6% (  -5% -    4%) 0.415
        AndHighHighDayTaxoFacets       38.27      (2.4%)       38.05      (1.6%)   -0.6% (  -4% -    3%) 0.373
       BrowseDayOfYearTaxoFacets       30.28     (33.7%)       30.12     (33.7%)   -0.5% ( -50% -  100%) 0.961
            BrowseDateTaxoFacets       30.19     (33.7%)       30.06     (33.8%)   -0.4% ( -50% -  101%) 0.968
                     TermGroup1M       40.47      (3.8%)       40.34      (3.4%)   -0.3% (  -7% -    7%) 0.774
         AndHighMedDayTaxoFacets       49.03      (2.5%)       48.88      (2.3%)   -0.3% (  -5% -    4%) 0.699
                      AndHighMed      166.12      (5.2%)      165.86      (5.6%)   -0.2% ( -10% -   11%) 0.928
           BrowseMonthSSDVFacets       28.25     (10.1%)       28.21     (12.9%)   -0.1% ( -21% -   25%) 0.968
                         Prefix3      465.74      (6.7%)      466.51      (5.2%)    0.2% ( -11% -   12%) 0.930
                IntervalsOrdered       23.37      (4.5%)       23.43      (4.3%)    0.3% (  -8% -    9%) 0.853
                     AndHighHigh      130.93      (3.8%)      131.44      (4.2%)    0.4% (  -7% -    8%) 0.755
                        Wildcard      165.26      (6.3%)      165.93      (4.9%)    0.4% ( -10% -   12%) 0.819
                        SpanNear       28.93      (3.6%)       29.22      (3.1%)    1.0% (  -5% -    7%) 0.336
                          Fuzzy1      162.85      (2.8%)      165.51      (4.2%)    1.6% (  -5% -    8%) 0.147
          OrHighMedDayTaxoFacets       15.23      (8.5%)       15.49      (9.1%)    1.7% ( -14% -   21%) 0.538
                          Fuzzy2      144.23      (3.2%)      146.75      (3.9%)    1.7% (  -5% -    9%) 0.119
       BrowseDayOfYearSSDVFacets       26.63      (9.7%)       27.13     (13.8%)    1.9% ( -19% -   28%) 0.616
                        PKLookup      324.80      (3.5%)      331.01      (3.9%)    1.9% (  -5% -    9%) 0.103
               TermDayOfYearSort      143.15      (5.8%)      145.89      (7.1%)    1.9% ( -10% -   15%) 0.351
           BrowseMonthTaxoFacets       30.39     (35.7%)       30.99     (36.5%)    2.0% ( -51% -  115%) 0.863
                         Respell      111.15      (3.7%)      114.29      (5.1%)    2.8% (  -5% -   12%) 0.045
                AndMedOrHighHigh       95.45      (4.3%)      100.22      (5.2%)    5.0% (  -4% -   15%) 0.001
                      OrHighHigh       25.86      (6.1%)       38.74      (5.6%)   49.8% (  35% -   65%) 0.000
                       OrHighMed      124.45      (6.6%)      240.13      (6.5%)   93.0% (  74% -  113%) 0.000

                            Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
            BrowseDateSSDVFacets        4.34     (34.8%)        3.94     (32.7%)   -9.4% ( -57% -   89%) 0.378
           BrowseMonthTaxoFacets       33.54     (30.9%)       31.18     (31.2%)   -7.0% ( -52% -   79%) 0.475
                 AndHighOrMedMed       92.17      (4.5%)       86.48      (3.8%)   -6.2% ( -13% -    2%) 0.000
                          IntNRQ      124.38      (7.3%)      122.08      (8.8%)   -1.9% ( -16% -   15%) 0.471
                      TermDTSort      264.88      (3.9%)      260.90      (2.7%)   -1.5% (  -7% -    5%) 0.153
                   TermTitleSort      276.74      (4.0%)      272.63      (2.5%)   -1.5% (  -7% -    5%) 0.159
           BrowseMonthSSDVFacets       29.01     (13.6%)       28.60     (12.0%)   -1.4% ( -23% -   27%) 0.725
                   TermMonthSort      222.92      (3.9%)      220.25      (2.6%)   -1.2% (  -7% -    5%) 0.252
            MedTermDayTaxoFacets       79.12      (3.4%)       78.33      (4.1%)   -1.0% (  -8% -    6%) 0.401
                     AndHighHigh      103.30      (2.7%)      102.29      (2.8%)   -1.0% (  -6% -    4%) 0.258
                          Fuzzy2      124.60      (2.9%)      123.47      (2.2%)   -0.9% (  -5% -    4%) 0.260
                  TermDateFacets       34.41      (4.0%)       34.11      (5.0%)   -0.9% (  -9% -    8%) 0.538
                          Fuzzy1      135.75      (2.3%)      134.66      (2.0%)   -0.8% (  -5% -    3%) 0.240
                    SloppyPhrase        3.11      (5.0%)        3.08      (4.3%)   -0.8% (  -9% -    8%) 0.594
                    TermGroup100       36.45      (3.4%)       36.19      (4.1%)   -0.7% (  -7% -    7%) 0.547
     BrowseRandomLabelTaxoFacets       33.28     (46.8%)       33.06     (46.6%)   -0.7% ( -64% -  174%) 0.964
                          Phrase      113.36      (4.2%)      112.65      (3.9%)   -0.6% (  -8% -    7%) 0.623
       BrowseDayOfYearTaxoFacets       31.53     (32.2%)       31.36     (32.3%)   -0.5% ( -49% -   94%) 0.958
          OrHighMedDayTaxoFacets       13.78      (4.6%)       13.72      (3.8%)   -0.5% (  -8% -    8%) 0.705
                         Respell       97.91      (2.4%)       97.42      (2.2%)   -0.5% (  -5% -    4%) 0.496
                         Prefix3      458.74      (7.4%)      456.64      (8.0%)   -0.5% ( -14% -   16%) 0.851
            BrowseDateTaxoFacets       31.40     (32.4%)       31.26     (32.3%)   -0.5% ( -49% -   94%) 0.964
                      AndHighMed      123.44      (3.6%)      122.93      (3.0%)   -0.4% (  -6% -    6%) 0.695
     BrowseRandomLabelSSDVFacets       20.80      (8.7%)       20.73      (9.0%)   -0.3% ( -16% -   19%) 0.914
               TermDayOfYearSort      147.07      (5.5%)      146.66      (7.0%)   -0.3% ( -12% -   12%) 0.889
                IntervalsOrdered       10.67      (4.2%)       10.64      (3.4%)   -0.3% (  -7% -    7%) 0.820
         AndHighMedDayTaxoFacets      217.34      (1.9%)      216.78      (1.8%)   -0.3% (  -3% -    3%) 0.661
        AndHighHighDayTaxoFacets       13.89      (2.4%)       13.87      (2.7%)   -0.1% (  -5% -    5%) 0.861
                     TermGroup1M       23.81      (2.5%)       23.79      (4.0%)   -0.1% (  -6% -    6%) 0.920
                            Term     2926.73      (3.7%)     2924.25      (3.8%)   -0.1% (  -7% -    7%) 0.942
                    TermBGroup1M       53.67      (2.3%)       53.63      (3.9%)   -0.1% (  -6% -    6%) 0.945
                    TermGroup10K       29.55      (2.4%)       29.54      (4.1%)   -0.0% (  -6% -    6%) 0.977
                  TermBGroup1M1P       45.34      (6.3%)       45.33      (8.2%)   -0.0% ( -13% -   15%) 0.992
                        Wildcard      114.51      (4.9%)      115.73      (5.5%)    1.1% (  -8% -   12%) 0.519
                        SpanNear       29.14      (3.0%)       29.46      (2.4%)    1.1% (  -4% -    6%) 0.184
                        PKLookup      333.07      (4.7%)      336.88      (2.7%)    1.1% (  -5% -    8%) 0.342
       BrowseDayOfYearSSDVFacets       27.13     (13.1%)       27.48     (11.9%)    1.3% ( -20% -   30%) 0.746
                AndMedOrHighHigh       89.60      (3.7%)       95.84      (3.4%)    7.0% (   0% -   14%) 0.000
                      OrHighHigh       21.21      (5.1%)       23.62      (5.0%)   11.3% (   1% -   22%) 0.000
                       OrHighMed      122.68      (4.1%)      242.95      (7.1%)   98.0% (  83% -  113%) 0.000
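As a reading aid for the tables above, the "Pct diff" column is consistent with the relative QPS change between the two runs. The small helper below is only a sketch of that arithmetic; `PctDiffSketch` and its method names are made up for illustration and are not part of the benchmark tooling.

```java
import java.util.Locale;

/** Hypothetical helper that reproduces the "Pct diff" arithmetic from the tables. */
final class PctDiffSketch {

  /** Relative change of the candidate QPS over the baseline QPS, in percent. */
  static double pctDiff(double baselineQps, double candidateQps) {
    return 100.0 * (candidateQps - baselineQps) / baselineQps;
  }

  /** Formats a percentage with one decimal, as the tables do. */
  static String format(double pct) {
    return String.format(Locale.ROOT, "%.1f%%", pct);
  }

  public static void main(String[] args) {
    // OrHighMed row of the first run: 120.07 -> 238.81 QPS
    System.out.println(format(pctDiff(120.07, 238.81))); // prints 98.9%
    // OrHighHigh row of the first run: 116.10 -> 156.34 QPS
    System.out.println(format(pctDiff(116.10, 156.34))); // prints 34.7%
  }
}
```
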

If I try to think of the main differences between WANDScorer and BlockMaxMaxscoreScorer for AndHighOrMedMed, I think the main one is the way that advanceShallow is computed. Conjunctions use block boundaries of the clause that has the lowest cost, so this could explain why we are seeing a slowdown with AndHighOrMedMed (since the conjunction uses block boundaries of OrMedMed) and not AndMedOrHighHigh (since the conjunction uses block boundaries of Med). Maybe we could explore other approaches for advanceShallow, such as taking the minimum block boundary across essential clauses only instead of all clauses.

Ah, this is interesting to know! I guess I can open another ticket to explore this improvement further? Do you think this slowdown of AndHighOrMedMed should be considered a blocker for the 9.3 release?
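For illustration, the advanceShallow idea discussed above (minimum block boundary across essential clauses only, instead of all clauses) could be sketched roughly as follows. This is a toy model under stated assumptions: `ClauseSketch`, its fixed block-end arrays, and the `essential` flag are hypothetical stand-ins and not Lucene's Scorer API, and Integer.MAX_VALUE is used as an assumed "no more blocks" sentinel.

```java
import java.util.List;

/** Hypothetical stand-in for one disjunction clause; not Lucene's Scorer API. */
final class ClauseSketch {
  final boolean essential; // assumed flag: can this clause alone produce a competitive hit?
  final int[] blockEnds;   // inclusive last doc ID of each block, in ascending order

  ClauseSketch(boolean essential, int... blockEnds) {
    this.essential = essential;
    this.blockEnds = blockEnds;
  }

  /** Returns the inclusive end of the block containing target, or Integer.MAX_VALUE if none. */
  int advanceShallow(int target) {
    for (int end : blockEnds) {
      if (end >= target) {
        return end;
      }
    }
    return Integer.MAX_VALUE;
  }
}

final class ShallowAdvanceSketch {

  /** Boundary taken as the minimum over every clause. */
  static int minBoundaryAllClauses(List<ClauseSketch> clauses, int target) {
    int upTo = Integer.MAX_VALUE;
    for (ClauseSketch clause : clauses) {
      upTo = Math.min(upTo, clause.advanceShallow(target));
    }
    return upTo;
  }

  /** Alternative: only essential clauses contribute to the boundary. */
  static int minBoundaryEssentialOnly(List<ClauseSketch> clauses, int target) {
    int upTo = Integer.MAX_VALUE;
    for (ClauseSketch clause : clauses) {
      if (clause.essential) {
        upTo = Math.min(upTo, clause.advanceShallow(target));
      }
    }
    return upTo;
  }
}
```

With one essential clause whose blocks end at 255 and 511 and one non-essential clause whose blocks end at 63, 127 and 191, a target of 0 gives a boundary of 63 when all clauses are considered and 255 when only essential clauses are. A real implementation would presumably still need score upper bounds for the non-essential clauses over that longer window.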

asfimport commented 2 years ago

Adrien Grand (@jpountz) (migrated from JIRA)

+1 to explore this in a separate issue.

Do you think this slowdown of AndHighOrMedMed should be considered a blocker for the 9.3 release?

I wouldn't say blocker, but maybe we could give ourselves more time by only using this new scorer on top-level disjunctions for now, so that we can figure out whether we should stick to BMW or switch to BMM for inner disjunctions.

asfimport commented 2 years ago

Zach Chen (@zacharymorn) (migrated from JIRA)

I wouldn't say blocker, but maybe we could give ourselves more time by only using this new scorer on top-level disjunctions for now, so that we can figure out whether we should stick to BMW or switch to BMM for inner disjunctions.

Sounds good. I tried a few quick approaches to limit BMM scorer to top-level disjunctions in BooleanWeight or Boolean2ScorerSupplier, but they didn't work due to weight's / query's recursive logic. So I ended up wrapping the scorer inside a bulk scorer (https://github.com/apache/lucene/pull/1018, pending tests update) like your other PR. Please let me know if this approach looks good to you, or if there's a better approach.
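To make the bulk-scorer approach above more concrete, here is a rough sketch of why it restricts the new scorer to top-level disjunctions. It assumes that the bulk-scoring entry point is the one called on the top-level query, while nested clauses are consumed through the per-document scorer. All types and names below are hypothetical toys and do not mirror the actual code in the PR.

```java
/** Toy model: the bulk entry point picks BMM, the per-document entry point keeps WAND. */
final class TopLevelOnlySketch {

  /** Hypothetical per-document scorer, used when a query is nested inside another one. */
  interface DocScorer {
    String name();
  }

  /** Hypothetical bulk scorer, assumed to be requested only for the top-level query. */
  interface BulkDocScorer {
    String describe();
  }

  /** Two-clause disjunction: BMM is reachable only through bulkScorer(). */
  static final class TwoClauseDisjunctionWeight {
    DocScorer scorer() {
      return () -> "WAND";
    }

    BulkDocScorer bulkScorer() {
      return () -> "bulk(BMM)";
    }
  }

  /** Conjunction that nests a disjunction and only ever asks it for a per-document scorer. */
  static final class ConjunctionWeight {
    private final TwoClauseDisjunctionWeight inner;

    ConjunctionWeight(TwoClauseDisjunctionWeight inner) {
      this.inner = inner;
    }

    DocScorer scorer() {
      DocScorer innerScorer = inner.scorer();
      return () -> "AND(" + innerScorer.name() + ")";
    }

    BulkDocScorer bulkScorer() {
      DocScorer conjunction = scorer();
      return () -> "bulk(" + conjunction.name() + ")";
    }
  }
}
```

In this toy model a top-level disjunction reports "bulk(BMM)", while the same disjunction nested under a conjunction reports "bulk(AND(WAND))", so the inner disjunction never sees the new scorer.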

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit 28ce8abb5105dba5bc08b7f800f86f3741268bc9 in lucene's branch refs/heads/main from Zach Chen https://gitbox.apache.org/repos/asf?p=lucene.git;h=28ce8abb510

LUCENE-10480: Use BulkScorer to limit BMMScorer to only top-level disjunctions (#1018)

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit 8ebb3305648aea8f551c2dd144d5a527b8982638 in lucene's branch refs/heads/branch_9x from Zach Chen https://gitbox.apache.org/repos/asf?p=lucene.git;h=8ebb3305648

LUCENE-10480: (Backporting) Use BulkScorer to limit BMMScorer to only top-level disjunctions (#1037)

asfimport commented 2 years ago

Zach Chen (@zacharymorn) (migrated from JIRA)

From the latest nightly benchmark result, the negative impact on nested boolean queries has been resolved, and the performance boost to top-level disjunction queries has been maintained. Thanks for all the guidance @jpountz !
