Enable recursive graph bisection out of the box?

jpountz commented 1 year ago

Description

It would be nice to enable recursive graph bisection out of the box, so that users don't even have to know that it exists or what it is to enjoy its search-time performance benefits.

[x] #12489
[x] #12622
[x] #12685
[ ] Figure out lightweight parameters of recursive graph bisection that would make the merging overhead reasonable
[ ] Enable recursive graph bisection automatically when no index sort is configured and only addDocument is used.

jpountz commented 1 year ago

I did a first indexing run on wikibigall with the following merge policy, which I tried to make as lightweight as possible:

    BPIndexReorderer reorderer = new BPIndexReorderer();
    reorderer.setMinDocFreq(16384);
    reorderer.setMaxIters(3);
    reorderer.setMinPartitionSize(8192);
    mp = new BPReorderingMergePolicy(mp, reorderer, 131072);

Indexing ran in 3402170 msec vs. 2610068 msec without reordering, ie. 30% slower. (This is when running with default params, ie. maxBufferedDocs=12119, SerialMergeScheduler, LogDocMergePolicy (wrapped within BPReorderingMergePolicy), etc.) Search was noticeably faster:

                           TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                        PKLookup      279.70      (1.6%)      247.15      (2.8%)  -11.6% ( -15% -   -7%) 0.000
                        HighTerm      441.90      (7.8%)      414.88      (4.4%)   -6.1% ( -17% -    6%) 0.002
                  CountOrHighMed       94.53     (16.3%)       92.77     (15.9%)   -1.9% ( -29% -   36%) 0.715
                 CountOrHighHigh       60.96     (16.4%)       60.03     (16.4%)   -1.5% ( -29% -   37%) 0.768
               HighTermMonthSort     4314.65      (2.3%)     4274.31      (2.8%)   -0.9% (  -5% -    4%) 0.248
                         Respell       64.51      (1.1%)       64.38      (1.6%)   -0.2% (  -2% -    2%) 0.654
                     CountPhrase        3.54     (11.2%)        3.59      (8.2%)    1.2% ( -16% -   23%) 0.700
                        Wildcard       71.79      (2.6%)       72.66      (2.7%)    1.2% (  -3% -    6%) 0.143
                          Fuzzy2       89.57      (0.9%)       90.80      (1.2%)    1.4% (   0% -    3%) 0.000
                         Prefix3      123.64      (3.4%)      125.34      (2.8%)    1.4% (  -4% -    7%) 0.159
                       CountTerm    14193.65      (3.6%)    14589.84      (2.9%)    2.8% (  -3% -    9%) 0.007
                          IntNRQ      289.67      (6.0%)      299.22      (5.6%)    3.3% (  -7% -   15%) 0.074
                      HighPhrase        5.95      (7.6%)        6.16      (9.3%)    3.6% ( -12% -   22%) 0.180
                          Fuzzy1      104.33      (0.9%)      108.08      (1.2%)    3.6% (   1% -    5%) 0.000
                       LowPhrase       17.70      (3.3%)       18.74      (4.9%)    5.9% (  -2% -   14%) 0.000
                         MedTerm      533.08      (7.8%)      568.40      (4.5%)    6.6% (  -5% -   20%) 0.001
                      OrHighHigh       56.43      (5.8%)       60.45      (7.0%)    7.1% (  -5% -   21%) 0.000
                 CountAndHighMed      124.71      (3.2%)      136.61      (4.5%)    9.5% (   1% -   17%) 0.000
                       OrHighMed      212.88      (4.0%)      233.35      (5.0%)    9.6% (   0% -   19%) 0.000
                       OrHighLow      604.12      (2.8%)      676.18      (4.5%)   11.9% (   4% -   19%) 0.000
                      AndHighLow      933.07      (2.3%)     1046.85      (2.7%)   12.2% (   7% -   17%) 0.000
                         LowTerm      947.45      (6.1%)     1091.11      (4.7%)   15.2% (   4% -   27%) 0.000
                      AndHighMed      197.62      (3.1%)      232.11      (3.1%)   17.5% (  10% -   24%) 0.000
                       MedPhrase       42.93      (2.7%)       50.47      (5.4%)   17.6% (   9% -   26%) 0.000
                     AndHighHigh       52.74      (4.0%)       63.41      (4.6%)   20.2% (  11% -   30%) 0.000
                CountAndHighHigh       41.69      (3.5%)       50.60      (5.6%)   21.4% (  11% -   31%) 0.000
           HighTermDayOfYearSort      444.76      (1.7%)      652.11      (2.0%)   46.6% (  42% -   51%) 0.000

I'll look into whether I can reduce the merge-time overhead.

mikemccand commented 1 year ago

This would be awesome to enable by default. It would somehow disable itself if the application sets its own static index sort?

It's odd/curious that PKLookup got slower -- maybe the loss of locality/predictability of the primary key in luceneutil makes BlockTree work harder for lookups. Given the sizable gains on the other queries, we can accept some loss of indexing performance in exchange for these search-time gains, by default?

jpountz commented 1 year ago

It would somehow disable itself if the application sets its own static index sort?

This is correct. This bit already works on the PR, IndexWriter doesn't check the new method that OneMerge got that allows reordering the resulting segment if an index sort is configured in the IndexWriterConfig.

It's odd/curious that PKLookup got slower

The terms dictionary should be exactly the same, only doc IDs get renumbered. BlockTree optimizes for incremental IDs by storing the delta of a singleton doc ID with the previous doc ID (termState.singletonDocID += BitUtil.zigZagDecode(l >>> 1); in Lucene90PostingsReader). I wonder that it doesn't only help with storage, but also with lookups since it's stored as a single-byte vInt in the incremental case, but often on a variable number of bytes after reordering.

jpountz commented 1 year ago

I was just checking out a profile, and with this lightweight BP configuration, we end up spending more time on building the forward index (essentially calling OfflineSorter on all postings) and merging (because SortingCodecReader introduces overhead as it needs to re-sort e.g. postings on the fly) than on reordering.

gf2121 commented 1 year ago

essentially calling OfflineSorter on all postings

FYI, I came up with some ideas to optimize this sort before, hoping to be helpful :)

If we use a stable sorter, we can only compare docIds because termIds are already in order.
If we take the maxDoc into consideration, we can save 1 round of reorder when maxDoc < (1 << 24).
We may even purely use an offline version of radix sorter to sort the whole file, since all we need is just 3 or 4 times reorder based on point 1 and 2.

jpountz commented 1 year ago

These sound like great ideas!

gf2121 commented 1 year ago

I initialized a PR on these ideas https://github.com/apache/lucene/pull/12712

jpountz commented 1 month ago

As I expect out-of-the-box recursive graph bisection to be a bit controversial, I opened a PR to make it easier to use without making it the default: #13765.

apache / lucene

Enable recursive graph bisection out of the box? #12665

Description