apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

Introduce a BlockReader based on ForUtil and use it for NumericDocValues [LUCENE-10334] #11370

Open asfimport opened 2 years ago

asfimport commented 2 years ago

Previous talk is here: https://github.com/apache/lucene/pull/557

This is trying to add a new BlockReader based on ForUtil to replace the DirectReader we are using for NumericDocValues.

Benchmark is based on wiki10m. (The previous benchmark results were wrong, so I deleted them to avoid misleading anyone; see the benchmarks in the comments.)


Migrated from LUCENE-10334 by Feng Guo (@gf2121), updated Dec 30 2021 Pull requests: https://github.com/apache/lucene/pull/562

asfimport commented 2 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Wow! very promising results :)

asfimport commented 2 years ago

Greg Miller (@gsmiller) (migrated from JIRA)

Cool! I haven't looked at the PR yet but this sounds similar to #11072. In that attempt at encoding/decoding blocks of doc values in bulk we observed some performance regressions when the docs needing values were sparse relative to the index (e.g., looking up doc values for a small match set relative to a large index). I think more faceting tasks have been added to luceneutil since then that hopefully capture this type of use-case (e.g., BrowseRandomLabelTaxoFacets), and it looks like performance there isn't regressing at all, but I just want to call that out as one area to pay attention to with a change like this.

Exciting stuff! Nice to see experimentation like this in the doc values :)

asfimport commented 2 years ago

Feng Guo (@gf2121) (migrated from JIRA)

Thanks @gsmiller! Yes, I did think I would get some regression in sparse-hits tasks, and the result surprised me too. Maybe we should thank the powerful ForUtil? :)

From reading the code in #11072, I suspect there are two reasons it could show a more obvious regression than this approach:

  1. The LUCENE-10033 approach computes bpv for each small block and needs to read the pointer from a DirectMonotonicReader before seeking, while this approach uses a global bpv, so pointers can be computed as offset + blockBytes * block. This could be faster. A global bpv can lead to a larger index size, but I think that is acceptable since it's what we used to do.

  2. The LUCENE-10033 approach decodes offset/gcd/delta for each block; some of that could be auto-vectorized, but it is still a bit heavier. This approach tries to keep block decoding as simple as possible, and work like gcd decoding is only done for hit docs.

I'm not really sure these are the major reasons, but they should make the benchmark results a bit more explainable.
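The pointer arithmetic in point 1 above can be sketched as follows. This is not the PR's actual code; the class and constant names are hypothetical, illustrating how a single global bits-per-value lets every block's file pointer be computed directly instead of being looked up in a DirectMonotonicReader:

```java
// Hypothetical sketch: with one global bits-per-value for all blocks,
// block pointers are pure arithmetic, so no per-block metadata read is needed.
final class BlockPointer {

    // Assumed layout: 128 values per block, packed at a global bpv.
    static final int BLOCK_SIZE = 128;

    /**
     * Byte offset of block {@code block}, assuming each block stores
     * BLOCK_SIZE values at {@code bitsPerValue} bits each, with the
     * packed data starting at {@code dataOffset}.
     */
    static long blockPointer(long dataOffset, int bitsPerValue, long block) {
        long blockBytes = (long) BLOCK_SIZE * bitsPerValue / Byte.SIZE;
        return dataOffset + blockBytes * block; // offset + blockBytes * block
    }
}
```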

asfimport commented 2 years ago

Feng Guo (@gf2121) (migrated from JIRA)

Hi all! Since all existing luceneutil tasks look good, I wonder if we need to add some more tasks or try this approach in Amazon's product search engine benchmark (like what we did in https://issues.apache.org/jira/browse/LUCENE-10033) to justify this change? I'm willing to do any work to further test this if needed. Or, if we think the existing luceneutil tasks are enough to justify this, I've fixed the CI issues and the PR is probably ready for a review now :)

In this PR, I only replaced the DirectReader used in NumericDocValues#longValue with the BlockReader, but I suspect this could be used in some other places too (e.g. DirectMonotonicReader, stored fields, even in BKD https://issues.apache.org/jira/browse/LUCENE-10315). I'll justify those changes in follow-ups.

asfimport commented 2 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I think you took a good approach: small steps first, to make some progress. Sorry I haven't looked at the PR yet, it is holidays here so I've just been busy.

asfimport commented 2 years ago

Feng Guo (@gf2121) (migrated from JIRA)

Thanks @rmuir for the reply! No hurry here, feel free to ignore this and have a nice holiday :)

asfimport commented 2 years ago

Feng Guo (@gf2121) (migrated from JIRA)

I'm so sorry to report that there is something wrong with my benchmark: the localrun script was still using the facets described in the luceneutil README, like this:

facets = (('taxonomy:Date', 'Date'),('sortedset:Month', 'Month'),('sortedset:DayOfYear', 'DayOfYear'))
index = comp.newIndex('lucene_baseline', sourceData, facets=facets, indexSort='dayOfYearNumericDV:long')

And I got the result mentioned above with these facets.

But when I cloned a fresh luceneutil and reran setup.py, it became:

index = comp.newIndex('lucene_baseline', sourceData,
                        facets = (('taxonomy:Date', 'Date'),
                                  ('taxonomy:Month', 'Month'),
                                  ('taxonomy:DayOfYear', 'DayOfYear'),
                                  ('sortedset:Month', 'Month'),
                                  ('sortedset:DayOfYear', 'DayOfYear'),
                                  ('taxonomy:RandomLabel', 'RandomLabel'),
                                  ('sortedset:RandomLabel', 'RandomLabel')))

And the result is totally different with this setup:

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
       BrowseDayOfYearTaxoFacets       13.65      (8.9%)       10.49      (2.6%)  -23.2% ( -31% -  -12%) 0.000
           BrowseMonthTaxoFacets       13.54     (14.6%)       10.89      (2.9%)  -19.6% ( -32% -   -2%) 0.000
            BrowseDateTaxoFacets       13.50      (8.8%)       11.11      (3.7%)  -17.7% ( -27% -   -5%) 0.000
     BrowseRandomLabelTaxoFacets       11.78      (7.0%)        9.94      (5.1%)  -15.6% ( -25% -   -3%) 0.000
            MedTermDayTaxoFacets       47.49      (2.4%)       41.45      (3.4%)  -12.7% ( -18% -   -7%) 0.000
         AndHighMedDayTaxoFacets      130.24      (2.7%)      119.48      (3.9%)   -8.3% ( -14% -   -1%) 0.000
        AndHighHighDayTaxoFacets       28.80      (2.8%)       27.09      (3.1%)   -5.9% ( -11% -    0%) 0.000
          OrHighMedDayTaxoFacets        9.68      (2.7%)        9.35      (2.8%)   -3.4% (  -8% -    2%) 0.000
           HighTermDayOfYearSort      139.73      (9.6%)      135.74     (10.2%)   -2.9% ( -20% -   18%) 0.361
                      TermDTSort      151.46      (9.0%)      147.40      (7.7%)   -2.7% ( -17% -   15%) 0.311
                          Fuzzy2       35.22      (6.3%)       34.38      (5.9%)   -2.4% ( -13% -   10%) 0.213
                 MedSloppyPhrase       78.99      (6.7%)       77.21      (7.1%)   -2.3% ( -15% -   12%) 0.300
                         LowTerm     1636.38      (6.4%)     1600.26      (9.6%)   -2.2% ( -17% -   14%) 0.392
                       LowPhrase      252.68      (3.8%)      247.11      (6.5%)   -2.2% ( -12% -    8%) 0.189
                         Respell       61.23      (2.3%)       59.89      (5.0%)   -2.2% (  -9% -    5%) 0.078
                     AndHighHigh       56.54      (2.6%)       55.43      (4.3%)   -2.0% (  -8% -    5%) 0.084
                     MedSpanNear       99.37      (2.4%)       97.44      (5.2%)   -1.9% (  -9% -    5%) 0.128
                HighSloppyPhrase       28.58      (5.4%)       28.05      (5.4%)   -1.8% ( -11% -    9%) 0.280
                        PKLookup      198.95      (3.0%)      195.34      (4.8%)   -1.8% (  -9% -    6%) 0.148
                      AndHighMed      116.50      (3.3%)      114.65      (4.5%)   -1.6% (  -9% -    6%) 0.204
                          Fuzzy1       75.07      (6.4%)       73.99      (8.1%)   -1.4% ( -14% -   13%) 0.532
                    HighSpanNear       10.73      (2.8%)       10.58      (3.9%)   -1.4% (  -7% -    5%) 0.180
                     LowSpanNear       43.92      (2.4%)       43.30      (3.4%)   -1.4% (  -6% -    4%) 0.128
                 LowSloppyPhrase       14.70      (4.4%)       14.50      (4.2%)   -1.3% (  -9% -    7%) 0.329
               HighTermMonthSort      148.80      (8.3%)      146.84      (8.1%)   -1.3% ( -16% -   16%) 0.612
                       OrHighMed      103.00      (3.2%)      101.67      (5.1%)   -1.3% (  -9% -    7%) 0.341
             MedIntervalsOrdered        5.44      (2.5%)        5.37      (2.2%)   -1.3% (  -5% -    3%) 0.092
                   OrHighNotHigh      648.74      (6.7%)      640.81      (8.8%)   -1.2% ( -15% -   15%) 0.621
                       MedPhrase       80.35      (2.7%)       79.38      (4.8%)   -1.2% (  -8% -    6%) 0.327
                        HighTerm     1384.91      (6.8%)     1369.27      (8.8%)   -1.1% ( -15% -   15%) 0.650
                          IntNRQ      127.36      (2.8%)      125.95      (5.8%)   -1.1% (  -9% -    7%) 0.440
            HighIntervalsOrdered       12.81      (2.6%)       12.67      (2.6%)   -1.1% (  -6% -    4%) 0.190
                      OrHighHigh       28.28      (1.9%)       28.03      (2.6%)   -0.9% (  -5% -    3%) 0.218
                        Wildcard       85.94      (3.2%)       85.20      (3.7%)   -0.9% (  -7% -    6%) 0.438
                      AndHighLow      577.83      (6.0%)      573.01      (6.2%)   -0.8% ( -12% -   12%) 0.663
             LowIntervalsOrdered       14.53      (2.5%)       14.44      (1.9%)   -0.6% (  -4% -    3%) 0.354
                   OrNotHighHigh      678.90      (6.6%)      675.09      (9.3%)   -0.6% ( -15% -   16%) 0.826
            HighTermTitleBDVSort      145.07     (10.1%)      144.63     (11.7%)   -0.3% ( -20% -   23%) 0.931
                         Prefix3      136.65      (4.5%)      136.49      (5.1%)   -0.1% (  -9% -    9%) 0.937
                      HighPhrase       27.41      (2.7%)       27.45      (4.4%)    0.1% (  -6% -    7%) 0.898
                         MedTerm     1470.28      (5.9%)     1472.69      (8.1%)    0.2% ( -13% -   15%) 0.941
                    OrHighNotMed      596.47      (4.9%)      597.84      (8.0%)    0.2% ( -12% -   13%) 0.914
                       OrHighLow      419.76      (6.5%)      423.67      (7.7%)    0.9% ( -12% -   16%) 0.679
                    OrHighNotLow      665.70      (7.9%)      672.36      (7.2%)    1.0% ( -13% -   17%) 0.675
                    OrNotHighLow      698.46      (3.5%)      706.62      (8.8%)    1.2% ( -10% -   13%) 0.581
                    OrNotHighMed      586.16      (6.9%)      593.42     (10.1%)    1.2% ( -14% -   19%) 0.651
     BrowseRandomLabelSSDVFacets       10.15      (3.3%)       17.13      (8.3%)   68.8% (  55% -   83%) 0.000
           BrowseMonthSSDVFacets       15.36      (3.6%)       32.91     (19.2%)  114.3% (  88% -  142%) 0.000
       BrowseDayOfYearSSDVFacets       14.02      (3.1%)       32.86     (16.2%)  134.4% ( 111% -  158%) 0.000

This is explainable now... I'll try some other patches to solve this, but I'm so sorry again for the noise here! (Thank goodness you have not spent time reviewing the PR!)

asfimport commented 2 years ago

Robert Muir (@rmuir) (migrated from JIRA)

so it speeds up ordinals (SSDV) but maybe has some problems for numeric doc values (taxonomy).

Maybe, one idea is we could try using the new block compression just for ordinals as a start (SortedDocValues/SortedSetDocValues) ?

asfimport commented 2 years ago

Feng Guo (@gf2121) (migrated from JIRA)

In order to save reading time, I deleted some previous progress comments and will try to make a final summary here.

one idea is we could try using the new block compression just for ordinals as a start

Thanks @rmuir for the suggestion! I made some optimizations in this approach, and the browse taxonomy tasks (Browse*TaxoFacets) are getting a speedup too. So the benchmark is now saying "dense faster, sparse slower" instead of "SSDV faster, taxo slower". I suspect we probably did not see an SSDV regression only because we have not added tasks that read sparse SSDV values, e.g. a MedTermDaySSDVFacets.

I've got two schemes in mind so far:

ForUtil Approach

This approach makes the file format friendly to block decoding and decodes a block with the efficient ForUtil (with SIMD optimizations) for each get. As a result we get a rather delicious (130%) speedup in the Browse* tasks. But we also get a slight (10%) regression in tasks that read facets with a query (like MedTermDayTaxoFacets), since we are reading sparse values there and we need to decompress the whole 128-value block even when we only need one value in that block.
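The trade-off described above can be sketched roughly like this (hypothetical names, not the PR's API): a block-decoding reader must decompress all 128 values of a block even to serve a single lookup, so sparse access pays the full decode cost, while dense forward access amortizes one decode over up to 128 reads from the cached buffer:

```java
// Sketch of a reader that decodes whole 128-value blocks and caches the
// most recently decoded block. The BlockDecoder interface stands in for
// ForUtil-based decompression.
final class CachingBlockReader {
    static final int BLOCK_SHIFT = 7;               // 128 values per block
    static final int BLOCK_SIZE = 1 << BLOCK_SHIFT;
    static final int BLOCK_MASK = BLOCK_SIZE - 1;

    interface BlockDecoder {
        void decode(long block, long[] dst);        // fills all 128 values
    }

    private final long[] buffer = new long[BLOCK_SIZE];
    private long currentBlock = -1;
    private final BlockDecoder decoder;

    CachingBlockReader(BlockDecoder decoder) {
        this.decoder = decoder;
    }

    long get(long index) {
        long block = index >>> BLOCK_SHIFT;
        if (block != currentBlock) {                // cache miss: decode the whole block
            decoder.decode(block, buffer);
            currentBlock = block;
        }
        return buffer[(int) (index & BLOCK_MASK)];  // dense reads hit the cache
    }
}
```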

Here is the code and luceneutil benchmark:

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
         AndHighMedDayTaxoFacets       71.49      (2.1%)       64.72      (2.0%)   -9.5% ( -13% -   -5%) 0.000
            MedTermDayTaxoFacets       25.79      (2.6%)       24.00      (1.8%)   -6.9% ( -11% -   -2%) 0.000
        AndHighHighDayTaxoFacets       13.13      (3.4%)       12.63      (3.1%)   -3.9% ( -10% -    2%) 0.000
          OrHighMedDayTaxoFacets       13.71      (4.1%)       13.41      (4.7%)   -2.2% ( -10% -    6%) 0.118
                        PKLookup      204.87      (3.9%)      203.03      (3.6%)   -0.9% (  -8% -    6%) 0.450
                         Prefix3      113.85      (3.6%)      113.32      (4.6%)   -0.5% (  -8% -    8%) 0.724
                    HighSpanNear       25.34      (2.5%)       25.26      (3.1%)   -0.3% (  -5% -    5%) 0.714
                     LowSpanNear       55.96      (2.0%)       55.80      (2.1%)   -0.3% (  -4% -    3%) 0.658
                     MedSpanNear       56.84      (2.4%)       56.90      (2.2%)    0.1% (  -4% -    4%) 0.895
                 MedSloppyPhrase       26.57      (1.8%)       26.60      (1.9%)    0.1% (  -3% -    3%) 0.831
                HighSloppyPhrase       30.20      (3.7%)       30.24      (3.6%)    0.2% (  -6% -    7%) 0.890
                       OrHighMed       49.96      (2.1%)       50.06      (1.7%)    0.2% (  -3% -    4%) 0.742
                      AndHighMed       96.70      (2.9%)       96.95      (2.6%)    0.3% (  -5% -    5%) 0.772
             LowIntervalsOrdered       23.32      (4.6%)       23.38      (4.5%)    0.3% (  -8% -    9%) 0.856
                      OrHighHigh       38.09      (1.9%)       38.20      (1.8%)    0.3% (  -3% -    4%) 0.643
                      TermDTSort      128.55     (14.7%)      128.94     (11.6%)    0.3% ( -22% -   31%) 0.942
                          Fuzzy1       99.54      (7.1%)       99.86      (8.0%)    0.3% ( -13% -   16%) 0.893
            HighIntervalsOrdered       15.58      (2.6%)       15.65      (2.6%)    0.4% (  -4% -    5%) 0.636
                         Respell       63.96      (1.9%)       64.22      (2.3%)    0.4% (  -3% -    4%) 0.542
                   OrHighNotHigh      611.12      (5.8%)      613.85      (6.2%)    0.4% ( -10% -   13%) 0.814
             MedIntervalsOrdered       59.48      (5.2%)       59.75      (5.1%)    0.5% (  -9% -   11%) 0.780
                     AndHighHigh       58.76      (3.0%)       59.16      (3.0%)    0.7% (  -5% -    6%) 0.478
                   OrNotHighHigh      619.53      (6.0%)      623.79      (7.1%)    0.7% ( -11% -   14%) 0.740
                      HighPhrase       31.00      (2.5%)       31.26      (2.7%)    0.8% (  -4% -    6%) 0.307
                      AndHighLow      828.41      (5.9%)      835.65      (7.1%)    0.9% ( -11% -   14%) 0.672
                    OrNotHighLow      986.46      (6.8%)      995.13     (10.5%)    0.9% ( -15% -   19%) 0.752
            HighTermTitleBDVSort      110.39     (12.3%)      111.38     (11.1%)    0.9% ( -20% -   27%) 0.807
                          IntNRQ      151.29      (2.6%)      152.96      (3.5%)    1.1% (  -4% -    7%) 0.262
                         LowTerm     1876.18      (7.8%)     1897.19      (8.3%)    1.1% ( -13% -   18%) 0.660
           HighTermDayOfYearSort      108.34     (18.9%)      109.87     (17.4%)    1.4% ( -29% -   46%) 0.805
               HighTermMonthSort       65.84     (11.0%)       66.78     (11.7%)    1.4% ( -19% -   27%) 0.689
                    OrHighNotMed      770.05      (5.3%)      782.54      (8.8%)    1.6% ( -11% -   16%) 0.480
                        Wildcard      182.10      (5.5%)      185.24      (7.2%)    1.7% ( -10% -   15%) 0.394
                 LowSloppyPhrase       33.75      (6.6%)       34.35      (8.8%)    1.8% ( -12% -   18%) 0.478
                       MedPhrase      161.57      (3.8%)      164.62      (6.1%)    1.9% (  -7% -   12%) 0.242
                    OrHighNotLow      679.46      (7.2%)      693.59      (7.6%)    2.1% ( -11% -   18%) 0.374
                    OrNotHighMed      690.91      (7.4%)      706.15      (8.8%)    2.2% ( -13% -   19%) 0.390
                        HighTerm     1388.14      (6.3%)     1420.26      (7.8%)    2.3% ( -11% -   17%) 0.302
                       LowPhrase      410.16      (5.0%)      420.38      (5.0%)    2.5% (  -7% -   13%) 0.114
                       OrHighLow      479.96      (5.1%)      492.39      (5.7%)    2.6% (  -7% -   14%) 0.128
                         MedTerm     1575.41      (5.9%)     1618.88      (8.2%)    2.8% ( -10% -   17%) 0.221
                          Fuzzy2       64.75      (8.3%)       66.76      (8.3%)    3.1% ( -12% -   21%) 0.237
           BrowseMonthTaxoFacets       14.39     (12.1%)       18.58     (17.3%)   29.1% (   0% -   66%) 0.000
     BrowseRandomLabelTaxoFacets       12.01      (8.5%)       17.01     (18.2%)   41.6% (  13% -   74%) 0.000
            BrowseDateTaxoFacets       13.72     (11.2%)       19.83     (26.5%)   44.5% (   6% -   92%) 0.000
       BrowseDayOfYearTaxoFacets       13.84     (11.5%)       20.03     (27.4%)   44.8% (   5% -   94%) 0.000
     BrowseRandomLabelSSDVFacets       10.31      (2.6%)       17.72      (4.2%)   71.9% (  63% -   80%) 0.000
           BrowseMonthSSDVFacets       15.56      (3.3%)       34.58     (12.3%)  122.3% ( 103% -  142%) 0.000
       BrowseDayOfYearSSDVFacets       14.17      (2.9%)       32.91     (11.6%)  132.3% ( 114% -  151%) 0.000

Detect Warm Approach

This approach keeps the original file format to preserve the high efficiency of reading sparse values, but tries to detect a dense read pattern and switches to block decoding in those situations. To be specific, we assume the user is reading dense values if more than 80% of the values in the first block were required, and then use block decoding for the following gets. This approach is limited to forward-reading cases and won't be as efficient as ForUtil at block decoding, but it is nearly a net win.

Here is the code and luceneutil benchmark:

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                          Fuzzy2       69.41     (10.7%)       67.98     (12.2%)   -2.1% ( -22% -   23%) 0.572
            MedTermDayTaxoFacets       26.19      (2.6%)       25.81      (3.0%)   -1.5% (  -6% -    4%) 0.099
          OrHighMedDayTaxoFacets       14.74      (2.6%)       14.53      (2.3%)   -1.4% (  -6% -    3%) 0.066
         AndHighMedDayTaxoFacets      103.03      (2.2%)      101.62      (2.7%)   -1.4% (  -6% -    3%) 0.080
            HighTermTitleBDVSort      104.13     (12.4%)      103.17     (11.2%)   -0.9% ( -21% -   25%) 0.805
           HighTermDayOfYearSort      112.59     (15.5%)      111.69      (9.4%)   -0.8% ( -22% -   28%) 0.844
                          Fuzzy1       58.74      (3.9%)       58.29      (4.6%)   -0.8% (  -8% -    8%) 0.566
                 LowSloppyPhrase       79.62      (2.1%)       79.32      (2.3%)   -0.4% (  -4% -    4%) 0.572
                      AndHighMed      143.05      (2.4%)      142.56      (3.3%)   -0.3% (  -5% -    5%) 0.709
                HighSloppyPhrase        6.83      (3.6%)        6.81      (3.3%)   -0.3% (  -6% -    6%) 0.765
                 MedSloppyPhrase       22.60      (3.0%)       22.54      (2.8%)   -0.3% (  -5% -    5%) 0.775
                        PKLookup      200.00      (2.9%)      199.51      (2.3%)   -0.2% (  -5% -    5%) 0.769
                        HighTerm     1125.74      (8.0%)     1123.52      (7.0%)   -0.2% ( -14% -   16%) 0.934
            HighIntervalsOrdered        8.78      (3.0%)        8.77      (2.4%)   -0.1% (  -5% -    5%) 0.875
             MedIntervalsOrdered       15.26      (3.3%)       15.24      (2.5%)   -0.1% (  -5% -    5%) 0.895
                        Wildcard       46.32      (3.7%)       46.29      (3.0%)   -0.1% (  -6% -    6%) 0.949
        AndHighHighDayTaxoFacets       14.24      (3.4%)       14.24      (3.9%)   -0.0% (  -7% -    7%) 0.997
                         LowTerm     1722.24      (9.2%)     1722.17      (7.4%)   -0.0% ( -15% -   18%) 0.999
                     AndHighHigh       83.82      (3.8%)       83.87      (4.4%)    0.1% (  -7% -    8%) 0.961
                     LowSpanNear       15.29      (2.1%)       15.31      (2.1%)    0.1% (  -3% -    4%) 0.823
                          IntNRQ      131.51      (2.1%)      131.80      (1.4%)    0.2% (  -3% -    3%) 0.698
                    OrHighNotLow      800.98     (10.3%)      802.86     (11.7%)    0.2% ( -19% -   24%) 0.946
                    HighSpanNear        6.60      (2.7%)        6.62      (2.7%)    0.3% (  -4% -    5%) 0.741
                       MedPhrase       34.65      (2.9%)       34.75      (2.6%)    0.3% (  -5% -    5%) 0.734
             LowIntervalsOrdered      109.61      (3.3%)      110.06      (3.3%)    0.4% (  -5% -    7%) 0.691
                     MedSpanNear       53.05      (1.7%)       53.28      (1.7%)    0.4% (  -2% -    3%) 0.426
                      OrHighHigh       33.33      (2.0%)       33.48      (2.1%)    0.4% (  -3% -    4%) 0.504
                       LowPhrase      542.02      (7.8%)      544.90      (6.1%)    0.5% ( -12% -   15%) 0.811
                         Respell       58.48      (2.5%)       58.83      (3.0%)    0.6% (  -4% -    6%) 0.489
               HighTermMonthSort      104.06     (11.1%)      104.71      (9.2%)    0.6% ( -17% -   23%) 0.847
                         Prefix3      189.59      (5.9%)      190.87      (6.2%)    0.7% ( -10% -   13%) 0.725
                         MedTerm     1510.36      (7.2%)     1520.89      (7.4%)    0.7% ( -13% -   16%) 0.763
                       OrHighMed       72.60      (2.6%)       73.19      (2.8%)    0.8% (  -4% -    6%) 0.341
                      TermDTSort       87.43     (17.9%)       88.58     (18.8%)    1.3% ( -30% -   46%) 0.821
                      AndHighLow      753.17      (6.4%)      765.05      (8.0%)    1.6% ( -12% -   17%) 0.490
                   OrHighNotHigh      649.16      (6.5%)      659.67      (9.4%)    1.6% ( -13% -   18%) 0.525
                    OrNotHighMed      610.85      (6.6%)      623.40      (8.8%)    2.1% ( -12% -   18%) 0.404
                    OrNotHighLow      809.36      (5.4%)      826.49     (10.4%)    2.1% ( -12% -   18%) 0.420
                   OrNotHighHigh      600.31      (5.1%)      613.87      (9.3%)    2.3% ( -11% -   17%) 0.342
                       OrHighLow      451.44      (6.0%)      462.66      (7.8%)    2.5% ( -10% -   17%) 0.260
                    OrHighNotMed      675.68      (7.6%)      693.79      (9.8%)    2.7% ( -13% -   21%) 0.332
                      HighPhrase      265.15      (6.5%)      273.05      (8.3%)    3.0% ( -11% -   19%) 0.208
           BrowseMonthTaxoFacets       14.10     (13.1%)       15.57      (7.5%)   10.4% (  -8% -   35%) 0.002
     BrowseRandomLabelSSDVFacets       10.25      (3.1%)       12.63      (4.3%)   23.1% (  15% -   31%) 0.000
     BrowseRandomLabelTaxoFacets       11.84      (8.6%)       14.74      (8.7%)   24.5% (   6% -   45%) 0.000
            BrowseDateTaxoFacets       13.50     (12.0%)       17.07     (10.2%)   26.5% (   3% -   55%) 0.000
       BrowseDayOfYearTaxoFacets       13.61     (12.3%)       17.26     (10.3%)   26.8% (   3% -   56%) 0.000
           BrowseMonthSSDVFacets       15.47      (3.6%)       19.69      (6.8%)   27.3% (  16% -   39%) 0.000
       BrowseDayOfYearSSDVFacets       14.12      (3.1%)       19.19      (6.9%)   35.9% (  25% -   47%) 0.000

I wonder which way you would prefer? Or, if you want to try some other ideas but have no time, I'm willing to help there too. Any feedback is welcome :)

asfimport commented 2 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I really prefer the first patch for its simplicity. It is fine if sparse performance loses 10% here; we speed up the slower, dense case in return. And the dense case already took a big performance hit when the sparse case was introduced, so it is fair.

asfimport commented 2 years ago

Feng Guo (@gf2121) (migrated from JIRA)

OK! I've prepared the PR for the first patch, and it is ready for a review now. Please take a look when you have free time. Thanks @rmuir!