Tony-X / search-benchmark-game

https://github.com/quickwit-oss/search-benchmark-game
https://tantivy-search.github.io/bench/
MIT License
10 stars 6 forks source link

Benchmark Lucene with a single level skip list #44

Open mikemccand opened 1 year ago

mikemccand commented 1 year ago

@Tony-X and I were chatting about the difference in skipping in Tantivy (single level, fast binary search) and Lucene (multi-level, sort of complex implementation) and wondering whether that is causing the AndHighLow performance difference.

We could test this by disabling Lucene's multi-level skip list at indexing time, or, rather, enabling it but configuring the level skip multiplier so high (Integer.MAX_VALUE?) so that it only ever uses a single skip level.

Tony-X commented 1 year ago

I looked into the code a little bit. Seems like the MAX_SKIP_LEVELS is hardcoded

Any good idea to easily set it to 1 ? I think if we limit the level to 1 then the skip multiplier won't take effect.

jpountz commented 1 year ago

In my experience AndHighLow is the task that is the most likely to bottleneck on decoding postings. This is due to the fact that you often need to decode an entire block of postings of the High clause to check a single posting of the Low clause. I wonder if this has more to do with Tantivy's better decoding logic than with skipping.

The comparison with a single level skip list would still be interesting. Note that this will also affect block-max and, since Lucene also uses higher levels of skip lists to store score upper bounds of longer ranges of doc IDs. You will probably want to use the counting task to make the comparison more fair.

It might also be interesting to evaluate different encodings that don't have such a high up-front overhead per block such as roaring bitmaps or partitioned Elias-Fano.

Tony-X commented 1 year ago

In my experience AndHighLow is the task that is the most likely to bottleneck on decoding postings

You're absolutely right. Lucene is the clear winner if you select TOP_10_COUNT and AndHighLow in https://tony-x.github.io/search-benchmark-game/

Note that this will also affect block-max and, since Lucene also uses higher levels of skip lists to store score upper bounds of longer ranges of doc IDs. You will probably want to use the counting task to make the comparison more fair.

Tantivy also implements BMW so it holds the per-block upper bounds. This means if we reduce to single skip level for Lucene it'd be comparable

slow-J commented 1 year ago

Set MAX_SKIP_LEVELS=1 in Lucene and ran the benchmark and control on a m6g.4xlarge machine. To setup the benchmark, I ran gradlew jar in Lucene and copied the core jar to my benchmark workspace and added this line to build.gradle: implementation files("./libs/lucene-core-9.7.1-SNAPSHOT.jar")

image image image

TOP_10_COUNT has the largest average performance gain from using a single level skip list, and COUNT seems to slightly regress.

HighTerm, MedTerm and LowTerm being the much slower Lucene queries with this setup. With +52.45%, +46.50%, +30.73% avg latency respectively.

image

Attaching results.json results-skiplist-merg.json.zip

mikemccand commented 1 year ago

HighTerm, MedTerm and LowTerm being the much slower Lucene queries with this setup

This is because BMW was less efficient?

How is the performance of AndHighLow affected?

jpountz commented 1 year ago

This is because term queries check higher levels of impacts to skip more docs at once. For instance when collecting top-100 hits and after collecting 1000 blocks of postings, it's not unlikely that the best score at level 1 (8 level-0 blocks) is less than the minimum competitive score, so ImpactsDISI would then skip 8 blocks at once.