Explore within-block skipping for postings

apache / lucene

Apache Lucene open-source search software

Apache License 2.0

2.58k stars 1.01k forks source link

Lucene is also doing the "accumulate docid deltas into the absolute docid" too in this loop, but I guess Tantivy does this separately somehow?

I believe Tantivy does the same, except that it can take advantage of SIMD to accumulate docid deltas into the absolute docid (if it did not accumulate deltas up-front, it could not run a branchless binary search later on). I tried to look into whether we can do the same with Panama, but last time I checked it doesn't give ways to use _mm_slli_si128, which prevents us from making a faster prefix sum through vectorization: https://github.com/jpountz/vectorized-prefix-sum.

For reference, there is also this old @mkhludnev idea about encoding dense postings lists as bitsets, which would naturally help with skipping: #6116 (or can we do it on a per-block basis?). And more generally, there are some formats that are better at skipping within blocks like Elias-Fano.

apache / lucene

Explore within-block skipping for postings #12486

Description