Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
[core] Support bsi file index #4464
Closed
Tan-JiaLiang closed 2 weeks ago
Purpose
Support the bit-sliced index (BSI) file index, which accelerates numeric range queries and can be combined with the bitmap index.
The comparison algorithm follows the BSI module of RoaringBitmap and Algorithm 4.2 in the paper "Improved Query Performance with Variant Indexes". This blog also helps explain BSI.
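The slice-by-slice range comparison can be illustrated with a small standalone Java sketch. This is not Paimon's implementation: the class `BsiSketch` and its methods are hypothetical, `java.util.BitSet` stands in for RoaringBitmap, and it assumes non-negative values below `2^bitCount`. It walks the slices from the most significant bit down, splitting the still-equal candidate rows into "already less" and "already greater" sets, in the spirit of the range algorithm cited above.

```java
import java.util.BitSet;

// Hypothetical BSI sketch, not Paimon's API. Assumes non-negative values
// below 2^bitCount; java.util.BitSet is used in place of RoaringBitmap.
class BsiSketch {
    private final int bitCount;
    private final BitSet[] slices;              // slices[i]: rows whose value has bit i set
    private final BitSet exists = new BitSet(); // rows that hold a non-null value

    BsiSketch(int bitCount) {
        if (bitCount < 1 || bitCount > 62) {
            throw new IllegalArgumentException("bitCount must be in [1, 62]");
        }
        this.bitCount = bitCount;
        this.slices = new BitSet[bitCount];
        for (int i = 0; i < bitCount; i++) {
            slices[i] = new BitSet();
        }
    }

    private void checkRange(long v) {
        if (v < 0 || v >= (1L << bitCount)) {
            throw new IllegalArgumentException("value out of range: " + v);
        }
    }

    // Record a value: mark the row as non-null and set it in every slice whose bit is 1.
    void set(int row, long value) {
        checkRange(value);
        exists.set(row);
        for (int i = 0; i < bitCount; i++) {
            if (((value >>> i) & 1L) != 0) {
                slices[i].set(row);
            }
        }
    }

    // Returns {lt, eq, gt}: rows whose value is less than, equal to, or greater than c.
    // Null rows never enter eq, so they appear in none of the results.
    private BitSet[] compare(long c) {
        checkRange(c);
        BitSet lt = new BitSet();
        BitSet gt = new BitSet();
        BitSet eq = (BitSet) exists.clone();
        for (int i = bitCount - 1; i >= 0; i--) {
            BitSet diverging = (BitSet) eq.clone();
            if (((c >>> i) & 1L) != 0) {
                diverging.andNot(slices[i]); // row bit 0, constant bit 1: row < c
                lt.or(diverging);
                eq.and(slices[i]);
            } else {
                diverging.and(slices[i]);    // row bit 1, constant bit 0: row > c
                gt.or(diverging);
                eq.andNot(slices[i]);
            }
        }
        return new BitSet[] {lt, eq, gt};
    }

    BitSet lessThan(long c) { return compare(c)[0]; }
    BitSet equalTo(long c) { return compare(c)[1]; }
    BitSet greaterThan(long c) { return compare(c)[2]; }

    BitSet lessOrEqual(long c) {
        BitSet[] r = compare(c);
        r[0].or(r[1]);
        return r[0];
    }

    BitSet greaterOrEqual(long c) {
        BitSet[] r = compare(c);
        r[2].or(r[1]);
        return r[2];
    }
}
```

For values 5, 12, 7, 0, 12 in rows 0 to 4, `greaterThan(6)` returns `{1, 2, 4}` and `lessThan(6)` returns `{0, 3}`. Because each result is a plain row bitmap, one way to combine it with a bitmap-index lookup is a simple intersection (`BitSet.and`).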
Tests
org.apache.paimon.fileindex.bsi.BitSliceIndexBitmapFileIndexTest
org.apache.paimon.utils.BitSliceIndexRoaringBitmapTest
org.apache.paimon.table.AppendOnlyFileStoreTableTest#testBSIAndBitmapIndexInMemory
org.apache.paimon.table.AppendOnlyFileStoreTableTest#testBSIAndBitmapIndexInDisk
org.apache.paimon.spark.SparkFileIndexITCase#testReadWriteTableWithBitSliceIndex
API and Format
Nothing.
Documentation
docs/content/concepts/spec/fileindex.md