apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0
2.44k stars 959 forks

[core] Support bsi file index #4464

Closed Tan-JiaLiang closed 2 weeks ago

Tan-JiaLiang commented 2 weeks ago

Purpose

Support a bit-slice index (BSI) bitmap file index. It accelerates numeric range queries and can be combined with the bitmap index.

The comparison algorithm follows the bsi module of RoaringBitmap and Algorithm 4.2 of the paper "Improved Query Performance with Variant Indexes". Also, this blog can help us understand BSI better.
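For readers unfamiliar with BSI, here is a minimal sketch of the slice-by-slice comparison, in the spirit of Algorithm 4.2 of the paper. It is not the code in this PR: the class `BsiSketch` and its methods are hypothetical, it uses `java.util.BitSet` instead of RoaringBitmap, and it assumes non-negative `long` values, with `null` meaning "no value for this row".

```java
import java.util.BitSet;

/** Illustrative bit-slice index over non-negative longs (hypothetical, not Paimon's class). */
final class BsiSketch {
    private final BitSet[] slices; // slices[i]: rows whose value has bit i set
    private final BitSet ebm;      // existence bitmap: rows holding a non-null value

    BsiSketch(Long[] values) {
        long max = 0;
        for (Long v : values) {
            if (v == null) continue;
            if (v < 0) throw new IllegalArgumentException("sketch assumes non-negative values");
            max = Math.max(max, v);
        }
        int bits = 64 - Long.numberOfLeadingZeros(max); // 0 slices when max == 0
        slices = new BitSet[bits];
        for (int i = 0; i < bits; i++) slices[i] = new BitSet();
        ebm = new BitSet();
        for (int row = 0; row < values.length; row++) {
            if (values[row] == null) continue;
            ebm.set(row);
            long v = values[row];
            for (int i = 0; i < bits; i++) {
                if (((v >>> i) & 1L) == 1L) slices[i].set(row);
            }
        }
    }

    /** Returns {LT, EQ, GT} row sets relative to c, scanning from the highest slice down. */
    private BitSet[] compare(long c) {
        BitSet lt = new BitSet();
        BitSet gt = new BitSet();
        BitSet eq = (BitSet) ebm.clone();
        if (c < 0) { // every indexed value is >= 0, hence greater than c
            return new BitSet[] {lt, new BitSet(), eq};
        }
        // Cover the bits of c as well, so constants above the indexed maximum work.
        int top = Math.max(slices.length, 64 - Long.numberOfLeadingZeros(c)) - 1;
        for (int i = top; i >= 0; i--) {
            BitSet slice = i < slices.length ? slices[i] : new BitSet();
            BitSet eqWithBit = (BitSet) eq.clone();
            eqWithBit.and(slice);
            BitSet eqWithoutBit = (BitSet) eq.clone();
            eqWithoutBit.andNot(slice);
            if (((c >>> i) & 1L) == 1L) {
                lt.or(eqWithoutBit); // equal so far, row has 0 where c has 1: smaller
                eq = eqWithBit;
            } else {
                gt.or(eqWithBit);    // equal so far, row has 1 where c has 0: larger
                eq = eqWithoutBit;
            }
        }
        return new BitSet[] {lt, eq, gt};
    }

    BitSet lt(long c) { return compare(c)[0]; }
    BitSet eq(long c) { return compare(c)[1]; }
    BitSet gt(long c) { return compare(c)[2]; }
    BitSet lte(long c) { BitSet[] r = compare(c); r[0].or(r[1]); return r[0]; }
    BitSet gte(long c) { BitSet[] r = compare(c); r[2].or(r[1]); return r[2]; }

    public static void main(String[] args) {
        BsiSketch bsi = new BsiSketch(new Long[] {5L, 12L, null, 7L, 0L, 12L, 30L});
        BitSet range = bsi.gte(12);
        range.and(bsi.lte(29)); // rows with 12 <= value <= 29
        System.out.println(range);
    }
}
```

Each comparison touches one bitmap per bit of the value domain, independent of the number of distinct values, which is why the approach suits range predicates. The result is itself a row bitmap, so it can be intersected with the result of another bitmap lookup, as `main` does for the two bounds.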

Tests

org.apache.paimon.fileindex.bsi.BitSliceIndexBitmapFileIndexTest
org.apache.paimon.utils.BitSliceIndexRoaringBitmapTest
org.apache.paimon.table.AppendOnlyFileStoreTableTest#testBSIAndBitmapIndexInMemory
org.apache.paimon.table.AppendOnlyFileStoreTableTest#testBSIAndBitmapIndexInDisk
org.apache.paimon.spark.SparkFileIndexITCase#testReadWriteTableWithBitSliceIndex

API and Format

Nothing.

Documentation

docs/content/concepts/spec/fileindex.md