GreptimeTeam / greptimedb

An open-source, cloud-native, unified time series database for metrics, logs and events with SQL/PromQL supported. Available on GreptimeCloud.
https://greptime.com/
Apache License 2.0
4.39k stars 317 forks

Support bloom filter when reading/writing parquet files #1830

Open v0y4g3r opened 1 year ago

v0y4g3r commented 1 year ago

What type of enhancement is this?

Performance

What does the enhancement do?

ParquetWriter already supports bloom filter encoding, but we still have to apply query predicates to the bloom filters during table scans.

Once we can build an external index file, we may also switch to an xor filter and its Rust implementation for better performance.
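To illustrate what applying a query clause to a bloom filter during a scan could look like, here is a minimal sketch using only the Rust standard library. It assumes an equality predicate on a single string column and one filter per row group. `BloomFilter`, `RowGroup`, and `scan_eq` are hypothetical names for this example, not GreptimeDB or parquet crate APIs, and the filter layout is a plain bit array rather than Parquet's split-block format.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// A tiny fixed-size bloom filter (illustrative only).
struct BloomFilter {
    bits: Vec<bool>,
    num_hashes: u64,
}

impl BloomFilter {
    fn new(num_bits: usize, num_hashes: u64) -> Self {
        Self { bits: vec![false; num_bits], num_hashes }
    }

    /// Derive the bit position for `value` under the given seed.
    fn index(&self, value: &str, seed: u64) -> usize {
        let mut hasher = DefaultHasher::new();
        seed.hash(&mut hasher);
        value.hash(&mut hasher);
        (hasher.finish() % self.bits.len() as u64) as usize
    }

    fn insert(&mut self, value: &str) {
        for seed in 0..self.num_hashes {
            let i = self.index(value, seed);
            self.bits[i] = true;
        }
    }

    /// `false` means the value is definitely absent;
    /// `true` means it may be present (false positives are possible).
    fn might_contain(&self, value: &str) -> bool {
        (0..self.num_hashes).all(|seed| self.bits[self.index(value, seed)])
    }
}

/// Hypothetical stand-in for a parquet row group holding one string column.
struct RowGroup {
    values: Vec<String>,
    filter: BloomFilter,
}

impl RowGroup {
    /// Build the filter at write time, alongside the data.
    fn new(values: Vec<String>) -> Self {
        let mut filter = BloomFilter::new(1024, 3);
        for v in &values {
            filter.insert(v);
        }
        Self { values, filter }
    }
}

/// Evaluate `column = needle`, skipping row groups the filter rules out.
/// Returns the matching values and the number of row groups actually read.
fn scan_eq(groups: &[RowGroup], needle: &str) -> (Vec<String>, usize) {
    let mut matches = Vec::new();
    let mut groups_read = 0;
    for group in groups {
        if !group.filter.might_contain(needle) {
            continue; // pruned without touching the data
        }
        groups_read += 1;
        // The filter can report false positives, so the rows are still checked.
        matches.extend(group.values.iter().filter(|v| v.as_str() == needle).cloned());
    }
    (matches, groups_read)
}
```

Calling `scan_eq(&groups, "host-c")` returns the matching rows plus how many row groups were read. The gain comes from groups whose filter returns `false`, since those are skipped entirely. This only helps equality-style predicates; range predicates cannot be answered by a bloom filter.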

killme2008 commented 1 year ago

@v0y4g3r Any progress?

killme2008 commented 11 months ago

@v0y4g3r What's the plan for this issue? I am not sure if we still need it.

evenyag commented 11 months ago

IMO, we should run some benchmarks later to compare bloom filters with the inverted index, since Parquet already supports bloom filters.