Altinity / ClickHouse

Altinity Stable® Builds of ClickHouse®
https://github.com/Altinity/ClickHouse/releases
Apache License 2.0

Include bloom filter statistics when reading parquet metadata with ClickHouse #490

Open Selfeer opened 1 month ago

Selfeer commented 1 month ago

Describe the new feature

We need a way to determine whether a bloom filter is present in a Parquet file when inspecting its metadata with ClickHouse via SELECT * FROM file('output.parquet', ParquetMetadata). Currently, the ParquetMetadata output makes no mention of bloom_filter_offset.
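
To make the gap concrete, here is a minimal sketch of how the exposed metadata can be inspected today. It assumes a file named output.parquet that the file() table function can reach (for example, when run through clickhouse-local from the same directory). The exact set of fields returned depends on the ClickHouse build, so no expected output is shown.

```sql
-- List the top-level columns (and their nested types) that the
-- ParquetMetadata format exposes for this file.
DESCRIBE TABLE file('output.parquet', ParquetMetadata);

-- Expand the per-row-group metadata, one row group per output row.
-- This is where a field such as bloom_filter_offset would be expected
-- to appear for each column chunk if it were exposed.
SELECT arrayJoin(row_groups) AS row_group
FROM file('output.parquet', ParquetMetadata)
FORMAT Vertical;
```

If neither result mentions a bloom filter field, the presence of a bloom filter cannot be confirmed from ClickHouse alone, which is the limitation this issue describes.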

Use case

This would let us check whether a Parquet file has a bloom filter as one of the QA checks, directly in ClickHouse, without relying on third-party tools such as parquet-tools.

arthurpassos commented 1 month ago

This might be useful: https://github.com/apache/iceberg/issues/9898#issuecomment-2375857223