apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0
2.43k stars, 955 forks

[parquet] Support to enable parquet bloomfilter #4479

Closed by Aitozi 1 week ago

Aitozi commented 2 weeks ago

Purpose

This PR adds support for enabling Parquet bloom filters.

// Values are passed as strings so the calls compile against a set(String, String) method
// Enable the bloom filter for all columns
conf.set("parquet.bloom.filter.enabled", "true");
// Disable the bloom filter for the column 'column.path'
conf.set("parquet.bloom.filter.enabled#column.path", "false");
// Result: the bloom filter is enabled for all columns except 'column.path'
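
The override rule in the snippet above (a `#column.path` suffix takes precedence over the global key) can be illustrated with a small standalone sketch. This is not the Paimon or Parquet implementation: the class `BloomFilterConfigSketch` and the method `isBloomFilterEnabled` are hypothetical names, a plain `Map` stands in for the real configuration object, and the fallback of "disabled when nothing is set" is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of per-column override resolution; not the actual Paimon/Parquet code.
public class BloomFilterConfigSketch {
    static final String ENABLED_KEY = "parquet.bloom.filter.enabled";

    // A column-specific key ("<key>#<column path>") wins over the global key.
    // Assumption: the bloom filter is disabled when neither key is set.
    static boolean isBloomFilterEnabled(Map<String, String> conf, String columnPath) {
        String perColumn = conf.get(ENABLED_KEY + "#" + columnPath);
        if (perColumn != null) {
            return Boolean.parseBoolean(perColumn);
        }
        return Boolean.parseBoolean(conf.getOrDefault(ENABLED_KEY, "false"));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(ENABLED_KEY, "true");                    // enable for all columns
        conf.put(ENABLED_KEY + "#column.path", "false");  // disable for one column

        System.out.println(isBloomFilterEnabled(conf, "column.path"));  // prints false
        System.out.println(isBloomFilterEnabled(conf, "other.column")); // prints true
    }
}
```

Looking up the column-specific key first keeps the global key as a simple default, which matches the "all columns except 'column.path'" outcome described above.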

Linked issue: close #xxx

Tests

API and Format

Documentation