apache / parquet-java

Apache Parquet Java
https://parquet.apache.org/
Apache License 2.0

Support disabling statistics for specified columns #2988

Closed ConeyLiu closed 2 months ago

ConeyLiu commented 3 months ago

Describe the enhancement requested

Column statistics are mostly used to accelerate row filtering, but not every column appears in a query predicate. For example, binary columns are rarely used in filters, yet calculating and storing their statistics consumes significant CPU and memory. We could add a configuration option to enable or disable statistics for specified columns. Disabling them for such columns would also reduce the footer size.
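To make the request concrete, here is a minimal sketch of how a per-column switch could be resolved. Everything in it is an assumption for illustration: the class `ColumnStatisticsConfig`, the key `parquet.column.statistics.enabled`, and the `key#column.path` override form (chosen to resemble the per-column style parquet-java already uses for bloom filter settings) are hypothetical and not an existing API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of parquet-java: decides whether statistics
// should be collected for a given column path.
class ColumnStatisticsConfig {
    // Assumed key name, used here for illustration only.
    static final String KEY = "parquet.column.statistics.enabled";

    private final boolean defaultEnabled;
    private final Map<String, Boolean> perColumn = new HashMap<>();

    ColumnStatisticsConfig(Map<String, String> props) {
        // Statistics stay enabled unless the global key says otherwise.
        this.defaultEnabled = Boolean.parseBoolean(props.getOrDefault(KEY, "true"));

        // Entries of the form "<KEY>#<column.path>" override the default.
        String prefix = KEY + "#";
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                String columnPath = e.getKey().substring(prefix.length());
                perColumn.put(columnPath, Boolean.parseBoolean(e.getValue()));
            }
        }
    }

    // Returns the per-column setting if present, else the global default.
    boolean isEnabled(String columnPath) {
        return perColumn.getOrDefault(columnPath, defaultEnabled);
    }
}
```

With a property such as `parquet.column.statistics.enabled#payload=false`, a writer consulting `isEnabled("payload")` could skip min/max tracking for that binary column while leaving every other column unchanged.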
