apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[core] Drop stats in manifest file reading #4534

Closed JingsongLi closed 2 days ago

JingsongLi commented 6 days ago

Purpose

We planned to drop the statistics when reading manifest files, but that optimization was blocked because the statistics are still needed later for the whole-bucket filter.

This PR refactors the logic so that the filter result is computed first and the stats are dropped afterwards. This way, the whole-bucket filter keeps working.
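
To illustrate the idea, here is a minimal, language-neutral sketch in Python. It is not the Paimon implementation: `Stats`, `Entry`, `read_entry`, and `filter_whole_buckets` are hypothetical names, and the rule "keep the whole bucket if any file in it matches" is an assumption about what the whole-bucket filter does. The point it shows is only that the filter result is cached on the entry before the stats are dropped, so the later bucket-level step no longer needs the stats.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Stats:
    # Hypothetical per-file statistics: min/max of one column.
    min_value: int
    max_value: int


@dataclass(frozen=True)
class Entry:
    # Hypothetical stand-in for a manifest entry.
    bucket: int
    file_name: str
    stats: Optional[Stats]            # None once the stats have been dropped
    selected: Optional[bool] = None   # filter result cached at read time


def read_entry(raw: Entry, stats_filter: Callable[[Stats], bool]) -> Entry:
    """Evaluate the filter while stats are still available, then drop them."""
    return Entry(raw.bucket, raw.file_name, None, stats_filter(raw.stats))


def filter_whole_buckets(entries: Iterable[Entry]) -> List[Entry]:
    """Keep every file of a bucket if at least one of its files was selected.

    Only the cached `selected` flag is used here, never the stats.
    """
    by_bucket = defaultdict(list)
    for entry in entries:
        by_bucket[entry.bucket].append(entry)

    kept: List[Entry] = []
    for bucket in sorted(by_bucket):
        files = by_bucket[bucket]
        if any(f.selected for f in files):
            kept.extend(files)
    return kept


def example() -> List[Entry]:
    raw = [
        Entry(0, "a", Stats(1, 5)),
        Entry(0, "b", Stats(10, 20)),
        Entry(1, "c", Stats(30, 40)),
    ]
    # Assumed predicate: the file's value range may contain 12.
    contains_12 = lambda s: s.min_value <= 12 <= s.max_value
    read = [read_entry(e, contains_12) for e in raw]
    return filter_whole_buckets(read)
```

In this sketch, `example()` keeps both files of bucket 0 (only `b` matches, but the bucket is kept as a whole) and drops bucket 1, while none of the returned entries carry stats any more.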

Tests

API and Format

Documentation

wwj6591812 commented 2 days ago

Many thanks to @JingsongLi for doing this work! Our company needs this feature now!

wwj6591812 commented 2 days ago

+1