apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[core] Skip push down partition filter to file reader #3601

Closed by Zouxxyy 1 week ago

Zouxxyy commented 1 week ago

Purpose

Skip pushing the partition filter down to the file reader, for two reasons:

  1. The partition filter is already applied during split generation, so pushing it down to the file reader has little effect.
  2. In some scenarios the data files may not contain the partition fields (for example, migrated tables, or our internal implementation that does not write partition fields). If the partition filter is then pushed down to the Parquet reader, no rows are returned at all: Parquet treats non-existent fields as null, and the pushed-down filter requires the field to be not null.
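The two reasons can be illustrated with a small, self-contained Java model. This is a hypothetical sketch, not Paimon code: `DataFile`, `Leaf`, `planSplits`, `dropPartitionLeaves` and `read` are invented names. It assumes, as described above, that a column missing from a file reads as null and that the pushed-down equality predicate also requires the column to be not null. Split generation prunes files by the partition value kept in metadata, so the file-level partition predicate adds nothing; when the file does not store the partition column, that predicate rejects every row.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: none of these types are Paimon classes.
public class PartitionFilterPushdownSketch {

    // A data file whose partition value is known from metadata (e.g. its directory),
    // with rows that may or may not store the partition column itself.
    record DataFile(String partitionDt, List<Map<String, String>> rows) {}

    // An equality predicate as pushed to the file reader (assumption: it also requires
    // the column to be non-null, so a column missing from the file never matches).
    record Leaf(String column, String value) {
        boolean test(Map<String, String> row) {
            String v = row.get(column); // null when the file lacks this column
            return v != null && v.equals(value);
        }
    }

    // Split generation: prune whole files using the partition value from metadata.
    static List<DataFile> planSplits(List<DataFile> files, String dt) {
        return files.stream()
                .filter(f -> f.partitionDt().equals(dt))
                .collect(Collectors.toList());
    }

    // The change described above: drop predicates on partition keys before pushdown.
    static List<Leaf> dropPartitionLeaves(List<Leaf> filter, Set<String> partitionKeys) {
        return filter.stream()
                .filter(leaf -> !partitionKeys.contains(leaf.column()))
                .collect(Collectors.toList());
    }

    // File reader: keep only rows that satisfy every pushed-down predicate.
    static List<Map<String, String>> read(DataFile file, List<Leaf> pushed) {
        return file.rows().stream()
                .filter(row -> pushed.stream().allMatch(leaf -> leaf.test(row)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Neither file stores the "dt" column, as with a migrated table.
        List<DataFile> files = List.of(
                new DataFile("2024-06-01", List.of(Map.of("id", "1", "name", "a"),
                                                   Map.of("id", "2", "name", "b"))),
                new DataFile("2024-06-02", List.of(Map.of("id", "3", "name", "a"))));
        List<Leaf> filter = List.of(new Leaf("dt", "2024-06-01"), new Leaf("name", "a"));

        List<DataFile> splits = planSplits(files, "2024-06-01");
        DataFile only = splits.get(0);
        System.out.println("splits: " + splits.size());                    // 1
        System.out.println("full pushdown: " + read(only, filter).size()); // 0
        System.out.println("partition predicates skipped: "
                + read(only, dropPartitionLeaves(filter, Set.of("dt"))).size()); // 1
    }
}
```

With the partition predicate pushed down, the reader returns nothing even though the split was correctly selected; skipping it leaves only the `name` predicate and returns the expected row.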

Tests

API and Format

Documentation