apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0
2.1k stars, 834 forks

[core] Fix read row position with parquet when filter push down #3634

Closed Zouxxyy closed 3 days ago

Zouxxyy commented 3 days ago

Purpose

Parquet has written and exposed the row index in block metadata since 1.12.3 (Paimon uses 1.13.1); see https://issues.apache.org/jira/browse/PARQUET-2117. We can use it to compute the row position when blocks are filtered. When the row index is not found in the block metadata, we skip applying the filter in order to get the correct row position.
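A minimal sketch of the idea, not the actual change in this PR: `Block`, `rowIndexOffset`, `hasRowIndex`, and `rowPositions` are hypothetical names used for illustration and do not mirror Paimon or Parquet classes. It assumes a missing row index is represented by a negative offset. If every block carries a row index, filtered blocks can be skipped and positions are derived from each block's own offset; otherwise the filter is ignored and positions are counted sequentially across all blocks.

```java
import java.util.ArrayList;
import java.util.List;

class RowPositionSketch {

    /** Hypothetical stand-in for a Parquet block (row group). */
    static final class Block {
        final long rowIndexOffset; // file-level index of the block's first row; negative if not recorded (assumption)
        final long rowCount;

        Block(long rowIndexOffset, long rowCount) {
            this.rowIndexOffset = rowIndexOffset;
            this.rowCount = rowCount;
        }
    }

    /** True only if every block exposes a row index offset. */
    static boolean hasRowIndex(List<Block> blocks) {
        for (Block b : blocks) {
            if (b.rowIndexOffset < 0) {
                return false;
            }
        }
        return true;
    }

    /**
     * Returns the file-level position of every row that is read.
     * keep[i] says whether block i survives filter push down.
     */
    static List<Long> rowPositions(List<Block> blocks, boolean[] keep) {
        // Without row index metadata, skip the filter so sequential counting stays correct.
        boolean applyFilter = hasRowIndex(blocks);
        List<Long> positions = new ArrayList<>();
        long runningStart = 0; // only meaningful when no block is skipped
        for (int i = 0; i < blocks.size(); i++) {
            Block b = blocks.get(i);
            if (applyFilter && !keep[i]) {
                continue; // block filtered out; later blocks still know their own offset
            }
            long start = applyFilter ? b.rowIndexOffset : runningStart;
            for (long r = 0; r < b.rowCount; r++) {
                positions.add(start + r);
            }
            runningStart += b.rowCount;
        }
        return positions;
    }
}
```

For three blocks of 3, 2, and 2 rows where the filter drops the middle block, this yields positions 0, 1, 2, 5, 6. Counting only the rows actually read would instead give 0 through 4, which is the wrong-position case the PR title describes.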

Tests

API and Format

Documentation

JingsongLi commented 3 days ago

+1