apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0
2.13k stars 842 forks

[core] Legacy files should always drop delete rows #3661

Closed JingsongLi closed 4 days ago

JingsongLi commented 5 days ago

Purpose

Old version: Earlier versions of Paimon may retain -D (delete) records in the highest-level files of tables with changelog-producer=lookup. The old reader filtered out -D records unconditionally.

New version: Paimon no longer retains -D records in top-level files, and the reader now takes an optimized path when reading top-level data that no longer filters -D records.

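Without a fix, -D records left in top-level files by an old version would pass through the optimized path unfiltered. For old files with changelog-producer=lookup, Paimon should filter out -D records and skip the optimized path.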
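
The rule above can be illustrated with a minimal sketch. This is not the code in this PR: every class, field, and method name below is hypothetical, and how Paimon actually recognizes a legacy file is not described here, so the sketch models it as a plain boolean flag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: none of these names are Paimon's real classes.
final class LegacyDeleteFilterSketch {

    enum RowKind { INSERT, DELETE }

    // Simplified record: a key plus its row kind (-D is RowKind.DELETE).
    static final class Row {
        final String key;
        final RowKind kind;

        Row(String key, RowKind kind) {
            this.key = key;
            this.kind = kind;
        }
    }

    // Simplified top-level data file. 'legacy' is an assumed flag meaning
    // "written by an old Paimon version"; the real detection is not shown.
    static final class TopLevelFile {
        final boolean legacy;
        final List<Row> rows;

        TopLevelFile(boolean legacy, List<Row> rows) {
            this.legacy = legacy;
            this.rows = rows;
        }
    }

    // Reads one top-level file. Legacy files of a changelog-producer=lookup
    // table may still contain -D rows, so they skip the optimized path and
    // are filtered; all other top-level files are returned without filtering.
    static List<Row> readTopLevelFile(TopLevelFile file, boolean lookupChangelogProducer) {
        if (file.legacy && lookupChangelogProducer) {
            List<Row> kept = new ArrayList<>();
            for (Row row : file.rows) {
                if (row.kind != RowKind.DELETE) {
                    kept.add(row);
                }
            }
            return kept;
        }
        // Optimized path: new files are assumed to hold no -D rows at the top level.
        return file.rows;
    }
}
```

In this sketch, a legacy file holding one insert and one -D row yields only the insert when the table uses changelog-producer=lookup, while a file written by the new version is returned as-is.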

Tests

API and Format

Documentation