apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[Doc] Clarify full-compaction changelog integrality #4551

Open zhongyujiang opened 5 days ago

zhongyujiang commented 5 days ago


Motivation

Currently, the doc for the full compaction changelog producer states that "Full compaction changelog producer can produce complete changelog for any type of source". However, when full-compaction.delta-commits is greater than 1, the intermediate changes across multiple snapshots are ignored.

Iceberg CDC refers to this as net changes, and Snowflake refers to it as Minimum-delta changes; both differ from a "complete" changelog. So I think this is also worth clarifying in the Paimon doc, because net changes and complete changes are usually considered to be different.

Solution

I think we should clarify that the full compaction changelog producer only outputs complete changes when full-compaction.delta-commits is set to 1. When full-compaction.delta-commits is set to a value greater than 1, intermediate changes across the several delta snapshots are ignored.
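
To make the difference concrete, here is a minimal toy model in Python. It is not Paimon's implementation: the function name diff_changelog and the commit representation (a list of (key, value) upserts, with None meaning delete) are assumptions made up for illustration. It only mimics the idea of diffing the table state between two consecutive full compactions, with a full compaction triggered every delta_commits commits.

```python
from typing import Dict, List, Optional, Tuple

# One commit is a list of (key, value) upserts; value None means "delete key".
Commit = List[Tuple[str, Optional[int]]]
# One changelog row is (row kind, key, value), using +I / -U / +U / -D kinds.
Row = Tuple[str, str, int]


def diff_changelog(commits: List[Commit], delta_commits: int) -> List[Row]:
    """Hypothetical sketch: emit a changelog by diffing the table state
    at every `delta_commits`-th commit against the previous compared state."""
    state: Dict[str, int] = {}           # current table contents
    last_compacted: Dict[str, int] = {}  # contents at the previous comparison
    out: List[Row] = []

    for i, commit in enumerate(commits, start=1):
        # Apply this commit's upserts and deletes to the current state.
        for key, value in commit:
            if value is None:
                state.pop(key, None)
            else:
                state[key] = value

        # Only every `delta_commits`-th commit triggers a comparison.
        if i % delta_commits != 0:
            continue

        for key in sorted(set(state) | set(last_compacted)):
            before = last_compacted.get(key)
            after = state.get(key)
            if before == after:
                continue  # no visible change between the two compared states
            if before is None:
                out.append(("+I", key, after))
            elif after is None:
                out.append(("-D", key, before))
            else:
                out.append(("-U", key, before))
                out.append(("+U", key, after))
        last_compacted = dict(state)

    return out


# Three commits that each overwrite the same key.
commits = [[("k", 1)], [("k", 2)], [("k", 3)]]

complete = diff_changelog(commits, delta_commits=1)
net = diff_changelog(commits, delta_commits=3)
```

In this toy model, delta_commits=1 yields every intermediate update for key k (insert of 1, then the update pairs 1→2 and 2→3), while delta_commits=3 yields a single +I row with value 3. The values 1 and 2 never appear in the second changelog, which is the "net changes" behavior described above.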

cc @JingsongLi What do you think?

Anything else?

No response
