apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[Feature] Support deleting rows in partial update of specific sequence group #4514

Closed liyubin117 closed 1 week ago

liyubin117 commented 1 week ago

Search before asking

Motivation

For retract-type data in partial update, Paimon provides three solutions. They are mutually exclusive and cannot take effect at the same time.

  1. ignoreDelete: ignore the retraction.
  2. Leave the fields of the sequence group blank.
  3. removeRecordOnDelete: delete the entire record.

We currently use partial update with sequence groups in place of a left join to improve performance. However, a delete from the main upstream source does not delete the final result, so the output deviates from the correct result.
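
To make the gap concrete, here is a toy Java model of the three behaviors listed above. It is only a sketch and not Paimon code: the class `RetractModes`, the enum constants, and the row layout (index 0 as primary key) are all hypothetical names chosen for illustration.

```java
import java.util.Arrays;

/** Toy model of how a partial-update merge could react to a retract row. Not Paimon code. */
final class RetractModes {

    /** Hypothetical names mirroring the three options listed in the issue. */
    enum Mode { IGNORE_DELETE, BLANK_SEQUENCE_GROUP, REMOVE_RECORD_ON_DELETE }

    /**
     * Applies a retract to an already merged row.
     *
     * @param row         merged row so far (index 0 is assumed to be the primary key)
     * @param groupFields indexes of the fields covered by the retracted sequence group
     * @return the row after the retract, or null when the whole record is deleted
     */
    static Object[] retract(Object[] row, int[] groupFields, Mode mode) {
        Object[] result = row.clone();
        switch (mode) {
            case IGNORE_DELETE:
                // The retract is dropped; the merged row is unchanged.
                return result;
            case BLANK_SEQUENCE_GROUP:
                // Only the fields of the retracted group are cleared.
                for (int i : groupFields) {
                    result[i] = null;
                }
                return result;
            case REMOVE_RECORD_ON_DELETE:
                // The whole record disappears.
                return null;
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        // Assumed layout: pk, order_name, order_seq, user_name, user_seq
        Object[] row = {1, "order-a", 10L, "user-x", 20L};
        int[] mainGroup = {1, 2}; // fields fed by the main upstream
        for (Mode mode : Mode.values()) {
            System.out.println(mode + " -> " + Arrays.toString(retract(row, mainGroup, mode)));
        }
    }
}
```

In this model, blanking the main group still leaves a row holding only the joined-side fields, which is the deviation from left-join semantics described above.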

Solution

Modify PartialUpdateMergeFunction so that a delete arriving for a specific sequence group removes the entire record.
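
One way the requested behavior could look is sketched below. This is a simplified, self-contained model and not the actual PartialUpdateMergeFunction: the class `SequenceGroupDeleteSketch`, the `deleteTriggerSeqField` setting, and the row layout are assumptions made for illustration, and sequence fields are assumed to be `Long` values.

```java
/** Simplified model of a partial-update merge with a delete-triggering sequence group. Not Paimon code. */
final class SequenceGroupDeleteSketch {

    /** Index of the sequence field whose group may delete the whole record (hypothetical setting). */
    private final int deleteTriggerSeqField;

    /** Merged row so far; null when there is no live record. Index 0 is assumed to be the primary key. */
    private Object[] row;

    SequenceGroupDeleteSketch(int deleteTriggerSeqField) {
        this.deleteTriggerSeqField = deleteTriggerSeqField;
    }

    /**
     * Merges one input row.
     *
     * @param retract     true for a retract (delete) row, false for an insert or update
     * @param input       full-width row; fields outside the group may be null
     * @param seqField    index of the sequence field of the group carried by this input
     * @param groupFields indexes of the value fields governed by that sequence field
     */
    void add(boolean retract, Object[] input, int seqField, int... groupFields) {
        if (row == null) {
            if (retract) {
                return; // nothing to retract
            }
            row = new Object[input.length];
            row[0] = input[0];
        }
        Long current = (Long) row[seqField];
        long incoming = (Long) input[seqField];
        if (current != null && incoming < current) {
            return; // stale input for this group is ignored
        }
        if (retract && seqField == deleteTriggerSeqField) {
            row = null; // a delete from the designated group removes the entire record
            return;
        }
        row[seqField] = incoming;
        for (int i : groupFields) {
            // A retract from any other group only blanks that group's fields.
            row[i] = retract ? null : input[i];
        }
    }

    /** Returns a copy of the merged row, or null when the record was deleted. */
    Object[] result() {
        return row == null ? null : row.clone();
    }
}
```

The sketch deliberately ignores cases a real change would need to handle, such as an input from another group arriving after the record was deleted.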

Anything else?

No response

Are you willing to submit a PR?