apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[Feature] Add empty commit options to support some situation #4422

Open zlzhang0122 opened 3 weeks ago

zlzhang0122 commented 3 weeks ago

Motivation

In some special cases, a CommitMessage can be non-empty even though all the table files in it are empty. If the job then fails over because of a conflict, it becomes unrecoverable: the value of numCommitted will be greater than 1, so the job fails over indefinitely and cannot be restored.

The exception is "This exception is intentionally thrown after committing the restored checkpoints. By restarting the job we hope that writers can start writing based on these new commits".

Solution

We can add a core option such as 'commit.ignore-empty-commit' to allow committing an empty snapshot, so that the job can then be restored normally.
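
To make the intent concrete, here is a toy model of the restore loop. It is not Paimon's actual code: the class, the method names, and the boolean flag are all hypothetical, and it assumes that the intentional exception is thrown whenever the restore step reports that it committed something, and that an ignored empty commit leaves no snapshot behind to mark the committable as done.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the restore loop. NOT Paimon's code; every name here is hypothetical. */
public class EmptyCommitSketch {

    /** Identifiers that produced a snapshot (stands in for the table's snapshot history). */
    private final Set<Long> snapshots = new HashSet<>();

    /** Hypothetical flag mirroring the proposed 'commit.ignore-empty-commit' option. */
    private final boolean ignoreEmptyCommit;

    public EmptyCommitSketch(boolean ignoreEmptyCommit) {
        this.ignoreEmptyCommit = ignoreEmptyCommit;
    }

    /** Re-commits one restored committable and returns how many commits were attempted. */
    public int filterAndCommit(long identifier, boolean hasDataFiles) {
        if (snapshots.contains(identifier)) {
            return 0; // already recorded as a snapshot, so it is filtered out
        }
        // Assumption: an empty commit only produces a snapshot when it is not ignored.
        if (hasDataFiles || !ignoreEmptyCommit) {
            snapshots.add(identifier);
        }
        return 1;
    }

    /** Simulates repeated restores of one empty committable; returns the restart count. */
    public static int restartsUntilRecovered(boolean ignoreEmptyCommit, int maxRestarts) {
        EmptyCommitSketch committer = new EmptyCommitSketch(ignoreEmptyCommit);
        int restarts = 0;
        while (restarts < maxRestarts) {
            int numCommitted = committer.filterAndCommit(1L, false);
            if (numCommitted == 0) {
                return restarts; // nothing left to re-commit, the job can resume
            }
            restarts++; // models the intentional exception followed by a job restart
        }
        return restarts; // gave up: the job never recovered within maxRestarts
    }

    public static void main(String[] args) {
        System.out.println(restartsUntilRecovered(true, 5));  // prints 5 (never recovers)
        System.out.println(restartsUntilRecovered(false, 5)); // prints 1 (recovers after one restart)
    }
}
```

In this sketch, setting the flag to false lets the empty commit produce a snapshot, so the next restore finds nothing left to commit and the job resumes.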

Anything else?

No response

Are you willing to submit a PR?