apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[Feature] support setting uid suffix for source/sink to improve state compatibility when jobGraph changes. #4424

Closed: liming30 closed this issue 2 weeks ago

liming30 commented 3 weeks ago

Motivation

In Flink jobs, Paimon sources and sinks are both stateful. In our business scenarios we may change the parallelism, add sources, or replace source/sink tables. These changes can alter the jobGraph, so the state can no longer be restored and the job fails with errors such as "Cannot map checkpoint/savepoint state for operator xxx to the new program".

For example, a job cannot be restored from a savepoint after either of the following two changes:

  1. insert into table_A (select f1 from table_B) -> insert into table_A (select f1 from table_B union all select f1 from table_C)

  2. insert into table_A select f1 from table_B -> insert into table_C select f1 from table_B

Therefore, I would like to set a uid on the source/sink operators to improve state compatibility when restoring from a savepoint.

Solution

Set an explicit uid on the source/sink operators, built from a user-configurable suffix.
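
As a rough illustration of the idea (not Paimon's actual implementation), the sketch below derives a deterministic uid from an operator role and a user-supplied suffix. The class `OperatorUids`, its methods, and the `paimon-<role>_<suffix>` naming scheme are all hypothetical. In a real job the resulting string would be passed to Flink's `uid(String)` method on the source or sink operator, so that the operator ID no longer depends on the shape of the jobGraph.

```java
import java.util.Objects;

/**
 * Hypothetical helper (an assumption for illustration, not Paimon code):
 * builds a stable operator uid from a role and a user-chosen suffix.
 */
final class OperatorUids {

    private OperatorUids() {}

    /** uid for a source operator, e.g. "paimon-source_table_B". */
    static String sourceUid(String suffix) {
        return build("source", suffix);
    }

    /** uid for a sink operator, e.g. "paimon-sink_table_A". */
    static String sinkUid(String suffix) {
        return build("sink", suffix);
    }

    private static String build(String role, String suffix) {
        Objects.requireNonNull(suffix, "suffix");
        // A blank suffix would give every operator of one role the same uid,
        // and Flink requires uids to be unique within a job.
        if (suffix.trim().isEmpty()) {
            throw new IllegalArgumentException("uid suffix must not be blank");
        }
        // The result depends only on the inputs, never on the job topology.
        return "paimon-" + role + "_" + suffix;
    }
}

// Assumed usage inside a Flink job (stream and sink variables are placeholders):
//   sourceStream.uid(OperatorUids.sourceUid("table_B"));
//   dataStreamSink.uid(OperatorUids.sinkUid("table_A"));
```

With a scheme like this, the source reading table_B would keep the same uid in both examples above, so its state could still be matched after a union branch is added or the sink table is swapped. The suffix has to be chosen so that each uid stays unique within the job.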

Anything else?

No response

Are you willing to submit a PR?