apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[Bug] Flink bounded source with checkpoint missing last snapshot commit #3345

Open eric666666 opened 1 month ago

eric666666 commented 1 month ago

Search before asking

Paimon version

Paimon 0.9 snapshot

Compute Engine

Flink

Minimal reproduce step

When a Flink bounded source (such as JDBC) is used with checkpointing enabled, Flink does not commit the last snapshot. Reviewing the source code, I found that org.apache.paimon.flink.sink.CommitterOperator causes this.
[screenshot]
I do not understand why the snapshot commit should be skipped when checkpointing is enabled in Flink streaming mode.
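To illustrate the behaviour being described, here is a minimal sketch of a committer that skips the commit in endInput() when streaming-mode checkpointing is on and waits for the next completed checkpoint instead. The class `GuardedCommitterSketch`, its flag and its methods are hypothetical stand-ins modelled only on the description above; this is not Paimon's actual CommitterOperator code. It shows how committables produced after the last regular checkpoint can stay pending if no further checkpoint completes.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified committer (NOT Paimon's CommitterOperator).
 * Committables are buffered and only published as "snapshots" when allowed.
 */
final class GuardedCommitterSketch {
    private final boolean streamingCheckpointEnabled;
    private final List<String> pending = new ArrayList<>();
    private final List<String> committedSnapshots = new ArrayList<>();

    GuardedCommitterSketch(boolean streamingCheckpointEnabled) {
        this.streamingCheckpointEnabled = streamingCheckpointEnabled;
    }

    /** A writer hands over a committable (e.g. newly written data files). */
    void addCommittable(String committable) {
        pending.add(committable);
    }

    /** Called once when the bounded input is exhausted. */
    void endInput() {
        // The guard under discussion: in streaming mode with checkpointing,
        // return early and rely on a later checkpoint to commit.
        if (streamingCheckpointEnabled) {
            return;
        }
        commitPending();
    }

    /** Called when the runtime reports a completed checkpoint. */
    void notifyCheckpointComplete() {
        commitPending();
    }

    private void commitPending() {
        if (!pending.isEmpty()) {
            // One snapshot per commit, containing everything pending.
            committedSnapshots.add(String.join("+", pending));
            pending.clear();
        }
    }

    List<String> committedSnapshots() { return committedSnapshots; }
    List<String> pending() { return pending; }

    public static void main(String[] args) {
        GuardedCommitterSketch c = new GuardedCommitterSketch(true);
        c.addCommittable("f1");
        c.notifyCheckpointComplete(); // a regular checkpoint commits f1
        c.addCommittable("f2");       // written after the last checkpoint
        c.endInput();                 // guard returns early, f2 stays pending
        // prints "committed=[f1] pending=[f2]"
        System.out.println("committed=" + c.committedSnapshots() + " pending=" + c.pending());
    }
}
```

With `streamingCheckpointEnabled` set to false (e.g. batch execution), endInput() commits directly, so nothing is left pending.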

What doesn't meet your expectations?

The last snapshot commit is missing.

Anything else?

No response

Are you willing to submit a PR?

eric666666 commented 1 month ago

@tsreaper @JingsongLi Please have a look.

eric666666 commented 1 month ago

I removed the condition, and that fixed the problem. However, looking at the code, the condition seems to be intentional, so I am not sure whether I should do this.
[screenshot]

JingsongLi commented 1 month ago

Flink 1.14?

eric666666 commented 1 month ago

No, Flink 1.18.1

LinMingQiang commented 1 month ago

As far as I know, Flink 1.15+ supports checkpointing with finished tasks (https://flink.apache.org/2022/07/11/flip-147-support-checkpoints-after-tasks-finished-part-one/#support-checkpointing-with-finished-tasks), so the commit will be executed once in notifyCheckpointComplete. This held true in my tests.
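The lifecycle described above can be sketched as a small simulation. It assumes a committer that, in streaming mode with checkpointing, commits only from notifyCheckpointComplete(). The `Committer` class, the `run` helper and the `checkpointsAfterTasksFinish` flag are hypothetical stand-ins, not Flink or Paimon APIs. The flag models FLIP-147, which, as we understand it, is controlled by `execution.checkpointing.checkpoints-after-tasks-finish.enabled` and is enabled by default from Flink 1.15.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simulation of the end of a bounded streaming job. */
final class FinalCheckpointSketch {

    /** Stand-in committer that commits only when a checkpoint completes. */
    static final class Committer {
        final List<String> pending = new ArrayList<>();
        final List<List<String>> snapshots = new ArrayList<>();

        void add(String committable) { pending.add(committable); }

        void endInput() {
            // Deferred: streaming mode with checkpointing waits for a checkpoint.
        }

        void notifyCheckpointComplete() {
            if (!pending.isEmpty()) {
                snapshots.add(new ArrayList<>(pending)); // one snapshot per commit
                pending.clear();
            }
        }
    }

    static Committer run(boolean checkpointsAfterTasksFinish) {
        Committer c = new Committer();
        c.add("f1");
        c.notifyCheckpointComplete(); // regular checkpoint while the source runs
        c.add("f2");                  // last records, written after that checkpoint
        c.endInput();                 // bounded source is exhausted
        if (checkpointsAfterTasksFinish) {
            // FLIP-147: a final checkpoint is taken after the tasks finish,
            // so the committer receives one more notifyCheckpointComplete().
            c.notifyCheckpointComplete();
        }
        // Otherwise no further checkpoint arrives and f2 is never committed.
        return c;
    }

    public static void main(String[] args) {
        System.out.println("with final checkpoint:    " + run(true).snapshots);
        System.out.println("without final checkpoint: " + run(false).snapshots);
    }
}
```

With the final checkpoint, the records written after the last regular checkpoint (f2) land in one extra snapshot, committed exactly once; without it they stay pending, which matches the symptom in the original report.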