apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[Feature] Better coping strategies for conflicts during compaction, instead of restarting #3143

Open yangtao0626 opened 5 months ago

yangtao0626 commented 5 months ago

Search before asking

Motivation

We have a Flink job with only one commit user. It sets 'changelog-producer' = 'lookup' and leaves the other compaction properties at their default values. The Flink checkpoint interval is 20s and max-concurrent-checkpoints is 3. Restarts are triggered frequently throughout the day.

The error message looks like the following:

Caused by: java.lang.IllegalStateException: Trying to delete file xxxx.orc which is not previously added. Manifest might be corrupted.
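To make the failure mode concrete, here is a minimal sketch of the kind of consistency check that can produce such a message. It is a toy model, not Paimon's implementation: the function `apply_entries`, the entry tuples, and the file names are all hypothetical, and it simply assumes that a DELETE entry for a file that is not currently live is treated as a conflict.

```python
def apply_entries(current_files, entries):
    """Apply ("ADD" | "DELETE", file_name) entries to a set of live files.

    Hypothetical model: deleting a file that is not live raises an error,
    mirroring the "not previously added" message in spirit only.
    """
    files = set(current_files)
    for kind, name in entries:
        if kind == "ADD":
            files.add(name)
        elif kind == "DELETE":
            if name not in files:
                raise RuntimeError(
                    f"Trying to delete file {name} which is not previously added."
                )
            files.remove(name)
        else:
            raise ValueError(f"unknown entry kind: {kind}")
    return files


# A compaction rewrites a.orc and b.orc into c.orc.
snapshot = {"a.orc", "b.orc"}
compaction = [("DELETE", "a.orc"), ("DELETE", "b.orc"), ("ADD", "c.orc")]
after_first = apply_entries(snapshot, compaction)

# Committing a change planned against the old view a second time conflicts,
# because a.orc and b.orc are no longer live.
try:
    apply_entries(after_first, compaction)
    conflict = False
except RuntimeError:
    conflict = True
```

In this model the second commit fails only because it was planned against a view of the bucket that no longer matches the committed one. Whether that is the actual cause in the job described above is not established here.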

Solution

When a bucket is written by a single task, we could adopt one of the following coping strategies:

  1. Give up this compaction. This has little impact, although small files are merged more slowly.
  2. Ignore the delete conflict in this compaction and commit directly.
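The two strategies can be sketched side by side on a toy commit function. This is only an illustration under stated assumptions: `commit_compaction`, the `"skip"` and `"ignore"` strategy names, and the file names are hypothetical and do not correspond to any Paimon API or option.

```python
def commit_compaction(table_files, deletes, adds, strategy):
    """Commit a compaction (delete inputs, add outputs) on a set of files.

    strategy="skip":   on a delete conflict, give up the whole compaction.
    strategy="ignore": on a delete conflict, drop the failing deletes and
                       commit the rest.
    Hypothetical model; only the committed file set is represented.
    """
    if strategy not in ("skip", "ignore"):
        raise ValueError(f"unknown strategy: {strategy}")
    files = set(table_files)
    missing = set(deletes) - files  # deletes that target non-live files
    if missing and strategy == "skip":
        return files  # table is left untouched
    # No conflict, or conflict ignored: remove what exists, add the outputs.
    return (files - set(deletes)) | set(adds)


# The committed table only knows b.orc; the compaction expects a.orc too.
table = {"b.orc"}
skipped = commit_compaction(table, ["a.orc", "b.orc"], ["c.orc"], "skip")
ignored = commit_compaction(table, ["a.orc", "b.orc"], ["c.orc"], "ignore")
```

Under "skip" the table stays at b.orc and the small files remain unmerged. Under "ignore" the table ends up with c.orc only. The sketch covers the commit side alone and says nothing about what the writer believes afterwards.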

Anything else?

I don't know whether the community is interested in doing this, or whether there are any problems with these approaches.

Are you willing to submit a PR?

JingsongLi commented 4 months ago

The key problem is that the state inside the writer has already changed. If you just ignore this conflict, every subsequent commit will conflict as well.
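One possible reading of this point, as a toy model: the writer updates its own file list as soon as it compacts, so if the committer silently drops a conflicting compaction, the writer keeps planning against files the table never received. Everything below is hypothetical (the `Writer` class, `commit_or_skip`, and the file names) and assumes the committer skips conflicting compactions without notifying the writer.

```python
class Writer:
    """Hypothetical writer that tracks the files it believes are live."""

    def __init__(self, files):
        self.files = list(files)
        self.counter = 0

    def compact(self):
        # Rewrite everything the writer believes is live into one new file,
        # and update the writer's own state immediately.
        self.counter += 1
        output = f"compact-{self.counter}.orc"
        deletes = list(self.files)
        self.files = [output]
        return deletes, output


def commit_or_skip(table, deletes, add):
    """Commit the compaction, or skip it if any delete targets a missing file."""
    if not set(deletes) <= table:
        return False  # conflict: compaction dropped, writer is not told
    table.difference_update(deletes)
    table.add(add)
    return True


# The writer's view (a.orc, b.orc) already differs from the table (b.orc).
table = {"b.orc"}
writer = Writer(["a.orc", "b.orc"])
results = [commit_or_skip(table, *writer.compact()) for _ in range(3)]
```

In this model every round conflicts: after the first skipped commit the writer believes only compact-1.orc exists, which the table has never seen, so the next compaction tries to delete it and is skipped too. Avoiding the cascade would require bringing the writer's state back in line with the committed table, which a restart does implicitly.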