apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[core] Fast return if rollback version equals latest version #4467

Closed askwang closed 2 weeks ago

askwang commented 2 weeks ago

Purpose

If the rollback snapshot version is equal to the latest snapshot version, the rollback should return immediately.
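A minimal sketch of the proposed guard, for illustration only. The class `FastReturnSketch` and its methods are hypothetical stand-ins and are not Paimon's actual code; in particular, the simulated `cleanLargerThan` only assumes that the real `RollbackHelper#cleanLargerThan` removes snapshots newer than the target.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the rollback path; not Paimon's actual code. */
class FastReturnSketch {

    /** Records which snapshot ids the simulated cleanup removed. */
    static final List<Long> cleaned = new ArrayList<>();

    /** Simulated cleanup (assumption): drop every snapshot newer than the target. */
    static void cleanLargerThan(long targetId, long latestId) {
        for (long id = latestId; id > targetId; id--) {
            cleaned.add(id);
        }
    }

    /** Returns true if cleanup ran, false if the guard short-circuited. */
    static boolean rollbackTo(long targetId, long latestId) {
        if (targetId == latestId) {
            // Nothing newer than the target exists, so skip the cleanup entirely.
            return false;
        }
        cleanLargerThan(targetId, latestId);
        return true;
    }

    public static void main(String[] args) {
        System.out.println(rollbackTo(1, 1)); // false: fast return
        System.out.println(rollbackTo(1, 3)); // true: cleanup ran
        System.out.println(cleaned);          // [3, 2]
    }
}
```

The guard only changes behavior when the target already equals the latest snapshot seen by the caller; every other rollback takes the same path as before.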

Tests

API and Format

Documentation

wwj6591812 commented 2 weeks ago

A question: suppose the user executes the rollback procedure with snapshot = 1 while the latest snapshot id of the Paimon table is 1, and before the rollback procedure ends, the latest snapshot id changes to 2. With this patch, the rollback procedure returns early, so RollbackHelper#cleanLargerThan is never called. Is this the result the user wants?

askwang commented 2 weeks ago

In this case, the result is not what the user wants. However, the same situation can occur even without the fast return. For example, suppose the rollback snapshot is 1 and the latest snapshot of the table is also 1, and before the rollback ends the latest snapshot changes to 2 or 3. The rollback logic cannot guarantee that the latest snapshot it read is still the newest one, so some new snapshots will not be processed.
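The scenario can be sketched as a deterministic simulation. Everything here is hypothetical (the class `StaleLatestSketch`, the in-memory snapshot set, and the assumption that the rollback works from a latest snapshot id read once up front); it is not how Paimon stores or reads snapshots, only an illustration of the argument that the stale read causes the problem with or without the fast return.

```java
import java.util.TreeSet;

/** Hypothetical simulation of a rollback racing with a commit; not Paimon's code. */
class StaleLatestSketch {

    /** Simulated snapshot ids currently present in the table. */
    final TreeSet<Long> snapshots = new TreeSet<>();

    /** Rolls back using a latest id that was read earlier and may be stale. */
    void rollback(long targetId, long latestSeen, boolean fastReturn) {
        if (fastReturn && targetId == latestSeen) {
            return; // the guard proposed in this PR
        }
        // Assumed cleanup: only ids up to the stale latestSeen are removed.
        for (long id = latestSeen; id > targetId; id--) {
            snapshots.remove(id);
        }
    }

    /** Replays the interleaving from the discussion and returns the surviving ids. */
    static TreeSet<Long> run(boolean fastReturn) {
        StaleLatestSketch table = new StaleLatestSketch();
        table.snapshots.add(1L);
        long latestSeen = table.snapshots.last(); // rollback reads latest = 1
        table.snapshots.add(2L);                  // a concurrent commit lands
        table.rollback(1L, latestSeen, fastReturn);
        return table.snapshots;
    }

    public static void main(String[] args) {
        System.out.println(run(true));  // [1, 2]
        System.out.println(run(false)); // [1, 2]
    }
}
```

Under these assumptions snapshot 2 survives in both runs, which matches the point that the fast return does not introduce the race.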

JingsongLi commented 2 weeks ago

Hi @askwang, what does this PR aim to do? Is it just for performance?

askwang commented 2 weeks ago

Yes, it is just a fast return; there is no need to execute RollbackHelper#cleanLargerThan.

JingsongLi commented 2 weeks ago

Do we need to care about the speed of this operation? I think it should be very fast even without this optimization.