apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[Feature] Introduces savepoint mechanism of Paimon #748

Closed. JingsongLi closed this issue 1 year ago.

JingsongLi commented 1 year ago

Search before asking

Motivation

Disaster recovery is mission-critical for any software. This is especially true for data systems, where the impact can be severe: it can delay business decisions or, at times, lead to wrong ones. Paimon could introduce a savepoint mechanism to help users recover data from a previous state.

As the name suggests, a "savepoint" saves the table as of a given snapshot, so that the table can be restored to that savepoint later if needed. Care is taken to ensure that the cleaner does not delete any files referenced by a savepoint. Likewise, a savepoint cannot be created on a snapshot that has already been cleaned up. In simpler terms, this is similar to taking a backup, except that we do not make a new copy of the table; we only record the table's state so that it can be restored later when needed.
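To make the three rules above concrete (a savepoint only references a snapshot, the cleaner must keep savepointed files, and an already-cleaned snapshot cannot be savepointed), here is a minimal in-memory sketch. It is not Paimon's API or implementation: the `Snapshot` and `Table` classes, the methods `commit`, `create_savepoint`, `expire_snapshots` and `restore`, and the file names are all hypothetical, and a snapshot is modeled simply as an ID plus the set of data files it references.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical model: a snapshot is an ID plus the data files it references.
    snapshot_id: int
    files: frozenset


@dataclass
class Table:
    # Hypothetical in-memory table; not Paimon's actual classes or API.
    snapshots: dict = field(default_factory=dict)   # snapshot_id -> Snapshot
    savepoints: dict = field(default_factory=dict)  # savepoint name -> Snapshot
    live_files: set = field(default_factory=set)    # files still present in storage

    def commit(self, snapshot_id, files):
        snap = Snapshot(snapshot_id, frozenset(files))
        self.snapshots[snapshot_id] = snap
        self.live_files |= snap.files
        return snap

    def create_savepoint(self, name, snapshot_id):
        # A savepoint cannot be taken on a snapshot that was already cleaned up.
        if snapshot_id not in self.snapshots:
            raise ValueError(f"snapshot {snapshot_id} has already been cleaned up")
        # No data is copied: the savepoint only records a reference to the snapshot.
        self.savepoints[name] = self.snapshots[snapshot_id]

    def expire_snapshots(self, retain_last):
        # Cleaner: drop all but the newest `retain_last` snapshots...
        ids = sorted(self.snapshots)
        expired = ids[:-retain_last] if retain_last > 0 else ids
        for sid in expired:
            del self.snapshots[sid]
        # ...but delete only files referenced by neither a remaining snapshot nor a savepoint.
        protected = set()
        for snap in list(self.snapshots.values()) + list(self.savepoints.values()):
            protected |= snap.files
        self.live_files &= protected

    def restore(self, name):
        # Restoring reads the savepointed file set, which the cleaner left in place.
        snap = self.savepoints[name]
        missing = snap.files - self.live_files
        if missing:
            raise RuntimeError(f"savepointed files were cleaned up: {sorted(missing)}")
        return sorted(snap.files)


if __name__ == "__main__":
    t = Table()
    t.commit(1, {"file-a"})
    t.commit(2, {"file-a", "file-b"})
    t.create_savepoint("before-bad-write", 2)
    t.commit(3, {"file-c"})  # e.g. an overwrite that no longer references file-a/file-b
    t.expire_snapshots(retain_last=1)
    print(t.restore("before-bad-write"))  # ['file-a', 'file-b']
```

Without the savepoint, expiring snapshots 1 and 2 would leave only `file-c` in storage; with it, `file-a` and `file-b` survive, and calling `create_savepoint` on the expired snapshot 1 raises `ValueError`.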

Solution

No response

Anything else?

No response

Are you willing to submit a PR?

schnappi17 commented 1 year ago

@JingsongLi Please assign this task to me, thank you~

JingsongLi commented 1 year ago

Thanks @schnappi17, this may need a PIP (Paimon Improvement Proposal). https://cwiki.apache.org/confluence/display/PAIMON/Paimon+Improvement+Proposals

schnappi17 commented 1 year ago

Got it. @JingsongLi

FangYongs commented 1 year ago

Use tag instead of savepoint