Triggering a savepoint manually while the job is running is a very useful feature.
However, judging whether the data in a savepoint is still valid seems too complex for the platform, since it depends on the state of external components.
I'd prefer to add an interface so that users can trigger savepoints manually. What do you think? CC @saLeox @wolfboys @1996fanrui
Hi @saLeox, thanks for the proposal.
I have a question about the background. Why do users need to restore the job from a previous savepoint? Why not restore from a checkpoint?
It's a great feature. Looking forward to your contribution.
Hi fanrui, thanks for your feedback. Originally, StreamPark could only restore state from the latest savepoint. Later, many users reported that the platform did not support restoring state from a historical checkpoint/savepoint. Users do need this, so StreamPark should provide the capability and give users more freedom to decide how to use it.
Thanks for your feedback. As I understand it, if users set state.checkpoints.num-retained > 1, StreamPark can restore the job from a historical checkpoint, right?
Yes, you are right. Flink provides this, and those checkpoints are triggered automatically by Flink. That doesn't conflict with users manually triggering a savepoint; we still need to support manual savepoint triggering.
@RocMarshal @1996fanrui @wolfboys Hi, thanks for your interest in this feature. The intention is mainly data backfill. If the Flink job's logic changes or a bug is fixed, the user has to reprocess the data from a given point in time, so it is more useful for them to choose a savepoint by date and rerun the job from it.
Also, the Flink community suggests distinguishing savepoints from checkpoints semantically, so it's better for us to use savepoints for this use case. Please refer to FLIP-47: Checkpoints vs. Savepoints.
Many thanks for your willingness to contribute this feature, @RocMarshal. Furthermore, it would be ideal to schedule the savepoint trigger and make it automated.
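For the backfill case described above, "choose the savepoint by date" boils down to picking the newest savepoint completed at or before the time the user wants to reprocess from. The sketch below is a minimal illustration, not StreamPark code: the `(completion time, path)` record shape, the function name `pick_savepoint`, and the example paths are all hypothetical.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional, Tuple

# Hypothetical record shape: (time the savepoint completed, savepoint path).
Savepoint = Tuple[datetime, str]


def pick_savepoint(savepoints: List[Savepoint], rerun_from: datetime) -> Optional[str]:
    """Return the path of the latest savepoint completed at or before `rerun_from`.

    Restoring from that savepoint makes the job reprocess everything after it,
    so the requested time point is covered. Returns None if no savepoint is old enough.
    """
    ordered = sorted(savepoints)            # order by completion time
    times = [ts for ts, _ in ordered]
    idx = bisect_right(times, rerun_from)   # count of savepoints at or before rerun_from
    if idx == 0:
        return None
    return ordered[idx - 1][1]
```

Using a binary search over the sorted completion times keeps the lookup cheap even when many periodic savepoints have been retained.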
Hi saLeox,
Thanks for the clarification. As I understand it, we only need to support triggering a checkpoint/savepoint (see this PR). When to schedule and trigger it is up to the user; the platform just needs to provide an interface for triggering a checkpoint/savepoint and nothing more. Your feedback is welcome.
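A manual "trigger savepoint" interface can be backed by Flink's REST API, which exposes `POST /jobs/:jobid/savepoints` (body fields `target-directory` and `cancel-job`) and returns a `request-id` that can be polled at `GET /jobs/:jobid/savepoints/:triggerid`. Whether StreamPark's implementation uses this endpoint is an assumption; the helper name, REST address, and job ID below are hypothetical. The sketch only builds the request, so it can be inspected without a running cluster.

```python
import json
import urllib.request


def build_trigger_savepoint_request(rest_url: str, job_id: str, target_dir: str,
                                    cancel_job: bool = False) -> urllib.request.Request:
    """Build (but do not send) a Flink REST request that triggers a savepoint.

    Hypothetical helper: `rest_url` is the JobManager REST address, e.g. http://host:8081.
    """
    body = json.dumps({"target-directory": target_dir, "cancel-job": cancel_job})
    return urllib.request.Request(
        url=f"{rest_url.rstrip('/')}/jobs/{job_id}/savepoints",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` would return a JSON payload containing the trigger's `request-id`; keeping `cancel-job` false leaves the job running, which matches the "trigger while running" use case.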
Description
Currently, users can create a savepoint when stopping a Flink job. However, there is another common case: the latest savepoint may no longer be usable because it is older than the Kafka topic TTL (retention period, 7 days by default), which means users cannot backfill data from it unless there is a savepoint within the TTL. StreamPark could therefore provide a scheduler that triggers savepoints based on a user-supplied cron expression, and also removes savepoints that have become invalid according to the Kafka topic TTL setting.
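The cleanup half of this proposal can be sketched as splitting savepoints by age against the topic retention: once a savepoint is older than the retention period, the Kafka records right after its stored offsets may already have been deleted, so replaying from it would skip data. This is a minimal illustration under assumptions, not StreamPark's design; the record shape and function name are hypothetical, and the cron-based trigger itself is left out because it would need a cron parser.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical record shape: (time the savepoint completed, savepoint path).
Savepoint = Tuple[datetime, str]

# Kafka's default topic retention (retention.ms) is 7 days.
DEFAULT_RETENTION = timedelta(days=7)


def split_by_retention(savepoints: List[Savepoint], now: datetime,
                       retention: timedelta = DEFAULT_RETENTION
                       ) -> Tuple[List[Savepoint], List[Savepoint]]:
    """Partition savepoints into (still_valid, expired), preserving input order.

    A savepoint older than `now - retention` is treated as expired and is a
    candidate for removal, since its Kafka offsets may point at deleted data.
    """
    cutoff = now - retention
    valid = [sp for sp in savepoints if sp[0] >= cutoff]
    expired = [sp for sp in savepoints if sp[0] < cutoff]
    return valid, expired
```

The retention is passed in rather than hard-coded so it can follow each topic's actual `retention.ms` setting instead of the default.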
Usage Scenario
When users want to generate savepoints for their Flink jobs periodically, to make reruns and backfills easier.
Related issues
No response