apache / incubator-streampark

Make stream processing easier! Easy-to-use streaming application development framework and operation platform.
https://streampark.apache.org/
Apache License 2.0

[Feature] Schedule job to trigger savepoint #2192

Closed saLeox closed 1 year ago

saLeox commented 1 year ago

Description

Currently, StreamPark lets a user create a savepoint when stopping a Flink job. However, there is another common case: the latest savepoint is no longer valid because it is older than the Kafka topic TTL (time to live, 7 days by default). The user then cannot backfill data from that savepoint; backfill is only feasible if a valid savepoint exists within the TTL. StreamPark could therefore provide a scheduler that triggers savepoints based on a user-supplied cron expression, and that also removes invalid savepoints according to the Kafka topic TTL setting.
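The cleanup half of this proposal can be sketched as follows. This is only an illustration, not StreamPark code: the function name `split_by_ttl` and the `(path, created_at)` tuple shape are hypothetical, and comparing savepoint age against the TTL is an approximation of whether Kafka still retains the offsets stored in the savepoint.

```python
from datetime import datetime, timedelta, timezone


def split_by_ttl(savepoints, now, ttl=timedelta(days=7)):
    """Partition savepoint paths into (valid, expired) by age.

    savepoints: iterable of (path, created_at) tuples (hypothetical shape).
    ttl: assumed Kafka topic retention; 7 days mirrors the default named above.
    """
    cutoff = now - ttl
    valid, expired = [], []
    for path, created_at in savepoints:
        # A savepoint taken before the cutoff points at data Kafka has likely dropped.
        (valid if created_at > cutoff else expired).append(path)
    return valid, expired


if __name__ == "__main__":
    now = datetime(2023, 1, 10, tzinfo=timezone.utc)
    history = [
        ("hdfs:///savepoints/sp-1", datetime(2023, 1, 1, tzinfo=timezone.utc)),
        ("hdfs:///savepoints/sp-2", datetime(2023, 1, 8, tzinfo=timezone.utc)),
    ]
    print(split_by_ttl(history, now))
```

A scheduler would call something like this after each cron-triggered savepoint and delete the paths in the expired list.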

Usage Scenario

When users want to generate savepoints for their Flink jobs periodically, to make reruns and backfills easier.

Related issues

No response

RocMarshal commented 1 year ago

It's a very good feature to trigger a savepoint manually while the job is running.

However, judging whether the data in a savepoint is still valid seems complex for the platform, since it depends on the state of external components.

I would prefer to add an interface so that users can trigger a savepoint manually. What do you think? CC @saLeox @wolfboys @1996fanrui

1996fanrui commented 1 year ago

Hi @saLeox, thanks for the proposal.

I have a question about the background. Why do users need to restore a job from a previous savepoint? Why not restore from a checkpoint?

wolfboys commented 1 year ago

> It's a very good feature to trigger a savepoint manually while the job is running.
>
> However, judging whether the data in a savepoint is still valid seems complex for the platform, since it depends on the state of external components.
>
> I would prefer to add an interface so that users can trigger a savepoint manually. What do you think? CC @saLeox @wolfboys @1996fanrui

It's a great feature. Looking forward to your contribution.

wolfboys commented 1 year ago

> Hi @saLeox, thanks for the proposal.
>
> I have a question about the background. Why do users need to restore a job from a previous savepoint? Why not restore from a checkpoint?

Hi fanrui, thanks for your feedback. In the earliest versions of StreamPark, state could only be restored from the latest savepoint. Later, many users reported that the platform did not support restoring state from a historical checkpoint or savepoint (cp/sp). Users do have this requirement, and StreamPark should provide this capability, giving users more freedom to decide how to restore.

1996fanrui commented 1 year ago

> many users reported that the platform did not support restoring state from a historical checkpoint or savepoint (cp/sp). Users do have this requirement.

Thanks for your feedback. As I understand it, if users set state.checkpoints.num-retained > 1, StreamPark can restore a job from a historical checkpoint, right?

wolfboys commented 1 year ago

> > many users reported that the platform did not support restoring state from a historical checkpoint or savepoint (cp/sp). Users do have this requirement.
>
> Thanks for your feedback. As I understand it, if users set state.checkpoints.num-retained > 1, StreamPark can restore a job from a historical checkpoint, right?

Yes, you are right. Flink provides this function, and the Flink job triggers checkpoints automatically. This doesn't conflict with users triggering a savepoint manually. We still need to support manual savepoint triggering.
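For reference, the option discussed above is set in the Flink configuration. A minimal sketch, where the value 3 is only an illustrative assumption:

```yaml
# flink-conf.yaml (illustrative value; the Flink default is 1)
state.checkpoints.num-retained: 3
```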

saLeox commented 1 year ago

@RocMarshal @1996fanrui @wolfboys Hi, thanks for your interest in this feature. The intention is mainly data backfill. Suppose there is a logic change or bug fix in a Flink job, and the user has to rerun the data from a given point in time. It would be more useful for them to choose a savepoint by date and rerun the job from it.
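Choosing a savepoint by date could look roughly like this. It is a sketch under assumptions: `pick_savepoint` and the `(path, created_at)` tuple shape are hypothetical and do not reflect how StreamPark stores savepoint history.

```python
from datetime import datetime, timezone


def pick_savepoint(savepoints, rerun_from):
    """Return the path of the newest savepoint taken at or before rerun_from.

    savepoints: iterable of (path, created_at) tuples (hypothetical shape).
    Returns None when no savepoint is old enough to cover the rerun point.
    """
    candidates = [
        (created_at, path)
        for path, created_at in savepoints
        if created_at <= rerun_from
    ]
    # max() compares the timestamp first, so the most recent candidate wins.
    return max(candidates)[1] if candidates else None


if __name__ == "__main__":
    history = [
        ("hdfs:///savepoints/sp-1", datetime(2023, 1, 1, tzinfo=timezone.utc)),
        ("hdfs:///savepoints/sp-2", datetime(2023, 1, 4, tzinfo=timezone.utc)),
        ("hdfs:///savepoints/sp-3", datetime(2023, 1, 7, tzinfo=timezone.utc)),
    ]
    print(pick_savepoint(history, datetime(2023, 1, 5, tzinfo=timezone.utc)))
```

Restarting from the newest savepoint at or before the rerun point replays everything from that point onward, which is why periodically scheduled savepoints make backfills cheaper.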

Also, the Flink community suggests distinguishing savepoints from checkpoints in terms of semantics, so it is better for us to use savepoints to support this use case. Please refer to FLIP-47: Checkpoints vs. Savepoints.

Many thanks for your willingness to contribute to this feature, @RocMarshal. Furthermore, it would be ideal to schedule the savepoint trigger action and make it automated.

wolfboys commented 1 year ago

Hi saLeox,

Thanks for the clarification. As I understand it, we only need to support triggering cp/sp (see this PR). When to schedule and trigger is not the platform's concern: we just need to provide an interface to trigger cp/sp, and nothing else. Your feedback is welcome.
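For context, a trigger interface of this kind can sit on top of Flink's own REST API, which exposes `POST /jobs/:jobid/savepoints`. The sketch below only builds that request; it is not the StreamPark implementation from the PR, and the helper name, REST address, and target directory are assumptions for illustration.

```python
import json
import urllib.request


def build_savepoint_request(rest_url, job_id, target_dir):
    """Build the HTTP request that asks Flink to savepoint a running job.

    rest_url: base address of the Flink REST endpoint (assumed, e.g. the JobManager).
    target_dir: where the savepoint should be written (assumed path).
    """
    body = json.dumps(
        {"target-directory": target_dir, "cancel-job": False}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{rest_url.rstrip('/')}/jobs/{job_id}/savepoints",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_savepoint_request(
        "http://localhost:8081", "<job-id>", "hdfs:///savepoints"
    )
    print(req.get_method(), req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` returns a JSON body containing a `request-id`, which can then be polled at `/jobs/:jobid/savepoints/:triggerid` until the savepoint completes. An external scheduler (cron, for example) could call such an interface without the platform deciding when to trigger.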