Restarting a Job Without a Savepoint

GoogleCloudPlatform / flink-on-k8s-operator

[DEPRECATED] Kubernetes operator for managing the lifecycle of Apache Flink and Beam applications.

Apache License 2.0


Restarting a Job Without a Savepoint #471

Open benkusak opened 2 years ago

benkusak commented 2 years ago

Greetings folks, I am relatively new to this community and am still getting my head around the operator. One of the problems I am currently facing is how to restart a job after a catastrophic failure while ignoring all prior savepoints (essentially redeploying the application as if it were a fresh deployment). I think I am missing something obvious here. To restart the application I am changing the flinkProperties of the CRD to trigger a redeployment; however, the job picks up the last savepoint from a previous application upgrade instead of starting without a savepoint. Any and all pointers much appreciated!

pjthepooh commented 2 years ago

I couldn't find this feature in the user guide or any other doc either. Even with fromSavepoint unspecified or set to empty, the job still picks up from the previous checkpoint.

guruprasathT commented 2 years ago

Hi, if the Flink savepoint failed to be created but the checkpoint still exists, set the checkpoint path in fromSavepoint and redo the deployment, so that the Flink job is restored from the last checkpoint. Alternatively, you can delete the flinkCluster resource for that particular job and then redeploy the flinkCluster and the other deployments; the jobs will then start without any checkpoint or savepoint set.

But please be aware that this procedure is equivalent to deploying a new Flink job cluster.
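
The delete-and-redeploy option could be scripted roughly as follows. This is only a sketch under assumptions: the cluster name, namespace, and manifest file name are hypothetical placeholders, and the manifest is assumed to leave fromSavepoint unset. The `KUBECTL` variable exists only so the commands can be previewed before running them for real.

```shell
#!/bin/sh
# Hypothetical placeholders: replace with the values of your own deployment.
CLUSTER_NAME="${CLUSTER_NAME:-my-flink-job-cluster}"
NAMESPACE="${NAMESPACE:-default}"
MANIFEST="${MANIFEST:-flinkcluster.yaml}"
# Set KUBECTL=echo to print the commands instead of executing them.
KUBECTL="${KUBECTL:-kubectl}"

redeploy_fresh() {
  # Delete the existing FlinkCluster custom resource for this job.
  "$KUBECTL" delete flinkcluster "$CLUSTER_NAME" -n "$NAMESPACE" || return 1
  # Re-create it from the manifest (assumed to have no fromSavepoint set).
  "$KUBECTL" apply -f "$MANIFEST" -n "$NAMESPACE"
}
```

After sourcing the file, `KUBECTL=echo redeploy_fresh` prints the two commands, and a plain `redeploy_fresh` runs them. Any job state that was not saved elsewhere is lost with this approach.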

sv3ndk commented 2 years ago

Hi @benkusak, as far as I can tell, this Flink operator is no longer maintained; the latest version of Flink it officially supports is 1.11. This fork seems increasingly active: https://github.com/spotify/flink-on-k8s-operator