GoogleCloudPlatform / flink-on-k8s-operator

[DEPRECATED] Kubernetes operator for managing the lifecycle of Apache Flink and Beam applications.
Apache License 2.0

How to start job after jobmanager fails for whatever reason? #447

Open frenkdefrog opened 3 years ago

frenkdefrog commented 3 years ago

Hi folks, I am still getting familiar with the Flink operator, and I would like to ask for your help with the following question. After I start a new Flink job cluster, a new pod comes up that submits the job to the JobManager. After a while, that pod goes into the Completed state while the job keeps running. In my use case there is no persistent volume in the cluster and no need to set up any savepoints. All I want is to make sure the job is started again whenever the JobManager fails. I don't need to restore anything; I just want to run the job again. Is there any way to achieve this other than setting up a persistent volume and savepointsDir with auto-savepoints?

stolendog commented 3 years ago

Set the high-availability (HA) properties for the JobManager in flink-conf.yaml (see https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/ha/kubernetes_ha/). The JobManager should then recover its state, including submitted jobs, from the remote storage after a failure.
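A minimal sketch of what those HA entries look like, assuming Flink 1.12 or later and an object store the JobManager can write to. The cluster id `my-job-cluster` and bucket `gs://my-bucket/flink/ha` are hypothetical, and the list of accepted URI schemes is an illustrative assumption (each one needs the matching Flink filesystem plugin in the image). The helper deliberately rejects local paths: with no persistent volume, a pod-local directory would disappear together with the failed JobManager pod, so the HA storage directory has to live outside the pod.

```python
from typing import Dict
from urllib.parse import urlparse

# Factory class used for Kubernetes HA in Flink 1.12+; newer Flink
# releases also accept the shorthand value "kubernetes".
K8S_HA_FACTORY = (
    "org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory"
)

# Illustrative (assumed) set of remote filesystem schemes; a local path or
# file:// URI would not survive the loss of the JobManager pod here.
REMOTE_SCHEMES = {"s3", "s3a", "s3p", "gs", "hdfs", "wasbs"}


def k8s_ha_properties(cluster_id: str, storage_dir: str) -> Dict[str, str]:
    """Build the flink-conf.yaml entries that enable Kubernetes HA services."""
    scheme = urlparse(storage_dir).scheme
    if scheme not in REMOTE_SCHEMES:
        raise ValueError(
            f"high-availability.storageDir must point at remote storage, "
            f"got {storage_dir!r}"
        )
    return {
        # Identifies this Flink cluster's HA ConfigMaps in Kubernetes.
        "kubernetes.cluster-id": cluster_id,
        "high-availability": K8S_HA_FACTORY,
        # Job graphs and checkpoint metadata are persisted here.
        "high-availability.storageDir": storage_dir,
    }


def render_flink_conf(props: Dict[str, str]) -> str:
    """Render the properties as 'key: value' lines for flink-conf.yaml."""
    return "\n".join(f"{key}: {value}" for key, value in props.items()) + "\n"


if __name__ == "__main__":
    print(render_flink_conf(
        k8s_ha_properties("my-job-cluster", "gs://my-bucket/flink/ha")
    ))
```

With this operator the same key/value pairs would presumably go into the FlinkCluster spec's `flinkProperties` map rather than a hand-edited file. Note that Kubernetes HA services also need a service account allowed to create, edit and delete ConfigMaps.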