GoogleCloudPlatform / flink-on-k8s-operator

[DEPRECATED] Kubernetes operator for managing the lifecycle of Apache Flink and Beam applications.
Apache License 2.0

How to upgrade operator to new version without any impact #355

Open · yanghui16355 opened this issue 3 years ago

yanghui16355 commented 3 years ago

I have a question about how to upgrade the operator to a new version without affecting the running Flink apps it has deployed. There are a couple of options I can think of:

  1. Undeploy the current operator and deploy a new one. However, I am not sure what will happen to the Flink apps deployed by the operator while the operator is absent.
  2. Deploy the new version of the operator directly with the "make deploy" tool. However, I am not sure whether this will work.

@functicons Could you please provide guidance on the upgrade? Thanks!

Hui

yanghui16355 commented 3 years ago

I tried the two options I mentioned above:

  1. When I undeployed the operator, all Flink apps deployed by the operator were stopped and deleted.

  2. When I directly deployed a new version, it created a new set of pods, but they failed with an image pull error.

Given these results, I would like to understand the operator upgrade process, and what happens to the Flink apps when the operator crashes.

Thanks,

Hui

functicons commented 3 years ago

I think there are 2 major cases.

1) If the CRD has not changed (or the change is backward compatible) and only internal implementation details (e.g., bug fixes) in the controller code have changed, you can simply update (or recreate) the operator deployment (pods) alone; just don't delete the CRD. In this case, the Flink app will continue to run even without the operator, and the new operator will take over after the upgrade.
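A minimal sketch of case 1 with plain kubectl, assuming the operator runs as an ordinary Deployment. The function name `upgrade_operator_image` is hypothetical, and the default namespace, Deployment and container names are placeholders rather than values confirmed in this thread; check yours with `kubectl get deployments --all-namespaces`. Only the operator's pod template changes, so the CRD and the FlinkCluster objects are never touched.

```shell
#!/usr/bin/env bash
# Case 1 sketch: point the operator Deployment at a new image and leave the
# FlinkCluster CRD (and therefore every FlinkCluster object) untouched.
#
# ASSUMPTION: the default namespace, Deployment and container names below are
# placeholders, not values confirmed in this thread.
# Set KUBECTL=echo to print the kubectl commands instead of running them.

upgrade_operator_image() {
  local new_image="$1"
  local namespace="${2:-flink-operator-system}"               # assumed
  local deployment="${3:-flink-operator-controller-manager}"  # assumed
  local container="${4:-manager}"                             # assumed
  local kubectl="${KUBECTL:-kubectl}"

  if [ -z "$new_image" ]; then
    echo "usage: upgrade_operator_image IMAGE [NAMESPACE] [DEPLOYMENT] [CONTAINER]" >&2
    return 1
  fi

  # Changing the pod template triggers a rolling update of the operator only;
  # the Flink JobManager/TaskManager pods are not part of this Deployment.
  "$kubectl" set image "deployment/$deployment" "$container=$new_image" \
    --namespace "$namespace" || return 1

  # Wait until the new operator pod has been rolled out.
  "$kubectl" rollout status "deployment/$deployment" --namespace "$namespace"
}
```

After sourcing the function, a call such as `upgrade_operator_image gcr.io/<project>/flink-operator:<tag>` would perform the update. The image must already be pushed to a registry the cluster can pull from; otherwise the new operator pod fails with an image pull error while the old one keeps running.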

2) If the CRD has changed in an incompatible way, you need to recreate both the CRD and the operator deployment (pods). In this case, take a savepoint of your Flink app, stop it, and recreate the app from the savepoint after the operator upgrade. I don't see a way to keep the Flink app running during the upgrade.
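The case 2 sequence (savepoint, stop, upgrade, restore) could be sketched as a few shell functions. This is an illustration under assumptions, not the operator's documented procedure: the `user-control=savepoint` annotation, the `.status.components.job.savepointLocation` field and `spec.job.fromSavepoint` should all be verified against the CRD of the operator version you run, and the function names and the `__FROM_SAVEPOINT__` placeholder are hypothetical. Savepoints must be written to durable storage outside the cluster, because deleting a CRD deletes every custom resource of that type.

```shell
#!/usr/bin/env bash
# Case 2 sketch: savepoint -> stop -> upgrade CRD + operator -> restore.
#
# ASSUMPTIONS (verify against your operator version; not confirmed here):
#   * annotating a FlinkCluster with
#     flinkclusters.flinkoperator.k8s.io/user-control=savepoint asks the
#     operator to trigger a savepoint;
#   * the resulting path appears at .status.components.job.savepointLocation;
#   * a re-created FlinkCluster resumes from spec.job.fromSavepoint.
# Set KUBECTL=echo to print the kubectl commands instead of running them.

# Step 1: ask the operator to take a savepoint of the running job.
trigger_savepoint() {
  local cluster="$1" namespace="${2:-default}"
  "${KUBECTL:-kubectl}" annotate flinkclusters "$cluster" \
    flinkclusters.flinkoperator.k8s.io/user-control=savepoint \
    --overwrite --namespace "$namespace"
}

# Step 2: once the savepoint has completed, read back its location.
savepoint_location() {
  local cluster="$1" namespace="${2:-default}"
  "${KUBECTL:-kubectl}" get flinkclusters "$cluster" --namespace "$namespace" \
    -o 'jsonpath={.status.components.job.savepointLocation}'
}

# Step 3: stop the app by deleting its FlinkCluster object.
stop_cluster() {
  local cluster="$1" namespace="${2:-default}"
  "${KUBECTL:-kubectl}" delete flinkclusters "$cluster" --namespace "$namespace"
}

# Step 4 (not scripted): remove the old operator and CRD, then install the
# new CRD and operator.

# Step 5: fill the savepoint path into a FlinkCluster manifest written for the
# new CRD. Hypothetical convention: the manifest contains the literal
# placeholder __FROM_SAVEPOINT__ as the value of spec.job.fromSavepoint.
# '|' is the sed delimiter because savepoint paths contain '/'; paths that
# contain '|' or '&' would need escaping.
render_manifest() {
  local template="$1" savepoint="$2"
  sed -e "s|__FROM_SAVEPOINT__|${savepoint}|g" "$template"
}
```

The restored app could then be created with something like `render_manifest my-app.yaml.tmpl "$path" | kubectl apply -f -`, where `$path` is the output of `savepoint_location`.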

We don't currently have a make command or script to facilitate the upgrade process. Will consider adding one. Thanks for the question!
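Until such a command exists, one hedged way to decide which of the two cases applies is to compare the CRD manifest shipped with the new version against the CRD installed in the cluster. `kubectl diff` exits with 0 when there are no differences, 1 when there are differences, and a value greater than 1 on error. The function name `crd_unchanged` and the manifest path are hypothetical. Note that "unchanged" is stricter than "compatible": a purely additive change is still reported as a difference and needs a human to judge whether case 1 is safe.

```shell
#!/usr/bin/env bash
# Compare the new CRD manifest with the CRD currently installed.
# Returns 0 if identical (case 1: update the operator pods only),
#         1 if different (review compatibility; case 2 may be needed),
#         2 if kubectl or the diff tool failed.
# ASSUMPTION: the manifest path passed in is a placeholder for wherever the
# new release keeps its CRD YAML.
# KUBECTL can be overridden for testing.

crd_unchanged() {
  local crd_manifest="$1"
  "${KUBECTL:-kubectl}" diff -f "$crd_manifest" >/dev/null
  case $? in
    0) return 0 ;;  # no differences
    1) return 1 ;;  # differences found
    *) return 2 ;;  # error
  esac
}
```

For example, `if crd_unchanged path/to/new-crd.yaml; then ...; fi` would gate the deployment-only update behind the check.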

yanghui16355 commented 3 years ago

@functicons Thanks for your reply! A follow-up question: how do I update the operator deployment (pods) to a new version without changing the CRD? It would be much better if you could provide a make command or script for the upgrade process : )

functicons commented 3 years ago

how to update the operator deployment(pods) with new version without change the CRD?

That would be similar to the deploy target here https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/f4ca46ac569ddce3512f8d4103cbf28cf8fba24d/Makefile#L123

but without the install and webhook-cert prerequisites. If we add a deploy-controller target, it would look like this:

deploy-controller: config/default/manager_image_patch.yaml build-overlay
	sed -e 's#image: .*#image: '"$(IMG)"'#' ./config/deploy/manager_image_patch.template >./config/deploy/manager_image_patch.yaml
	@echo "Getting webhook server certificate"
	...