Open yanghui16355 opened 3 years ago
I tried the two options mentioned above:
When I undeploy the operator, all Flink apps deployed by the operator are stopped and deleted.
When I directly deploy a new version, it creates a new set of pods, but they fail with an image pull failure.
Given these tests, I'd like to understand the operator upgrade process, and what happens to Flink apps when the operator crashes.
Thanks,
Hui
I think there are 2 major cases.
1) If the CRD is unchanged (or still compatible) and only internal implementation details of the controller code have changed (e.g., bug fixes), you can simply update (or recreate) the operator deployment (pods) without deleting the CRD. In this case, the Flink app keeps running even while the operator is down, and the operator takes control again after the upgrade.
2) If the CRD has changed and is no longer compatible, you need to recreate both the CRD and the operator deployment (pods). In this case, you need to take a savepoint for your Flink app, stop it, and recreate the app from the savepoint after the operator upgrade. I don't see a way to keep the Flink app running during the upgrade.
We don't currently have a make command or script to facilitate the upgrade process. Will consider adding one. Thanks for the question!
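In the meantime, case 1 could be sketched roughly as below. This is not an official script, only an illustration using standard kubectl commands. The namespace, deployment name, and container name are assumptions about a default install and should be checked against your cluster (e.g., with kubectl get deployments --all-namespaces). The run helper and the DRY_RUN switch are hypothetical conveniences so the commands can be reviewed before anything is changed.

```shell
# Sketch of case 1: swap the operator image only; the CRD and the Flink
# custom resources are not touched. Meant to be sourced, then called.
# All three names below are ASSUMPTIONS about a default install.
NAMESPACE="${NAMESPACE:-flink-operator-system}"
DEPLOYMENT="${DEPLOYMENT:-flink-operator-controller-manager}"
CONTAINER="${CONTAINER:-flink-operator}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, anything else = execute

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

upgrade_operator_image() {
  local new_image="$1"
  # Point the existing deployment at the new image; nothing is deleted.
  run kubectl -n "$NAMESPACE" set image "deployment/$DEPLOYMENT" "$CONTAINER=$new_image"
  # Wait for the rollout; a pod stuck on an image pull shows up as a timeout.
  run kubectl -n "$NAMESPACE" rollout status "deployment/$DEPLOYMENT" --timeout=120s
}
```

After sourcing the file, upgrade_operator_image <new-image> prints the two commands; setting DRY_RUN=0 executes them. Because only the deployment's image is patched, the running Flink clusters should be left alone, which matches the behavior described in case 1.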
@functicons thanks for your reply! A follow-up question: how do I update the operator deployment (pods) to a new version without changing the CRD? It would be much better if you could provide a make command and script for the upgrade process : )
how do I update the operator deployment (pods) to a new version without changing the CRD?
That would be similar to the deploy target here https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/f4ca46ac569ddce3512f8d4103cbf28cf8fba24d/Makefile#L123 but remove install and webhook-cert. If we add a target deploy-controller, that would be:
deploy-controller: config/default/manager_image_patch.yaml build-overlay
	sed -e 's#image: .*#image: '"$(IMG)"'#' ./config/deploy/manager_image_patch.template >./config/deploy/manager_image_patch.yaml
	@echo "Getting webhook server certificate"
	...
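To make the sed step in that target concrete, here is a small standalone sketch of the same substitution. The sample YAML is a hypothetical stand-in for manager_image_patch.template (the real file's contents may differ), and render_image_patch is an illustrative name, not something in the repo.

```shell
# Same substitution as the Makefile recipe: replace everything after
# "image: " with the new image reference. '#' is used as the sed delimiter
# because image references contain '/'.
render_image_patch() {
  local img="$1"
  sed -e 's#image: .*#image: '"$img"'#'
}

# Hypothetical template content, piped through the substitution.
printf '%s\n' \
  'spec:' \
  '  containers:' \
  '  - image: IMAGE_PLACEHOLDER' \
  '    name: flink-operator' \
  | render_image_patch 'example.com/flink-operator:v2'
```

Only the line containing "image: " is rewritten; the other lines pass through unchanged. Note that an image reference containing '#' or '&' would need escaping with this approach.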
I have a question about how to upgrade the operator to a new version without any impact on running Flink apps deployed by the operator. There are a couple of options I can think of:
@functicons Please provide guidance for the upgrade, thanks!
Hui