GoogleCloudPlatform / flink-on-k8s-operator

[DEPRECATED] Kubernetes operator for managing the lifecycle of Apache Flink and Beam applications.
Apache License 2.0

Webhook call failing due to bad cert #459

Open jamesclair opened 3 years ago

jamesclair commented 3 years ago

After upgrading the Flink operator, the webhook certificate became unusable, and webhook calls to the operator now fail when trying to deploy a flinkcluster CR.

environment

symptoms (kube-apiserver logs):

2021-06-28 07:52:58 
{"log":"W0628 12:52:58.747318       1 dispatcher.go:182] Failed calling webhook, failing closed mflinkcluster.flinkoperator.k8s.io: failed calling webhook \"mflinkcluster.flinkoperator.k8s.io\": Post \"https://flink-operator-webhook-service.flink-operator-system.svc:443/mutate-flinkoperator-k8s-io-v1beta1-flinkcluster?timeout=30s\": x509: certificate signed by unknown authority\n","stream":"stderr","time":"2021-06-28T12:52:58.747487577Z"}
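For anyone triaging similar entries, here is a minimal sketch of pulling the webhook name and failure reason out of a log line like the one above. It assumes the line is a JSON object with a `log` field, as shown; the helper name `parse_webhook_failure` and the regular expression are illustrative, not part of the operator or of Kubernetes.

```python
import json
import re

# The kube-apiserver log line quoted in this issue, kept as a raw string so
# the JSON escapes (\" and \n) are decoded by json.loads, not by Python.
SAMPLE = r'{"log":"W0628 12:52:58.747318       1 dispatcher.go:182] Failed calling webhook, failing closed mflinkcluster.flinkoperator.k8s.io: failed calling webhook \"mflinkcluster.flinkoperator.k8s.io\": Post \"https://flink-operator-webhook-service.flink-operator-system.svc:443/mutate-flinkoperator-k8s-io-v1beta1-flinkcluster?timeout=30s\": x509: certificate signed by unknown authority\n","stream":"stderr","time":"2021-06-28T12:52:58.747487577Z"}'


def parse_webhook_failure(raw_line: str) -> dict:
    """Hypothetical helper: extract webhook name and reason from one log entry."""
    entry = json.loads(raw_line)
    message = entry["log"].strip()
    # Matches the lowercase 'failed calling webhook "<name>": <reason>' part.
    match = re.search(r'failed calling webhook "([^"]+)": (.*)$', message)
    if match is None:
        return {"webhook": None, "reason": None, "is_cert_error": False}
    webhook, reason = match.group(1), match.group(2)
    return {
        "webhook": webhook,
        "reason": reason,
        # An x509 error points at the certificate/CA rather than networking.
        "is_cert_error": "x509:" in reason,
    }


result = parse_webhook_failure(SAMPLE)
print(result["webhook"])        # mflinkcluster.flinkoperator.k8s.io
print(result["is_cert_error"])  # True
```

The `x509: certificate signed by unknown authority` reason means the API server could not verify the webhook's serving certificate against the CA it was given, which is consistent with the caBundle problem described further down in this thread.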

workaround: Update the webhook certificate by deleting and recreating flink-operator-system/cert-job, then re-sync/apply the flinkcluster CR. Source: "How to update the webhook certificate", Issue #356, GoogleCloudPlatform/flink-on-k8s-operator.

olegy2008 commented 3 years ago

Got the same after updating a Helm chart (the update was only supposed to change requests and limits). After the update, the caBundle fields in the validation and mutation webhooks had been set to Cg==. I had to restore the CA in the webhooks manually.
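A note on why `Cg==` is a problem: it is the base64 encoding of a single newline, so a caBundle with that value carries no CA certificate at all. The sketch below shows one way to check a caBundle value once it has been copied out of the webhook configuration. The function name `describe_ca_bundle` and its return labels are hypothetical, and the check only looks for a PEM header; it does not validate the certificate itself.

```python
import base64


def describe_ca_bundle(ca_bundle_b64: str) -> str:
    """Hypothetical helper: classify a base64-encoded caBundle value."""
    decoded = base64.b64decode(ca_bundle_b64)
    if not decoded.strip():
        # Only whitespace after decoding: no CA was provided.
        return "empty"
    if b"-----BEGIN CERTIFICATE-----" in decoded:
        # Looks like PEM; this does not verify the certificate is valid.
        return "pem"
    return "unknown"


print(repr(base64.b64decode("Cg==")))  # b'\n'
print(describe_ca_bundle("Cg=="))      # empty
```

With an effectively empty caBundle the API server has no CA to verify the webhook's serving certificate against, which matches the `certificate signed by unknown authority` error reported above.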