GoogleCloudPlatform / flink-on-k8s-operator

[DEPRECATED] Kubernetes operator for managing the lifecycle of Apache Flink and Beam applications.
Apache License 2.0

How to update the webhook certificate #356

Closed elanv closed 3 years ago

elanv commented 3 years ago

cert-manager is not currently supported, so when the webhook certificate expires it has to be updated manually. A guide for updating the certificate would be helpful.

hongyegong commented 3 years ago

Can you provide more information about the certificate expiration issue? For example, how long did it take for the certificate to expire, and what behavior did you see afterwards in your test?

A potential fix is to refresh the certificate at a fixed interval, but let's make sure we are solving the problem in the correct way.

guruprasathT commented 3 years ago

Sorry to interrupt. Yes, it expires every 30 days.

https://flink-operator-webhook-service.importer.svc:443/mutate-flinkoperator-k8s-io-v1beta1-flinkcluster?timeout=30s: x509: certificate has expired or is not yet valid

I believe this is because openssl defaults to a 30-day validity period when no expiration is specified.

https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/master/helm-chart/flink-operator/templates/generate-cert.yaml#L35-L38

I referred to here
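
To confirm that an expired certificate is the cause, the certificate can be inspected directly. The following is a minimal sketch, not part of the operator: cert_expires_within is a hypothetical helper name, and the commented usage assumes the certificate is stored under the key tls.crt in the webhook-server-cert secret in the flink-operator-system namespace (the secret and namespace names appear elsewhere in this thread; the key name is an assumption).

```shell
# Hypothetical helper: succeeds (exit 0) if the PEM certificate in FILE
# expires within DAYS days, or if the file cannot be parsed at all.
cert_expires_within() {
  local pem_file="$1" days="$2"
  # `openssl x509 -checkend N` exits 0 only if the certificate
  # is still valid N seconds from now.
  if openssl x509 -checkend "$(( days * 86400 ))" -noout -in "$pem_file" >/dev/null 2>&1; then
    return 1
  fi
  return 0
}

# Example use against a cluster (assumed secret name, namespace, and key):
#   kubectl get secret webhook-server-cert -n flink-operator-system \
#     -o jsonpath='{.data.tls\.crt}' | base64 -d > /tmp/webhook.crt
#   openssl x509 -noout -enddate -in /tmp/webhook.crt
#   cert_expires_within /tmp/webhook.crt 7 && echo "renew soon"
```

Checking with a threshold of a few days, rather than waiting for the x509 error, leaves time to renew before the webhooks start failing.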

elanv commented 3 years ago

I couldn't create a new FlinkCluster CR or update an existing one, because the defaulting and validating webhooks stop working once the certificate expires.

kinderyj commented 3 years ago

One way to update the webhook certificate:

$ kubectl get job cert-job -n flink-operator-system -oyaml > cert-job.yaml
$ kubectl delete job cert-job -n flink-operator-system
$ kubectl apply -f cert-job.yaml

(Delete the controller-uid labels in cert-job.yaml before applying, if needed.)
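
The same steps can be wrapped in a small script. This is only a sketch under stated assumptions: strip_controller_uid and recreate_cert_job are hypothetical helper names, the defaults are the job and namespace names used in this thread, and the sed filter simply drops every line that sets a controller-uid label or selector (the label Kubernetes adds to a Job and its pods) so that fresh values can be generated when the Job is recreated.

```shell
# Hypothetical helper: print the Job manifest in FILE without any
# controller-uid label or selector lines.
strip_controller_uid() {
  sed '/controller-uid:/d' "$1"
}

# Hypothetical wrapper around the three steps above.
# Defaults are the job and namespace names used in this thread.
recreate_cert_job() {
  local job="${1:-cert-job}" ns="${2:-flink-operator-system}" manifest
  manifest="$(mktemp)"
  # Save the existing Job, delete it, then re-apply the cleaned manifest.
  kubectl get job "$job" -n "$ns" -o yaml > "$manifest" || return 1
  kubectl delete job "$job" -n "$ns" || return 1
  strip_controller_uid "$manifest" | kubectl apply -f -
}
```

Calling recreate_cert_job with no arguments uses the names from this thread; the saved manifest stays in the temporary file, so it can be inspected if the apply step fails.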

One way to change the default expiry of 30 days: https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/4f421e7e7b289973b2e50b123c62436abb2c0109/helm-chart/flink-operator/templates/generate-cert.yaml#L38

    | openssl x509 -req -CA ca.crt -CAkey ca.key -CAcreateserial -out ${tmpdir}/server-cert.pem

change to:

    | openssl x509 -days 3650 -req -CA ca.crt -CAkey ca.key -CAcreateserial -out ${tmpdir}/server-cert.pem

functicons commented 3 years ago

+1 to making the expiry period longer.
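
The effect of the -days flag can be tried locally without a cluster. The sketch below is an illustration, not the chart's script: sign_server_cert is a hypothetical helper, the CA is a throwaway one, and the subject names are placeholders. Only the file names and the openssl x509 -req flags are taken from the line quoted above.

```shell
# Hypothetical helper: create a throwaway CA in DIR and use it to sign
# a server certificate that is valid for DAYS days.
sign_server_cert() {
  local dir="$1" days="$2"
  (
    cd "$dir" || exit 1
    # Throwaway self-signed CA (placeholder subject).
    openssl req -x509 -newkey rsa:2048 -nodes -days "$days" -subj "/CN=demo-ca" \
      -keyout ca.key -out ca.crt 2>/dev/null || exit 1
    openssl genrsa -out server-key.pem 2048 2>/dev/null || exit 1
    # The CSR is piped into `openssl x509 -req`; without -days this
    # command issues a certificate that is valid for 30 days.
    openssl req -new -key server-key.pem -subj "/CN=demo.default.svc" \
      | openssl x509 -req -days "$days" -CA ca.crt -CAkey ca.key -CAcreateserial \
          -out server-cert.pem 2>/dev/null
  )
}

# Example:
#   dir=$(mktemp -d) && sign_server_cert "$dir" 3650
#   openssl x509 -noout -dates -in "$dir/server-cert.pem"
```

A ten-year certificate avoids the monthly breakage but also means a leaked key stays usable for that long, so the value is a trade-off rather than a fixed recommendation.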

brunomrpx commented 3 years ago

For those who don't use the Helm chart to install the operator: we solved this by deleting the webhook-server-cert secret:

$ kubectl delete secret webhook-server-cert -n flink-operator-system

And then redeploying:

$ make deploy

We tried running only make webhook-cert, but then the following error occurs when deploying a job:

x509: certificate signed by unknown authority
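
This error generally means that the caller does not trust the CA that signed the certificate being served. One plausible explanation, not confirmed in this thread, is that regenerating only the certificate left the webhook configuration pointing at the old CA bundle. A minimal sketch for checking whether a certificate chains to a given CA follows; cert_signed_by is a hypothetical helper, and the file paths and the <name> placeholder in the commented usage are assumptions.

```shell
# Hypothetical helper: succeeds if the certificate in CERT_FILE
# verifies against the CA certificate in CA_FILE.
cert_signed_by() {
  local ca_file="$1" cert_file="$2"
  openssl verify -CAfile "$ca_file" "$cert_file" >/dev/null 2>&1
}

# Example use (placeholder webhook configuration name and file paths):
#   kubectl get mutatingwebhookconfiguration <name> \
#     -o jsonpath='{.webhooks[0].clientConfig.caBundle}' | base64 -d > /tmp/ca-bundle.crt
#   cert_signed_by /tmp/ca-bundle.crt /tmp/webhook.crt || echo "CA bundle does not match"
```

If the check fails, the CA bundle and the serving certificate come from different CAs, which would be consistent with a full redeploy fixing the problem where regenerating the certificate alone did not.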