GoogleCloudPlatform / flink-on-k8s-operator

[DEPRECATED] Kubernetes operator for managing the lifecycle of Apache Flink and Beam applications.
Apache License 2.0

Flink Cluster installation is failing with error failed calling webhook #377

Open vinaykw opened 3 years ago

vinaykw commented 3 years ago

I kept the Flink operator and a Flink session cluster running together for more than 35 days. Both were installed with Helm. When I uninstalled the Flink session cluster and tried to reinstall it, I got the error below:

Error: Internal error occurred: failed calling webhook "mflinkcluster.flinkoperator.k8s.io": Post https://flink-operator-webhook-service.reporting-flink-operator.svc:443/mutate-flinkoperator-k8s-io-v1beta1-flinkcluster?timeout=30s: x509: certificate has expired or is not yet valid

Please help me resolve this issue.

guruprasathT commented 3 years ago

@vinaykw please check here for a workaround: https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/issues/356

sumchak1 commented 3 years ago

Based on https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/issues/356, we tried all the steps mentioned there, but the Flink session cluster installation still fails.
First we tried the steps below, which didn't help:

        kubectl get job cert-job -n flink-operator-system -oyaml > cert-job.yaml
        kubectl delete job cert-job -n flink-operator-system
        kubectl apply -f cert-job.yaml

Next we tried editing the config map to change the default certificate validity period, which also didn't help. We changed this line:

        | openssl x509 -req -CA ca.crt -CAkey ca.key -CAcreateserial -out ${tmpdir}/server-cert.pem

to:

        | openssl x509 -days 3650 -req -CA ca.crt -CAkey ca.key -CAcreateserial -out ${tmpdir}/server-cert.pem

k delete -f config-map-up1.yaml -n flink-operator-system
configmap "cert-configmap" deleted

k apply -f config-map-up1.yaml -n flink-operator-system
configmap/cert-configmap created

kubectl get pods -n flink-operator-system
NAME                                                 READY   STATUS    RESTARTS   AGE
flink-operator-controller-manager-848b69b444-8v9l5   2/2     Running   0          43m

k apply -f cert-job-1.yaml -n flink-operator-system
job.batch/cert-job created

kubectl get pods -n flink-operator-system
NAME                                                 READY   STATUS      RESTARTS   AGE
cert-job-lgxzt                                       0/1     Completed   0          7s
flink-operator-controller-manager-848b69b444-8v9l5   2/2     Running     0          44m

 kubectl apply -f config/samples/flinkoperator_v1beta1_flinksessioncluster.yaml
Error from server (InternalError): error when creating "config/samples/flinkoperator_v1beta1_flinksessioncluster.yaml": Internal error occurred: failed calling webhook "mflinkcluster.flinkoperator.k8s.io": Post "https://flink-operator-webhook-service.flink-operator-system.svc:443/mutate-flinkoperator-k8s-io-v1beta1-flinkcluster?timeout=30s": x509: certificate relies on legacy Common Name field, use SANs or temporarily enable Common Name matching with GODEBUG=x509ignoreCN=0
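
Note that this last error is different from the original one: it no longer complains about expiry, but about the certificate having only a Common Name and no Subject Alternative Name (SAN). Below is a minimal sketch of how a server certificate with a SAN could be signed with `openssl`, assuming the webhook service DNS name from the error message. The CA subject `example-ca`, the file `san.ext`, and the temporary directory are made up for illustration; this is not the operator's actual cert script, only the general shape of the change (adding `-extfile` next to `-days`).

```shell
#!/bin/sh
set -eu

# Assumption: service DNS name taken from the error message above.
SERVICE_DNS="flink-operator-webhook-service.flink-operator-system.svc"
tmpdir="$(mktemp -d)"

# Throwaway self-signed CA, for illustration only.
openssl req -x509 -newkey rsa:2048 -nodes -days 3650 \
  -subj "/CN=example-ca" \
  -keyout "${tmpdir}/ca.key" -out "${tmpdir}/ca.crt" 2>/dev/null

# Server key and certificate signing request.
openssl req -newkey rsa:2048 -nodes \
  -subj "/CN=${SERVICE_DNS}" \
  -keyout "${tmpdir}/server-key.pem" -out "${tmpdir}/server.csr" 2>/dev/null

# `openssl x509 -req` does not copy extensions from the CSR by default,
# so the SAN is supplied through a separate extension file (hypothetical name).
printf 'subjectAltName=DNS:%s\n' "${SERVICE_DNS}" > "${tmpdir}/san.ext"

# Sign the CSR with the CA, setting both the validity period and the SAN.
openssl x509 -req -days 3650 -in "${tmpdir}/server.csr" \
  -CA "${tmpdir}/ca.crt" -CAkey "${tmpdir}/ca.key" -CAcreateserial \
  -extfile "${tmpdir}/san.ext" \
  -out "${tmpdir}/server-cert.pem" 2>/dev/null

# Show the SAN entry and the expiry date of the resulting certificate.
openssl x509 -in "${tmpdir}/server-cert.pem" -noout -text | grep -A1 "Subject Alternative Name"
openssl x509 -in "${tmpdir}/server-cert.pem" -noout -enddate
```

If the certificate generated by the cert job shows no `Subject Alternative Name` section when inspected this way, that would be consistent with the x509 error above.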