cilium / cilium-etcd-operator

Operator to manage Cilium's etcd cluster

unable to create Cilium etcd cluster CR: the server could not find the requested resource #25

Closed · joestringer closed 5 years ago

joestringer commented 5 years ago

Deployed against a PR close to the master version of Cilium today (specifically https://github.com/cilium/cilium/pull/6357, which can sometimes affect connectivity). cilium-etcd-operator could not create the etcd cluster CR and then got "stuck": even after more than 10 minutes of inactivity there are no subsequent logs, but Kubernetes reports the pod as healthy:

$ kubectl -n kube-system logs cilium-etcd-operator-66794d6669-jksz7
time="2019-01-10T00:45:41Z" level=info msg="Waiting for k8s api-server to be ready..."
time="2019-01-10T00:45:41Z" level=info msg="Connected to k8s api-server" ipaddress="https://10.96.0.1:443"
2019/01/10 00:45:41 [INFO] generate received request
2019/01/10 00:45:41 [INFO] received CSR
2019/01/10 00:45:41 [INFO] generating key: rsa-2048
2019/01/10 00:45:41 [INFO] encoded CSR
2019/01/10 00:45:41 [INFO] signed certificate with serial number 130448182967430812328035797262991743908321227494
2019/01/10 00:45:41 [INFO] generate received request
2019/01/10 00:45:41 [INFO] received CSR
2019/01/10 00:45:41 [INFO] generating key: rsa-2048
2019/01/10 00:45:41 [INFO] encoded CSR
2019/01/10 00:45:41 [INFO] signed certificate with serial number 280088376068032567407183297576814027642363022970
2019/01/10 00:45:41 [INFO] generate received request
2019/01/10 00:45:41 [INFO] received CSR
2019/01/10 00:45:41 [INFO] generating key: rsa-2048
2019/01/10 00:45:42 [INFO] encoded CSR
2019/01/10 00:45:42 [INFO] signed certificate with serial number 164169678527332333763695725704332005646526107672
2019/01/10 00:45:42 [INFO] generate received request
2019/01/10 00:45:42 [INFO] received CSR
2019/01/10 00:45:42 [INFO] generating key: rsa-2048
2019/01/10 00:45:42 [INFO] encoded CSR
2019/01/10 00:45:42 [INFO] signed certificate with serial number 220426772935786852766629842796970532866580997508
time="2019-01-10T00:45:42Z" level=info msg="Deploying secret kube-system/cilium-etcd-server-tls..."
time="2019-01-10T00:45:42Z" level=info msg=Done
time="2019-01-10T00:45:42Z" level=info msg="Deploying secret kube-system/cilium-etcd-client-tls..."
time="2019-01-10T00:45:42Z" level=info msg=Done
time="2019-01-10T00:45:42Z" level=info msg="Deploying secret kube-system/cilium-etcd-peer-tls..."
time="2019-01-10T00:45:42Z" level=info msg=Done
time="2019-01-10T00:45:42Z" level=info msg="Deriving etcd client from cilium-etcd-client-tls to cilium-etcd-secrets..."
time="2019-01-10T00:45:42Z" level=info msg="Updating cilium-etcd-secrets secret..."
time="2019-01-10T00:45:42Z" level=info msg=Done
time="2019-01-10T00:45:42Z" level=info msg="Deploying etcd-operator CRD..."
time="2019-01-10T00:45:42Z" level=info msg="Done!"
time="2019-01-10T00:45:42Z" level=info msg="Deploying etcd-operator deployment..."
time="2019-01-10T00:45:42Z" level=info msg="Done!"
time="2019-01-10T00:45:42Z" level=info msg="Deploying Cilium etcd cluster CR..."
time="2019-01-10T00:45:42Z" level=error msg="unable to create Cilium etcd cluster CR: the server could not find the requested resource (post etcdclusters.etcd.database.coreos.com)" 
$ kubectl -n kube-system get crd etcdclusters.etcd.database.coreos.com
NAME                                    AGE
etcdclusters.etcd.database.coreos.com   14m
$ date -u
Thu Jan 10 01:00:34 UTC 2019
$ kubectl -n kube-system get pods
NAME                                    READY     STATUS    RESTARTS   AGE
cilium-etcd-operator-66794d6669-jksz7   1/1       Running   0          14m
cilium-wsv6l                            0/1       Running   6          14m
cilium-zlw5k                            0/1       Running   6          14m
etcd-k8s1                               1/1       Running   0          24m
etcd-operator-696fb6d99-49sxs           1/1       Running   0          14m
kube-apiserver-k8s1                     1/1       Running   0          22m
kube-controller-manager-k8s1            1/1       Running   1          23m
kube-dns-6f4c8dbcd4-4q74n               3/3       Running   0          23m
kube-proxy-6mjvw                        1/1       Running   0          23m
kube-proxy-765wx                        1/1       Running   0          15m
kube-scheduler-k8s1                     1/1       Running   1          23m
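
Note that the failing call is the POST against etcdclusters.etcd.database.coreos.com, even though the CRD object itself exists (see above). One plausible explanation, not confirmed in this issue, is a race where the CRD has been created but is not yet Established, i.e. the apiserver is not serving the endpoint at the moment the operator POSTs the CR. The condition can be inspected directly:

$ kubectl get crd etcdclusters.etcd.database.coreos.com \
    -o jsonpath='{.status.conditions[?(@.type=="Established")].status}'

If this prints anything other than True, creates against the resource will typically fail with exactly this "the server could not find the requested resource" error.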
joestringer commented 5 years ago

Workaround:

$ kubectl -n kube-system delete etcdclusters.etcd.database.coreos.com cilium-etcd
$ kubectl -n kube-system delete po -l name=cilium-etcd-operator
$ kubectl -n kube-system delete po -l k8s-app=cilium
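
As a follow-up check (a suggestion, not part of the original report), you can wait for the CRD to report Established, then watch for the restarted operator to recreate the CR and for etcd member pods to appear; etcd_cluster is the label etcd-operator normally applies to member pods:

$ kubectl wait --for condition=established --timeout=60s \
    crd/etcdclusters.etcd.database.coreos.com
$ kubectl -n kube-system get etcdclusters.etcd.database.coreos.com
$ kubectl -n kube-system get pods -l etcd_cluster=cilium-etcd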