nvtkaszpir opened 5 years ago
Right now etcd-operator does not update the status of `etcdclusters/` when the cluster loses quorum.
Steps to reproduce:
```shell
kubectl apply -f etcd-operator.deployment.yaml
```

```yaml
# etcd-operator.deployment.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: etcd-operator
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: etcd-operator
    spec:
      containers:
      - name: etcd-operator
        image: quay.io/coreos/etcd-operator:v0.9.4
        command:
        - etcd-operator
        env:
        - name: MY_POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
```
```shell
kubectl apply -f etcd-cluster.crd.yaml
```

```yaml
# etcd-cluster.crd.yaml
apiVersion: "etcd.database.coreos.com/v1beta2"
kind: "EtcdCluster"
metadata:
  name: "etcd"
spec:
  size: 3
  version: "3.2.13"
```
Wait until the cluster is set up, then inspect the resource:
```shell
kubectl get etcdclusters/etcd -o yaml
```

```yaml
apiVersion: etcd.database.coreos.com/v1beta2
kind: EtcdCluster
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"etcd.database.coreos.com/v1beta2","kind":"EtcdCluster","metadata":{"annotations":{},"labels":{"etcd-operator-managed":"true"},"name":"etcd","namespace":"default"},"spec":{"size":3,"version":"3.2.13"}}
  creationTimestamp: "2019-03-13T23:37:49Z"
  generation: 1
  labels:
    etcd-operator-managed: "true"
  name: etcd
  namespace: default
  resourceVersion: "2182831"
  selfLink: /apis/etcd.database.coreos.com/v1beta2/namespaces/default/etcdclusters/etcd
  uid: 037a6ceb-45e9-11e9-8b71-42010a8a000a
spec:
  repository: quay.io/coreos/etcd
  size: 3
  version: 3.2.13
status:
  clientPort: 2379
  conditions:
  - lastTransitionTime: "2019-03-13T23:38:30Z"
    lastUpdateTime: "2019-03-13T23:38:30Z"
    reason: Cluster available
    status: "True"
    type: Available
  currentVersion: 3.2.13
  members:
    ready:
    - etcd-6r6rpjsmtk
    - etcd-r5fdrln4sh
    - etcd-xkdcxc95vg
  phase: Running
  serviceName: etcd-client
  size: 3
  targetVersion: ""
```
```
$ kubectl get pods
NAME                             READY   STATUS    RESTARTS   AGE
etcd-6r6rpjsmtk                  1/1     Running   0          43s
etcd-operator-5c6bddb7f6-lxwqb   1/1     Running   0          93s
etcd-r5fdrln4sh                  1/1     Running   0          27s
etcd-xkdcxc95vg                  1/1     Running   0          51s
```
Kill 2 pods out of 3:
```
$ kubectl delete pod/etcd-6r6rpjsmtk pod/etcd-r5fdrln4sh
pod "etcd-6r6rpjsmtk" deleted
pod "etcd-r5fdrln4sh" deleted
```
Check the etcd-operator log; it reports that quorum was lost:
```
$ stern etcd-operator
...
etcd-operator-5c6bddb7f6-lxwqb etcd-operator time="2019-03-13T23:41:58Z" level=info msg="cluster membership: etcd-6r6rpjsmtk,etcd-r5fdrln4sh,etcd-xkdcxc95vg" cluster-name=etcd cluster-namespace=default pkg=cluster
etcd-operator-5c6bddb7f6-lxwqb etcd-operator time="2019-03-13T23:41:58Z" level=info msg="Finish reconciling" cluster-name=etcd cluster-namespace=default pkg=cluster
etcd-operator-5c6bddb7f6-lxwqb etcd-operator time="2019-03-13T23:41:58Z" level=error msg="failed to reconcile: lost quorum" cluster-name=etcd cluster-namespace=default pkg=cluster
```
Check `etcdclusters/etcd` and inspect the `status` section: nothing there reflects the quorum loss.
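As a quicker check than reading the full YAML, the relevant status fields can be pulled directly with `kubectl`'s standard jsonpath output (this requires a live cluster; the resource name is the one from the example above):

```shell
# Print only the status conditions of the EtcdCluster resource.
kubectl get etcdclusters/etcd -o jsonpath='{.status.conditions}'

# Or just the reported phase.
kubectl get etcdclusters/etcd -o jsonpath='{.status.phase}'
```

Both still show the pre-failure values (`Available` / `Running`) after quorum is lost.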
I believe the status should report that the cluster is in a bad state.
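For illustration, a status along these lines would make the quorum loss visible. Note this is a sketch of what the operator *could* write, not something it emits today; the `reason`, `message`, and `phase` values are hypothetical:

```yaml
status:
  conditions:
  - type: Available
    status: "False"                     # flipped when quorum is lost (hypothetical)
    reason: LostQuorum                  # hypothetical reason string
    message: "cluster has lost quorum"  # hypothetical message
    lastTransitionTime: "2019-03-13T23:41:58Z"
  phase: Failed                         # hypothetical phase value
```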
This is a duplicate of #1973, but with a much better description ;)