coreos / etcd-operator

etcd operator creates/configures/manages etcd clusters atop Kubernetes
https://coreos.com/blog/introducing-the-etcd-operator.html
Apache License 2.0

etcd-operator does not recover from docker4mac restart #2084

Open ldelossa opened 5 years ago

ldelossa commented 5 years ago

I'm doing most of my local dev with K8s running on Docker4Mac (D4M). When D4M restarts, the operator seems to start correctly, but the etcd cluster member pods are left in the PodInitializing state.

❯ kubectl get pods
NAME                               READY     STATUS            RESTARTS   AGE
etcd-operator-688979975f-lstpm     1/1       Running           0          3d
openedge-etcd-cluster-c4rtk2lptg   0/1       PodInitializing   0          3d
openedge-etcd-cluster-ggzh9brb7n   0/1       PodInitializing   0          3d
openedge-etcd-cluster-jkr6jddzln   0/1       PodInitializing   0          3d
❯ kubectl logs openedge-etcd-cluster-c4rtk2lptg
Error from server (BadRequest): container "etcd" in pod "openedge-etcd-cluster-c4rtk2lptg" is waiting to start: PodInitializing
❯ kubectl describe pods openedge-etcd-cluster-c4rtk2lptg
Name:           openedge-etcd-cluster-c4rtk2lptg
Namespace:      default
Node:           docker-for-desktop/192.168.65.3
Start Time:     Fri, 03 May 2019 19:08:15 -0400
Labels:         app=etcd
                etcd_cluster=openedge-etcd-cluster
                etcd_node=openedge-etcd-cluster-c4rtk2lptg
Annotations:    etcd.version=3.3.12
Status:         Pending
IP:
Controlled By:  EtcdCluster/openedge-etcd-cluster
Init Containers:
  check-dns:
    Container ID:  docker://5073864ea343e2051d757ea52fdec5ea07db4876be9f7ed0f6d8ff95891edd3e
    Image:         busybox:1.28.0-glibc
    Image ID:      docker-pullable://busybox@sha256:0b55a30394294ab23b9afd58fab94e61a923f5834fba7ddbae7f8e0c11ba85e6
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
      -c

          TIMEOUT_READY=0
          while ( ! nslookup openedge-etcd-cluster-c4rtk2lptg.openedge-etcd-cluster.default.svc )
          do
            # If TIMEOUT_READY is 0 we should never time out and exit
            TIMEOUT_READY=$(( TIMEOUT_READY-1 ))
            if [ $TIMEOUT_READY -eq 0 ];
            then
              echo "Timed out waiting for DNS entry"
              exit 1
            fi
            sleep 1
          done
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 03 May 2019 19:08:16 -0400
      Finished:     Fri, 03 May 2019 19:08:17 -0400
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:         <none>
Containers:
  etcd:
    Container ID:
    Image:         quay.io/coreos/etcd:v3.3.12
    Image ID:
    Ports:         2380/TCP, 2379/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      /usr/local/bin/etcd
      --data-dir=/var/etcd/data
      --name=openedge-etcd-cluster-c4rtk2lptg
      --initial-advertise-peer-urls=http://openedge-etcd-cluster-c4rtk2lptg.openedge-etcd-cluster.default.svc:2380
      --listen-peer-urls=http://0.0.0.0:2380
      --listen-client-urls=http://0.0.0.0:2379
      --advertise-client-urls=http://openedge-etcd-cluster-c4rtk2lptg.openedge-etcd-cluster.default.svc:2379
      --initial-cluster=openedge-etcd-cluster-jkr6jddzln=http://openedge-etcd-cluster-jkr6jddzln.openedge-etcd-cluster.default.svc:2380,openedge-etcd-cluster-ggzh9brb7n=http://openedge-etcd-cluster-ggzh9brb7n.openedge-etcd-cluster.default.svc:2380,openedge-etcd-cluster-c4rtk2lptg=http://openedge-etcd-cluster-c4rtk2lptg.openedge-etcd-cluster.default.svc:2380
      --initial-cluster-state=existing
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Liveness:       exec [/bin/sh -ec ETCDCTL_API=3 etcdctl endpoint status] delay=10s timeout=10s period=60s #success=1 #failure=3
    Readiness:      exec [/bin/sh -ec ETCDCTL_API=3 etcdctl endpoint status] delay=1s timeout=5s period=5s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/etcd from etcd-data (rw)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  etcd-data:
    Type:        EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason                 Age   From                         Message
  ----    ------                 ----  ----                         -------
  Normal  SuccessfulMountVolume  58m   kubelet, docker-for-desktop  MountVolume.SetUp succeeded for volume "etcd-data"
  Normal  Pulled                 58m   kubelet, docker-for-desktop  Container image "busybox:1.28.0-glibc" already present on machine
  Normal  Created                58m   kubelet, docker-for-desktop  Created container
  Normal  Started                58m   kubelet, docker-for-desktop  Started container
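
For anyone reading the check-dns command in the describe output above: the countdown loop can be tried on its own, without a cluster. The following is only a sketch of that pattern, not the operator's actual code. `wait_until` is a hypothetical helper name, and the probe command is passed in as arguments in place of the `nslookup` call.

```shell
#!/bin/sh
# Sketch of the countdown pattern from the check-dns init container.
# wait_until is a hypothetical name; the probe is supplied by the caller
# instead of the nslookup call shown in the describe output.
wait_until() {
  timeout_ready=$1   # 0 means "never time out", as in the original script
  shift
  while ! "$@"; do
    # Starting from 0, the first decrement yields -1, so the -eq 0 test
    # below does not match and the loop keeps retrying.
    timeout_ready=$(( timeout_ready - 1 ))
    if [ "$timeout_ready" -eq 0 ]; then
      echo "Timed out waiting for DNS entry"
      return 1
    fi
    sleep 1
  done
  return 0
}
```

With `TIMEOUT_READY=0`, as in the pod spec above, the loop only ends when the probe succeeds. In this pod, check-dns finished with exit code 0 about one second after it started, so the DNS lookup itself does not appear to be what the pod is waiting on.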