bitnami / charts

Bitnami Helm Charts
https://bitnami.com

[bitnami/etcd] Node unable to start on rollout restart #3190

Closed mboutet closed 4 years ago

mboutet commented 4 years ago

Which chart: etcd-4.8.14

Describe the bug Issuing kubectl rollout restart on a 3-node etcd statefulset results in the last node going into CrashLoopBackOff

To Reproduce Steps to reproduce the behavior:

  1. Deploy with the following values (adjust storage classes accordingly)
    ## etcd
    ##
    etcd:
      priorityClassName: high-priority
      nodeSelector:
        mode: system
      statefulset:
        replicaCount: 3
      pdb:
        enabled: true
        minAvailable: 2
      affinity: |
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - topologyKey: "kubernetes.io/hostname"
              labelSelector:
                matchLabels: {{- include "etcd.etcd.matchLabels" . | nindent 10 }}
      auth:
        rbac:
          enabled: true
      envVarsConfigMap: etcd-extra-env
      persistence:
        enabled: true
        storageClass: azure-disk-premium-retain-and-wait-for-first-consumer
        size: 8Gi
      resources:
        requests:
          memory: 256Mi
          cpu: 200m
        limits:
          memory: 256Mi
          cpu: 200m
      disasterRecovery:
        enabled: true
        cronjob:
          snapshotHistoryLimit: 3
          historyLimit: 3
        pvc:
          storageClassName: azure-file-standard-zrs-retain-and-wait-for-first-consumer
          size: 32Gi
      metrics:
        enabled: true
        serviceMonitor:
          enabled: true
  2. Upgrade the release with this value etcd.initialClusterState=existing.
  3. Let the deployment run for some time in order to have some snapshots. I don't know if the snapshots are related to the problem, but in my case, the deployment was running for approximately 3h before issuing the rollout restart.
  4. Issue a kubectl rollout restart on the statefulset
  5. See error (debug mode enabled):
    ==> Bash debug is on
    ==> Detected data from previous deployments...
    asxsi0eruz-etcd-0.asxsi0eruz-etcd-headless.prod.svc.cluster.local:2380 is healthy: successfully committed proposal: took = 187.534597ms
    {"level":"warn","ts":"2020-07-22T17:35:42.909Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-8250a892-fe1c-4c1c-89a9-fd236ce9ff07/asxsi0eruz-etcd-2.asxsi0eruz-etcd-headless.prod.svc.cluster.local:2380","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.244.3.41:2380: connect: connection refused\""}
    asxsi0eruz-etcd-2.asxsi0eruz-etcd-headless.prod.svc.cluster.local:2380 is unhealthy: failed to commit proposal: context deadline exceeded
    Error: unhealthy cluster
    grep: /bitnami/etcd/member_removal.log: No such file or directory
    ==> Updating member in existing cluster...
    Error: bad member ID arg (strconv.ParseUint: parsing "": invalid syntax), expecting ID in Hex
    {"level":"warn","ts":"2020-07-22T17:35:42.909Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-8250a892-fe1c-4c1c-89a9-fd236ce9ff07/asxsi0eruz-etcd-2.asxsi0eruz-etcd-headless.prod.svc.cluster.local:2380","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.244.3.41:2380: connect: connection refused\""}
    asxsi0eruz-etcd-2.asxsi0eruz-etcd-headless.prod.svc.cluster.local:2380 is unhealthy: failed to commit proposal: context deadline exceeded
    Error: unhealthy cluster
    grep: /bitnami/etcd/member_removal.log: No such file or directory
    ==> Updating member in existing cluster...
    Error: bad member ID arg (strconv.ParseUint: parsing "": invalid syntax), expecting ID in Hex

Expected behavior Each etcd node should smoothly restart.

Version of Helm and Kubernetes:

version.BuildInfo{Version:"v3.2.4", GitCommit:"0ad800ef43d3b826f31a5ad8dfbb4fe05d143688", GitTreeState:"dirty", GoVersion:"go1.14.3"}
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.5", GitCommit:"e6503f8d8f769ace2f338794c914a96fc335df0f", GitTreeState:"clean", BuildDate:"2020-06-27T00:38:11Z", GoVersion:"go1.14.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"c02cd11cc8bced1391937fe271c3b9c9fe9befa0", GitTreeState:"clean", BuildDate:"2020-06-24T19:57:20Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}

Additional context I experienced the bug the other day and created an issue (that I closed since I was not able to reproduce) https://github.com/bitnami/charts/issues/3158

mboutet commented 4 years ago

Scaling the statefulset down to 0 and then back up to 3 returns the etcd cluster to a healthy state, given there is a snapshot to restore from. However, we should look into this flaky behaviour. Perhaps the prestop-hook.sh is not behaving as expected?
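
For reference, a minimal version of that workaround (the statefulset name is taken from my release; adjust as needed):

    # scale the etcd statefulset to zero, wait for pods to terminate, then scale back up
    kubectl scale statefulset/asxsi0eruz-etcd --replicas=0
    kubectl rollout status statefulset/asxsi0eruz-etcd
    kubectl scale statefulset/asxsi0eruz-etcd --replicas=3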

mboutet commented 4 years ago

Here are the logs for one of the etcd nodes before restarting:

2020-07-22 18:43:11.635930 W | rafthttp: lost the TCP streaming connection with peer 1cf9e1c4e2097dd3 (stream MsgApp v2 reader)
raft2020/07/22 18:43:11 INFO: f576434d574a84a6 switched to configuration voters=(2087948137985048019 3186439143105623338)
2020-07-22 18:43:11.636132 W | rafthttp: lost the TCP streaming connection with peer 2c388184a735f52a (stream MsgApp v2 reader)
2020-07-22 18:43:11.636152 I | etcdserver/membership: removed member f576434d574a84a6 from cluster 28db4cf9c4be26fd
2020-07-22 18:43:11.636450 W | rafthttp: lost the TCP streaming connection with peer 2c388184a735f52a (stream Message reader)
2020-07-22 18:43:11.642567 E | rafthttp: failed to dial 2c388184a735f52a on stream MsgApp v2 (the member has been permanently removed from the cluster)
2020-07-22 18:43:11.642583 I | rafthttp: peer 2c388184a735f52a became inactive (message send to peer failed)
2020-07-22 18:43:11.642600 E | etcdserver: the member has been permanently removed from the cluster
2020-07-22 18:43:11.642606 I | etcdserver: the data-dir used by this member must be removed.
2020-07-22 18:43:11.642650 I | rafthttp: stopped HTTP pipelining with peer 2c388184a735f52a
2020-07-22 18:43:11.642748 I | rafthttp: stopped HTTP pipelining with peer 1cf9e1c4e2097dd3
2020-07-22 18:43:11.642755 I | rafthttp: stopping peer 1cf9e1c4e2097dd3...
2020-07-22 18:43:11.643134 I | rafthttp: closed the TCP streaming connection with peer 1cf9e1c4e2097dd3 (stream MsgApp v2 writer)
2020-07-22 18:43:11.643146 I | rafthttp: stopped streaming with peer 1cf9e1c4e2097dd3 (writer)
2020-07-22 18:43:11.643609 I | rafthttp: closed the TCP streaming connection with peer 1cf9e1c4e2097dd3 (stream Message writer)
2020-07-22 18:43:11.643621 I | rafthttp: stopped streaming with peer 1cf9e1c4e2097dd3 (writer)
2020-07-22 18:43:11.643736 I | rafthttp: stopped HTTP pipelining with peer 1cf9e1c4e2097dd3
2020-07-22 18:43:11.643796 E | rafthttp: failed to dial 1cf9e1c4e2097dd3 on stream MsgApp v2 (context canceled)
2020-07-22 18:43:11.643804 I | rafthttp: peer 1cf9e1c4e2097dd3 became inactive (message send to peer failed)
2020-07-22 18:43:11.643816 I | rafthttp: stopped streaming with peer 1cf9e1c4e2097dd3 (stream MsgApp v2 reader)
2020-07-22 18:43:11.643874 W | rafthttp: lost the TCP streaming connection with peer 1cf9e1c4e2097dd3 (stream Message reader)
2020-07-22 18:43:11.643887 I | rafthttp: stopped streaming with peer 1cf9e1c4e2097dd3 (stream Message reader)
2020-07-22 18:43:11.643895 I | rafthttp: stopped peer 1cf9e1c4e2097dd3
2020-07-22 18:43:11.643901 I | rafthttp: stopping peer 2c388184a735f52a...
2020-07-22 18:43:11.644686 I | rafthttp: closed the TCP streaming connection with peer 2c388184a735f52a (stream MsgApp v2 writer)
2020-07-22 18:43:11.644696 I | rafthttp: stopped streaming with peer 2c388184a735f52a (writer)
2020-07-22 18:43:11.645004 I | rafthttp: closed the TCP streaming connection with peer 2c388184a735f52a (stream Message writer)
2020-07-22 18:43:11.645011 I | rafthttp: stopped streaming with peer 2c388184a735f52a (writer)
2020-07-22 18:43:11.645248 I | rafthttp: stopped HTTP pipelining with peer 2c388184a735f52a
2020-07-22 18:43:11.645262 I | rafthttp: stopped streaming with peer 2c388184a735f52a (stream MsgApp v2 reader)
2020-07-22 18:43:11.645319 I | rafthttp: stopped streaming with peer 2c388184a735f52a (stream Message reader)
2020-07-22 18:43:11.645329 I | rafthttp: stopped peer 2c388184a735f52a
mboutet commented 4 years ago

May be related to https://github.com/bitnami/charts/issues/1908

rgarcia89 commented 4 years ago

Broken since this merge https://github.com/bitnami/charts/commit/f234cc7ff1c59950de819151b2da221d72ec24b6#diff-13f2888821f2166e69fef5282a0b4d81

andresbono commented 4 years ago

Thanks, @mboutet, @rgarcia89. We will try to reproduce it and look into it.

juan131 commented 4 years ago

Hi @mboutet @rgarcia89

I was responsible for these changes. They were meant to address this issue, since the snapshotter wasn't working properly when there was only one replica. However, I don't see how these changes could break this...

This error suggests that the "$ETCD_DATA_DIR/member_id" file was not correctly created:

Error: bad member ID arg (strconv.ParseUint: parsing "": invalid syntax), expecting ID in Hex

That file is created using this function, which hasn't been modified recently:

    store_member_id() {
        while ! etcdctl $AUTH_OPTIONS member list; do sleep 1; done
        etcdctl $AUTH_OPTIONS member list | grep -w "$HOSTNAME" | awk '{ print $1}' | awk -F "," '{ print $1}' > "$ETCD_DATA_DIR/member_id"
        echo "==> Stored member id: $(cat ${ETCD_DATA_DIR}/member_id)" 1>&3 2>&4
        exit 0
    }
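
That error message is consistent with the member_id file ending up empty: if grep -w "$HOSTNAME" matches nothing at the moment the function runs, an empty string is written to the file, and the "Updating member in existing cluster" step then receives "" as the member ID. A minimal defensive variant (just a sketch, not the chart's actual code) would only persist a non-empty ID:

    store_member_id() {
        local member_id=""
        # Keep polling the member list until this pod's hostname shows up,
        # then extract the first field (the member ID in hex).
        while [[ -z "$member_id" ]]; do
            member_id="$(etcdctl $AUTH_OPTIONS member list | grep -w "$HOSTNAME" | awk -F "," '{ print $1 }')"
            [[ -z "$member_id" ]] && sleep 1
        done
        # Only write the file once we actually have an ID, so later steps
        # never read an empty value.
        echo "$member_id" > "$ETCD_DATA_DIR/member_id"
    }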

I need to continue looking into this

dk-do commented 4 years ago

Acknowledged with bitnami/etcd:3.4.9-debian-10-r45 on K8S 1.18.3

We drained a node where etcd-0 was located. K8S moved etcd-0 to another node, but it hangs in CrashLoopBackOff.

2020-07-23 07:25:52.759821 I | etcdserver/membership: added member 45a18acb10aa275e [http://etcd-2.etcd-headless.default.svc.cluster.local:2380] to cluster 9e98e654be1e22d7 from store
2020-07-23 07:25:52.759843 I | etcdserver/membership: added member 936ce633ac273d75 [http://etcd-1.etcd-headless.default.svc.cluster.local:2380] to cluster 9e98e654be1e22d7 from store
2020-07-23 07:25:52.759851 I | etcdserver/membership: added member df3b2df95cd5fd29 [http://etcd-0.etcd-headless.default.svc.cluster.local:2380] to cluster 9e98e654be1e22d7 from store
2020-07-23 07:25:52.763332 W | auth: simple token is not cryptographically signed
2020-07-23 07:25:52.777193 I | rafthttp: starting peer 45a18acb10aa275e...
2020-07-23 07:25:52.777266 I | rafthttp: started HTTP pipelining with peer 45a18acb10aa275e
2020-07-23 07:25:52.777665 I | rafthttp: started streaming with peer 45a18acb10aa275e (writer)
2020-07-23 07:25:52.777829 I | rafthttp: started streaming with peer 45a18acb10aa275e (writer)
2020-07-23 07:25:52.780638 I | rafthttp: started streaming with peer 45a18acb10aa275e (stream MsgApp v2 reader)
2020-07-23 07:25:52.780679 I | rafthttp: started streaming with peer 45a18acb10aa275e (stream Message reader)
2020-07-23 07:25:52.781007 I | rafthttp: started peer 45a18acb10aa275e
2020-07-23 07:25:52.781054 I | rafthttp: added peer 45a18acb10aa275e
2020-07-23 07:25:52.781068 I | rafthttp: starting peer 936ce633ac273d75...
2020-07-23 07:25:52.781670 I | rafthttp: started HTTP pipelining with peer 936ce633ac273d75
2020-07-23 07:25:52.782023 I | rafthttp: started streaming with peer 936ce633ac273d75 (writer)
2020-07-23 07:25:52.782158 I | rafthttp: started streaming with peer 936ce633ac273d75 (writer)
2020-07-23 07:25:52.783324 I | rafthttp: started peer 936ce633ac273d75
2020-07-23 07:25:52.783520 I | rafthttp: added peer 936ce633ac273d75
2020-07-23 07:25:52.783564 I | etcdserver: starting server... [version: 3.4.9, cluster version: to_be_decided]
2020-07-23 07:25:52.783687 I | rafthttp: started streaming with peer 936ce633ac273d75 (stream Message reader)
2020-07-23 07:25:52.783982 I | rafthttp: started streaming with peer 936ce633ac273d75 (stream MsgApp v2 reader)
2020-07-23 07:25:52.785088 E | etcdserver: the member has been permanently removed from the cluster
2020-07-23 07:25:52.785152 I | etcdserver: the data-dir used by this member must be removed.
2020-07-23 07:25:52.785233 I | etcdserver: aborting publish because server is stopped
2020-07-23 07:25:52.785264 I | rafthttp: stopping peer 45a18acb10aa275e...
2020-07-23 07:25:52.785281 I | rafthttp: stopped streaming with peer 45a18acb10aa275e (writer)
2020-07-23 07:25:52.785290 I | rafthttp: stopped streaming with peer 45a18acb10aa275e (writer)
2020-07-23 07:25:52.785349 I | rafthttp: stopped HTTP pipelining with peer 45a18acb10aa275e
2020-07-23 07:25:52.785380 I | rafthttp: stopped streaming with peer 45a18acb10aa275e (stream MsgApp v2 reader)
2020-07-23 07:25:52.785400 I | rafthttp: stopped streaming with peer 45a18acb10aa275e (stream Message reader)
2020-07-23 07:25:52.785405 I | rafthttp: stopped peer 45a18acb10aa275e
2020-07-23 07:25:52.785410 I | rafthttp: stopping peer 936ce633ac273d75...
2020-07-23 07:25:52.785424 I | rafthttp: stopped streaming with peer 936ce633ac273d75 (writer)
2020-07-23 07:25:52.785435 I | rafthttp: stopped streaming with peer 936ce633ac273d75 (writer)
2020-07-23 07:25:52.785461 I | rafthttp: stopped HTTP pipelining with peer 936ce633ac273d75
2020-07-23 07:25:52.785501 I | rafthttp: stopped streaming with peer 936ce633ac273d75 (stream MsgApp v2 reader)
2020-07-23 07:25:52.785550 I | rafthttp: stopped streaming with peer 936ce633ac273d75 (stream Message reader)
2020-07-23 07:25:52.785565 I | rafthttp: stopped peer 936ce633ac273d75
2020-07-23 07:25:52.790550 I | embed: listening for peers on [::]:2380
2020-07-23 07:25:52.790843 E | rafthttp: failed to find member 45a18acb10aa275e in cluster 9e98e654be1e22d7
2020-07-23 07:25:52.790945 E | rafthttp: failed to find member 45a18acb10aa275e in cluster 9e98e654be1e22d7

sfesfizh commented 4 years ago

I hit the same problem, but reproduced it manually: I used 3 replicas; if you delete one pod it comes back up, but if you delete two, those two start failing in a loop and never come up. Images used: bitnami/etcd:3.4.9-debian-10-r52, bitnami/etcd:3.4.9; chart: 4.8.12.

Alexc0007 commented 4 years ago

Hi, I suffer from the same issue...

rgarcia89 commented 4 years ago

You can go back and use --version 4.8.10 to get it running again

Alexc0007 commented 4 years ago

I meant I'm suffering from the same issue: whenever a pod dies, it can't re-join the cluster.

rgarcia89 commented 4 years ago

I know, but have you rolled back to the old Helm chart version and set the cluster state variable to existing? I added this a few merge requests ago. With that fix it works, at least as long as you stay below chart version 4.8.11.

Alexc0007 commented 4 years ago

I just tried that. The effect on the rollout looks like this: the first node goes down, is replaced and joins the cluster fine; then the second node goes down and can't rejoin the cluster... the rollout doesn't reach the third node...

rgarcia89 commented 4 years ago

Can you show me your helm install command as well as a kubectl describe of the etcd pods that fail to start?

Alexc0007 commented 4 years ago

the upgrade command:

helm upgrade etcd-jenkins bitnami/etcd -f values-production.yaml --set etcd.initialClusterState=existing --version 4.8.10

describing the failed pod:

Name:         etcd-jenkins-1
Namespace:    default
Priority:     0
Node:         ip-IP.ec2.internal/IP
Start Time:   Thu, 23 Jul 2020 11:33:17 +0000
Labels:       app.kubernetes.io/instance=etcd-jenkins
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=etcd
              controller-revision-hash=etcd-jenkins-b9478bbd9
              helm.sh/chart=etcd-4.8.10
              statefulset.kubernetes.io/pod-name=etcd-jenkins-1
Annotations:  kubernetes.io/psp: eks.privileged
              prometheus.io/port: 2379
              prometheus.io/scrape: true
Status:       Running
IP:           IP
IPs:
  IP:           IP
Controlled By:  StatefulSet/etcd-jenkins
Containers:
  etcd:
    Container ID:  docker://a4a0a111ea9a11abbbcaedc5ec4179782415ba355d97c93c928205c24c385003
    Image:         docker.io/bitnami/etcd:3.4.9-debian-10-r46
    Image ID:      docker-pullable://bitnami/etcd@sha256:4369300e9c2f55312bf059a44235c00f86c121fd3c6f9f33ee5cfdfd773ea76d
    Ports:         2379/TCP, 2380/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      /scripts/setup.sh
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 23 Jul 2020 11:39:54 +0000
      Finished:     Thu, 23 Jul 2020 11:40:01 +0000
    Ready:          False
    Restart Count:  6
    Liveness:       exec [/scripts/probes.sh] delay=60s timeout=5s period=30s #success=1 #failure=5
    Readiness:      exec [/scripts/probes.sh] delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:
      BITNAMI_DEBUG:                     true
      MY_POD_IP:                          (v1:status.podIP)
      MY_POD_NAME:                       etcd-jenkins-1 (v1:metadata.name)
      ETCDCTL_API:                       3
      ETCD_NAME:                         $(MY_POD_NAME)
      ETCD_DATA_DIR:                     /bitnami/etcd/data
      ETCD_ADVERTISE_CLIENT_URLS:        http://$(MY_POD_NAME).etcd-jenkins-headless.default.svc.cluster.local:2379
      ETCD_LISTEN_CLIENT_URLS:           http://0.0.0.0:2379
      ETCD_INITIAL_ADVERTISE_PEER_URLS:  https://$(MY_POD_NAME).etcd-jenkins-headless.default.svc.cluster.local:2380
      ETCD_LISTEN_PEER_URLS:             https://0.0.0.0:2380
      ETCD_INITIAL_CLUSTER_TOKEN:        etcd-cluster-k8s
      ETCD_INITIAL_CLUSTER_STATE:        existing
      ETCD_INITIAL_CLUSTER:              etcd-jenkins-0=https://etcd-jenkins-0.etcd-jenkins-headless.default.svc.cluster.local:2380,etcd-jenkins-1=https://etcd-jenkins-1.etcd-jenkins-headless.default.svc.cluster.local:2380,etcd-jenkins-2=https://etcd-jenkins-2.etcd-jenkins-headless.default.svc.cluster.local:2380,
      ALLOW_NONE_AUTHENTICATION:         yes
      ETCD_PEER_AUTO_TLS:                true
    Mounts:
      /bitnami/etcd from data (rw)
      /scripts/prestop-hook.sh from scripts (rw,path="prestop-hook.sh")
      /scripts/probes.sh from scripts (rw,path="probes.sh")
      /scripts/setup.sh from scripts (rw,path="setup.sh")
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-qbjnq (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-etcd-jenkins-1
    ReadOnly:   false
  scripts:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      etcd-jenkins-scripts
    Optional:  false
  default-token-qbjnq:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-qbjnq
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age                    From                                    Message
  ----     ------                  ----                   ----                                    -------
  Normal   Scheduled               <unknown>              default-scheduler                       Successfully assigned default/etcd-jenkins-1 to ip-IP.ec2.internal
  Normal   SuccessfulAttachVolume  9m15s                  attachdetach-controller                 AttachVolume.Attach succeeded for volume "pvc-3bcaa7a1-9bbe-48ed-bfdf-75775c1ee7b2"
  Normal   Pulled                  7m18s (x5 over 9m13s)  kubelet, ip-IP.ec2.internal  Container image "docker.io/bitnami/etcd:3.4.9-debian-10-r46" already present on machine
  Normal   Created                 7m18s (x5 over 9m13s)  kubelet, ip-IP-ec2.internal  Created container etcd
  Normal   Started                 7m18s (x5 over 9m13s)  kubelet, ip-IP.ec2.internal  Started container etcd
  Warning  BackOff                 4m8s (x21 over 8m55s)  kubelet, ip-IP.ec2.internal  Back-off restarting failed container
dk-do commented 4 years ago

Same issue here with 4.8.10:

Helm List: etcd 1 Thu Jul 23 11:46:05 2020 DEPLOYED etcd-4.8.10 3.4.9 default

Install command: helm install --name etcd bitnami/etcd -f etcdvalues.yaml --version 4.8.10

Pod description:


Namespace:    default
Priority:     0
Node:         perftest-w6/10.83.19.18
Start Time:   Thu, 23 Jul 2020 11:48:12 +0000
Labels:       app.kubernetes.io/instance=etcd
              app.kubernetes.io/managed-by=Tiller
              app.kubernetes.io/name=etcd
              controller-revision-hash=etcd-85f5c67bf
              helm.sh/chart=etcd-4.8.10
              statefulset.kubernetes.io/pod-name=etcd-0
Annotations:  prometheus.io/port: 2379
              prometheus.io/scrape: true
Status:       Running
IP:           10.244.8.215
IPs:
  IP:           10.244.8.215
Controlled By:  StatefulSet/etcd
Containers:
  etcd:
    Container ID:  docker://e581a007d63b7021d35c24ce39fc234fbfe2102ffe41308d667bea04ce32280a
    Image:         docker.io/bitnami/etcd:3.4.9-debian-10-r46
    Image ID:      docker-pullable://bitnami/etcd@sha256:4369300e9c2f55312bf059a44235c00f86c121fd3c6f9f33ee5cfdfd773ea76d
    Ports:         2379/TCP, 2380/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      /scripts/setup.sh
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 23 Jul 2020 11:49:16 +0000
      Finished:     Thu, 23 Jul 2020 11:49:16 +0000
    Ready:          False
    Restart Count:  3
    Liveness:       exec [/scripts/probes.sh] delay=60s timeout=5s period=30s #success=1 #failure=5
    Readiness:      exec [/scripts/probes.sh] delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:
      BITNAMI_DEBUG:                     false
      MY_POD_IP:                          (v1:status.podIP)
      MY_POD_NAME:                       etcd-0 (v1:metadata.name)
      ETCDCTL_API:                       3
      ETCD_NAME:                         $(MY_POD_NAME)
      ETCD_DATA_DIR:                     /bitnami/etcd/data
      ETCD_ADVERTISE_CLIENT_URLS:        http://$(MY_POD_NAME).etcd-headless.default.svc.cluster.local:2379
      ETCD_LISTEN_CLIENT_URLS:           http://0.0.0.0:2379
      ETCD_INITIAL_ADVERTISE_PEER_URLS:  http://$(MY_POD_NAME).etcd-headless.default.svc.cluster.local:2380
      ETCD_LISTEN_PEER_URLS:             http://0.0.0.0:2380
      ETCD_INITIAL_CLUSTER_TOKEN:        etcd-cluster-k8s
      ETCD_INITIAL_CLUSTER_STATE:        new
      ETCD_INITIAL_CLUSTER:              etcd-0=http://etcd-0.etcd-headless.default.svc.cluster.local:2380,etcd-1=http://etcd-1.etcd-headless.default.svc.cluster.local:2380,etcd-2=http://etcd-2.etcd-headless.default.svc.cluster.local:2380,
      ALLOW_NONE_AUTHENTICATION:         yes
    Mounts:
      /bitnami/etcd from data (rw)
      /init-snapshot from init-snapshot-volume (rw)
      /scripts/prestop-hook.sh from scripts (rw,path="prestop-hook.sh")
      /scripts/probes.sh from scripts (rw,path="probes.sh")
      /scripts/setup.sh from scripts (rw,path="setup.sh")
      /snapshots from snapshot-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-pdqcx (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-etcd-0
    ReadOnly:   false
  scripts:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      etcd-scripts
    Optional:  false
  init-snapshot-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  snapshots
    ReadOnly:   false
  snapshot-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  etcd-snapshotter
    ReadOnly:   false
  default-token-pdqcx:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-pdqcx
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age                From                     Message
  ----     ------                  ----               ----                     -------
  Normal   Scheduled               <unknown>          default-scheduler        Successfully assigned default/etcd-0 to perftest-w6
  Normal   SuccessfulAttachVolume  79s                attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-47558fe6-7104-4b0d-a7d1-6e0a9218dc78"
  Normal   SuccessfulAttachVolume  79s                attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-26bdb3aa-fde0-4d61-9359-163fab6e5cf8"
  Normal   Pulled                  17s (x4 over 70s)  kubelet, perftest-w6     Container image "docker.io/bitnami/etcd:3.4.9-debian-10-r46" already present on machine
  Normal   Created                 17s (x4 over 70s)  kubelet, perftest-w6     Created container etcd
  Normal   Started                 16s (x4 over 69s)  kubelet, perftest-w6     Started container etcd
  Warning  BackOff                 1s (x8 over 68s)   kubelet, perftest-w6     Back-off restarting failed container

Pod Log:
2020-07-23 11:48:44.317120 I | rafthttp: stopped streaming with peer 45a18acb10aa275e (writer)
2020-07-23 11:48:44.317317 I | rafthttp: stopped streaming with peer 45a18acb10aa275e (writer)
2020-07-23 11:48:44.317364 I | rafthttp: stopped HTTP pipelining with peer 45a18acb10aa275e
2020-07-23 11:48:44.317489 I | rafthttp: stopped streaming with peer 45a18acb10aa275e (stream MsgApp v2 reader)
2020-07-23 11:48:44.317501 I | rafthttp: stopped streaming with peer 45a18acb10aa275e (stream Message reader)
2020-07-23 11:48:44.317505 I | rafthttp: stopped peer 45a18acb10aa275e
2020-07-23 11:48:44.317509 I | rafthttp: stopping peer 936ce633ac273d75...
2020-07-23 11:48:44.317515 I | rafthttp: stopped streaming with peer 936ce633ac273d75 (writer)
2020-07-23 11:48:44.317521 I | rafthttp: stopped streaming with peer 936ce633ac273d75 (writer)
2020-07-23 11:48:44.317534 I | rafthttp: stopped HTTP pipelining with peer 936ce633ac273d75
2020-07-23 11:48:44.317544 I | rafthttp: stopped streaming with peer 936ce633ac273d75 (stream MsgApp v2 reader)
2020-07-23 11:48:44.317551 I | rafthttp: stopped streaming with peer 936ce633ac273d75 (stream Message reader)
2020-07-23 11:48:44.317555 I | rafthttp: stopped peer 936ce633ac273d75
rgarcia89 commented 4 years ago

@dk-do if your cluster is not new, the cluster state variable should be set to existing. @Alexc0007 I don't see any issue with your config :-/

Alexc0007 commented 4 years ago

I also don't see any issues with my config; however, I have exactly the same issue as @dk-do. I also tried older versions (4.8.9 and 4.8.7), and nothing works...

sfesfizh commented 4 years ago

btw, I've implemented a more efficient way for changing the cluster state variable on upgrade:

{{- if .Release.IsInstall }}
            - name: ETCD_INITIAL_CLUSTER_STATE
              value: new
{{- else }}
            - name: ETCD_INITIAL_CLUSTER_STATE
              value: existing
{{- end }}

In this case, you don't need to specify it in values.

mboutet commented 4 years ago

@sfesfizh

btw, I've implemented a more efficient way for changing the cluster state variable on upgrade:

{{- if .Release.IsInstall }}
            - name: ETCD_INITIAL_CLUSTER_STATE
              value: new
{{- else }}
            - name: ETCD_INITIAL_CLUSTER_STATE
              value: existing
{{- end }}

In this case, you don't need to specify it in values.

I don't know if we should open a PR/new issue for that (since it's a little off-topic to the problem here). Anyway, I was also thinking that this would be a better way to handle the initial cluster state. However, I wonder if it would be more robust to handle the ETCD_INITIAL_CLUSTER_STATE logic in the entrypoint in case one or more etcd nodes restart before a first upgrade is performed. Otherwise, the restarted node will think that the cluster is new whereas it is in fact existing.
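
To illustrate that idea, here is a rough sketch (an assumption about how it could look, not the chart's actual setup.sh logic): derive the state from what is already on the data volume instead of from a Helm value:

    # Sketch: pick the initial cluster state at container startup based on
    # whether this member already has data from a previous run on its volume.
    if [[ -d "${ETCD_DATA_DIR}/member" ]]; then
        export ETCD_INITIAL_CLUSTER_STATE="existing"
    else
        export ETCD_INITIAL_CLUSTER_STATE="new"
    fi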

dk-do commented 4 years ago

@rgarcia89

What did I do? Currently we are investigating a desaster recovery / worst case scenario.

So we deleted the old deployment: helm delete etcd --purge

And created a new cluster with these settings in values.yaml:

startFromSnapshot:
  enabled: true
  ## Existing PVC containing the etcd snapshot
  ##
  existingClaim: snapshots
  ## Snapshot filename
  ##
  snapshotFilename: db

Three pods came up successfully and all components worked fine with etcd. Then, to check if this issue is solved in Chart version 4.8.10, we deleted pod etcd-0. It came up again but was in CrashLoopBackOff and never started. We have just these log entries:

raft2020/07/23 12:50:34 INFO: df3b2df95cd5fd29 switched to configuration voters=(5017444064630024030 10623118730666130805 16085501043111492905)
2020-07-23 12:50:34.291575 I | etcdserver/membership: added member df3b2df95cd5fd29 [http://etcd-0.etcd-headless.default.svc.cluster.local:2380] to cluster 9e98e654be1e22d7
raft2020/07/23 12:50:34 INFO: raft.node: df3b2df95cd5fd29 elected leader 45a18acb10aa275e at term 23
2020-07-23 12:50:34.293342 E | etcdserver: the member has been permanently removed from the cluster
2020-07-23 12:50:34.293355 I | etcdserver: the data-dir used by this member must be removed.
2020-07-23 12:50:34.293392 E | etcdserver: publish error: etcdserver: request cancelled
2020-07-23 12:50:34.293410 E | etcdserver: publish error: etcdserver: request cancelled
2020-07-23 12:50:34.293420 I | etcdserver: aborting publish because server is stopped
2020-07-23 12:50:34.293479 I | rafthttp: stopping peer 45a18acb10aa275e...
2020-07-23 12:50:34.293501 I | rafthttp: stopped streaming with peer 45a18acb10aa275e (writer)
2020-07-23 12:50:34.293514 I | rafthttp: stopped streaming with peer 45a18acb10aa275e (writer)
2020-07-23 12:50:34.293981 I | rafthttp: stopped HTTP pipelining with peer 45a18acb10aa275e
2020-07-23 12:50:34.294008 I | rafthttp: stopped streaming with peer 45a18acb10aa275e (stream MsgApp v2 reader)
2020-07-23 12:50:34.294018 I | rafthttp: stopped streaming with peer 45a18acb10aa275e (stream Message reader)
2020-07-23 12:50:34.294023 I | rafthttp: stopped peer 45a18acb10aa275e
2020-07-23 12:50:34.294027 I | rafthttp: stopping peer 936ce633ac273d75...
2020-07-23 12:50:34.294038 I | rafthttp: stopped streaming with peer 936ce633ac273d75 (writer)
2020-07-23 12:50:34.294046 I | rafthttp: stopped streaming with peer 936ce633ac273d75 (writer)
2020-07-23 12:50:34.294056 I | rafthttp: stopped HTTP pipelining with peer 936ce633ac273d75
2020-07-23 12:50:34.294072 I | rafthttp: stopped streaming with peer 936ce633ac273d75 (stream MsgApp v2 reader)
2020-07-23 12:50:34.294085 I | rafthttp: stopped streaming with peer 936ce633ac273d75 (stream Message reader)
2020-07-23 12:50:34.294089 I | rafthttp: stopped peer 936ce633ac273d75
2020-07-23 12:50:34.303961 W | rafthttp: failed to process raft message (raft: stopped)
dk-do commented 4 years ago

Update: after scaling the statefulset down and up again, everything came up again without errors. But the pod still has this setting:

ETCD_INITIAL_CLUSTER_STATE: new

When does it change to EXISTING?

sfesfizh commented 4 years ago

When does it change to EXISTING?

It must be changed manually in the values file, or configured with --set, before the upgrade.

I've provided a small workaround above for how to avoid changing it manually.
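
For example, before restarting or upgrading an already-initialized cluster (release and values file names are placeholders):

    helm upgrade <release-name> bitnami/etcd -f values.yaml --set etcd.initialClusterState=existing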

rgarcia89 commented 4 years ago

@dk-do it is something that you can just place in your values file. Like @sfesfizh said, after the first deployment of the etcd cluster you have to set it to existing. After that, further deployments / updates should work without any issues. At least that's how it is for me.

Alexc0007 commented 4 years ago

There is one more important thing to take into consideration... I am using persistent volumes, which means my etcd data is on an "external" disk. When the pods come up, they read this data and, based on it, try to determine whether it is a new or an existing cluster.

If I delete those disks and re-deploy etcd, it creates a new cluster without any issues (of course, the cluster state variable is set to "new"), but with existing disks that already contain etcd data, scaling down and up is impossible...
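
For reference, wiping that old state means deleting the statefulset's PVCs, e.g. for my etcd-jenkins release (this destroys the etcd data, so only do it if you can restore from a snapshot or accept starting empty):

    kubectl delete pvc data-etcd-jenkins-0 data-etcd-jenkins-1 data-etcd-jenkins-2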

sfesfizh commented 4 years ago

There is one more important thing to take into consideration... I am using persistent volumes, which means my etcd data is on an "external" disk. When the pods come up, they read this data and, based on it, try to determine whether it is a new or an existing cluster.

If I delete those disks and re-deploy etcd, it creates a new cluster without any issues (of course, the cluster state variable is set to "new"), but with existing disks that already contain etcd data, scaling down and up is impossible...

+1, same issue for me.

juan131 commented 4 years ago

Hi everyone,

I found an issue due to the env vars from the existing cluster not being properly set when restarting a container. That was preventing the pod from joining the cluster (even when disaster recovery is not necessary). I just created a PR to address it.

Please feel free to give the solution a try

Alexc0007 commented 4 years ago

Hi @juan131, I'll gladly test it out... but I guess there is no chart version with the current changes yet.

Alexc0007 commented 4 years ago

@juan131, I applied your changes manually... but it doesn't seem to change anything...

juan131 commented 4 years ago

Hi @Alexc0007

You need to clone the repo and apply the changes since there's no version published yet.

i applied your changes manually... but it doesnt seem to change anything...

Did you try the steps I mentioned in the PR's description? The pods should have been able to rejoin the cluster after being restarted.

Alexc0007 commented 4 years ago

Hi, I just looked at the commit and made the same changes in my configmap... (I didn't clone the repo) and it didn't help.

rgarcia89 commented 4 years ago

@Alexc0007 I really can't follow why Helm chart version 4.8.10 is not working for you. Have you, just out of interest, deployed a new cluster, set the env var to existing afterwards, applied the changes to the cluster and then tried to delete a pod, to see if it joins again?

I am on this version - and everything works for me

Alexc0007 commented 4 years ago

Hi @rgarcia89, as I explained above, I did install a new cluster (version: 4.8.10), then changed the cluster state to existing, which automatically starts a rollout. The first node is usually replaced OK, then the second node won't re-join the cluster... the rollout doesn't reach the third node...

Alexc0007 commented 4 years ago

So I guess this is closed based on a fix that only I tried and that didn't work for me? Has anyone else in this thread tried this fix?

juan131 commented 4 years ago

Hi @Alexc0007 @rgarcia89

Could you please give the latest version we just released (4.9.1) a try?

$ helm repo update
...Successfully got an update from the "bitnami" chart repository
Update Complete. ⎈ Happy Helming!⎈
$ helm search repo bitnami/etcd
NAME            CHART VERSION   APP VERSION DESCRIPTION
bitnami/etcd    4.9.1           3.4.10      etcd is a distributed key value store that prov...
rgarcia89 commented 4 years ago

@juan131 it is not working. I just ran a test: installed with 3.4.10-debian-10-r1, then tried to upgrade to version 3.4.10-debian-10-r4. The pod does not come up anymore. See the screenshot. Even setting the cluster state to existing does not fix the issue.

The pod stays in a loop.

Also, in my helm repo I am only seeing version 4.9.0:

[raulgs@raulgs-xm1 etcd]$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "bitnami" chart repository
...Successfully got an update from the "stable" chart repository
Update Complete. ⎈ Happy Helming!⎈
[raulgs@raulgs-xm1 etcd]$ helm search repo bitnami/etcd
NAME            CHART VERSION   APP VERSION     DESCRIPTION
bitnami/etcd    4.9.0           3.4.10          etcd is a distributed key value store that prov...
rgarcia89 commented 4 years ago

I have been experimenting a bit. It seems to be related to the image and Helm chart versions.

If I deploy the helm chart with this command helm upgrade --install --namespace raul --wait -f values/raul.yaml --version 4.8.4 etcd bitnami/etcd using the following values file: https://pastebin.com/YA3nHcYL

and then afterwards upgrade, for example, to version 3.4.9-debian-10-r54 with cluster state existing, using this values file: https://pastebin.com/wvR6hzSu, things work fine.

However, it is not working with image 3.4.10 and also not with newer helm chart versions...

juan131 commented 4 years ago

Hi @rgarcia89

It could be related to these changes:

rgarcia89 commented 4 years ago

Possibly. I don't know what commands are run that could be failing because of this change. However, the output of the image looks like this:

[raulgs@raulgs-xm1 ~]$ klogs -f pod/etcd-2
==> Bash debug is off
==> Detected data from previous deployments...
==> Adding new member to existing cluster...
Alexc0007 commented 4 years ago

Hi everyone, I can confirm that after switching to image 3.4.9-debian-10-r54 with chart version 4.9.1, everything works well. I created a fresh cluster with the image above:

helm install etcd-jenkins bitnami/etcd -f values-production.yaml --set etcd.initialClusterState=new --version 4.9.1

then upgraded as follows:

helm upgrade etcd-jenkins bitnami/etcd -f values-production.yaml --set etcd.initialClusterState=existing --version 4.9.1

Then a rollout started and completed successfully: all old pods were terminated and replaced by new ones that joined the current cluster. This is good. Thanks to @rgarcia89!

juan131 commented 4 years ago

I'm glad you were able to use the latest version of the chart @Alexc0007 without issues. Did you try the same version of the chart but switching to the latest image 3.4.10-debian-10-r1?

Installing the chart from scratch I found no issues with the latest image/chart.

Alexc0007 commented 4 years ago

Hi, I didn't try the latest image; I'll try it later and report back.

KagurazakaNyaa commented 3 years ago

Chart version etcd-6.1.2 with image 3.4.15-debian-10-r14 still has the same problem.

values.yml changes:

Run helm install etcd-test ./etcd

The pod etcd-test-0 is running normally.

Then change values.yml L201 to replicaCount: 7 and run helm upgrade etcd-test ./etcd to increase the cluster size; this step finished successfully and pods etcd-test-{1..6} are running normally.

But when I decrease replicaCount to 5 and run helm upgrade to apply the changes, etcd-test-4 changes to the CrashLoopBackOff state and the other pods are not updated.

juan131 commented 3 years ago

Hi @KagurazakaNyaa

Then change values.yml L201 to replicaCount: 7 and run helm upgrade etcd-test ./etcd to increase the cluster size; this step finished successfully and pods etcd-test-{1..6} are running normally.

Note that with this new major version, it's not mandatory to scale the solution using helm upgrade ...; you can use kubectl scale ..., which is simpler and faster.

But when I decrease replicaCount to 5 and run helm upgrade to apply the changes, etcd-test-4 changes to the CrashLoopBackOff state and the other pods are not updated.

Could you share the logs of the etcd-test-4 pod? Also, could you try to decrease using kubectl scale ... and let us know if you find any issue in that case?
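
For example, scaling the etcd-test statefulset from this thread down one replica at a time:

    kubectl scale statefulset/etcd-test --replicas=6
    kubectl rollout status statefulset/etcd-test
    kubectl scale statefulset/etcd-test --replicas=5
    kubectl rollout status statefulset/etcd-test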

KagurazakaNyaa commented 3 years ago

Thanks for your reply @juan131

I tried to reproduce my operation, and the etcd-test-4 pod log looks like this:

etcd 07:22:59.66
etcd 07:22:59.66 Welcome to the Bitnami etcd container
etcd 07:22:59.66 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-etcd
etcd 07:22:59.66 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-etcd/issues
etcd 07:22:59.66
etcd 07:22:59.66 INFO  ==> ** Starting etcd setup **
etcd 07:22:59.67 INFO  ==> Validating settings in ETCD_* env vars..
etcd 07:22:59.67 WARN  ==> You set the environment variable ALLOW_NONE_AUTHENTICATION=yes. For safety reasons, do not use this flag in a production environment.
etcd 07:22:59.67 INFO  ==> Initializing etcd
etcd 07:22:59.68 INFO  ==> Detected data from previous deployments
etcd 07:23:09.75 INFO  ==> Updating member in existing cluster
Error: bad member ID arg (strconv.ParseUint: parsing "": invalid syntax), expecting ID in Hex

and I will delete this cluster and try using kubectl scale to change the scale.

But the initial purpose of my test is to check that the etcd cluster automatically recovers after several of its nodes are dynamically updated or deleted when the Kubernetes cluster is upgraded or migrated. In the initial test, I used the command kubectl delete pod etcd-test-1, and the same problem occurred. I also tried deleting the corresponding PVC and re-executing the command, but it still has no effect.

juan131 commented 3 years ago

Hi @KagurazakaNyaa

etcd 07:23:09.75 INFO  ==> Updating member in existing cluster
Error: bad member ID arg (strconv.ParseUint: parsing "": invalid syntax), expecting ID in Hex

I wasn't able to reproduce the error above ⏫ . This is what I did:

$ helm install etcd bitnami/etcd --set replicaCount=3
$ kubectl get pods -w
NAME     READY   STATUS    RESTARTS   AGE
etcd-0   0/1     Pending   0          0s
etcd-1   0/1     Pending   0          0s
etcd-2   0/1     Pending   0          0s
...
etcd-2   1/1     Running             0          75s
etcd-0   1/1     Running             0          78s
etcd-1   1/1     Running             0          84s
$ kubectl run etcd-client --restart='Never' --image docker.io/bitnami/etcd:3.4.15-debian-10-r14 --env ROOT_PASSWORD=$(kubectl get secret --namespace default etcd -o jsonpath="{.data.etcd-root-password}" | base64 --decode) --env ETCDCTL_ENDPOINTS="etcd.default.svc.cluster.local:2379" --namespace default --command -- sleep infinity
$ kubectl exec -it etcd-client -- bash
$ etcdctl member list
45a18acb10aa275e, started, etcd-2, http://etcd-2.etcd-headless.default.svc.cluster.local:2380, http://etcd-2.etcd-headless.default.svc.cluster.local:2379, false
936ce633ac273d75, started, etcd-1, http://etcd-1.etcd-headless.default.svc.cluster.local:2380, http://etcd-1.etcd-headless.default.svc.cluster.local:2379, false
df3b2df95cd5fd29, started, etcd-0, http://etcd-0.etcd-headless.default.svc.cluster.local:2380, http://etcd-0.etcd-headless.default.svc.cluster.local:2379, false
current_replicas=3
desired_replicas=7
while [[ current_replicas -lt desired_replicas ]]; do
    kubectl scale --replicas=$((current_replicas + 1)) statefulset/etcd
    kubectl rollout status statefulset/etcd
    current_replicas=$((current_replicas + 1))
done
$ kubectl get pods -w
NAME          READY   STATUS    RESTARTS   AGE
etcd-0        1/1     Running   0          11m
etcd-1        1/1     Running   0          11m
etcd-2        1/1     Running   0          11m
etcd-3        0/1     Pending   0          2s
...
etcd-6        0/1     Running             0          14s
etcd-6        1/1     Running             0          76s
$ kubectl exec -it etcd-client -- etcdctl member list
2bc27f0bc39445f7, started, etcd-4, http://etcd-4.etcd-headless.default.svc.cluster.local:2380, http://etcd-4.etcd-headless.default.svc.cluster.local:2379, false
37d45ca3d0f2410f, started, etcd-3, http://etcd-3.etcd-headless.default.svc.cluster.local:2380, http://etcd-3.etcd-headless.default.svc.cluster.local:2379, false
45a18acb10aa275e, started, etcd-2, http://etcd-2.etcd-headless.default.svc.cluster.local:2380, http://etcd-2.etcd-headless.default.svc.cluster.local:2379, false
936ce633ac273d75, started, etcd-1, http://etcd-1.etcd-headless.default.svc.cluster.local:2380, http://etcd-1.etcd-headless.default.svc.cluster.local:2379, false
d0dfbfbac07bfe6b, started, etcd-5, http://etcd-5.etcd-headless.default.svc.cluster.local:2380, http://etcd-5.etcd-headless.default.svc.cluster.local:2379, false
df3b2df95cd5fd29, started, etcd-0, http://etcd-0.etcd-headless.default.svc.cluster.local:2380, http://etcd-0.etcd-headless.default.svc.cluster.local:2379, false
ebbcdfb56ab0db90, started, etcd-6, http://etcd-6.etcd-headless.default.svc.cluster.local:2380, http://etcd-6.etcd-headless.default.svc.cluster.local:2379, false
$ kubectl delete pod etcd-2
$ kubectl exec -it etcd-client -- etcdctl member list
2bc27f0bc39445f7, started, etcd-4, http://etcd-4.etcd-headless.default.svc.cluster.local:2380, http://etcd-4.etcd-headless.default.svc.cluster.local:2379, false
37d45ca3d0f2410f, started, etcd-3, http://etcd-3.etcd-headless.default.svc.cluster.local:2380, http://etcd-3.etcd-headless.default.svc.cluster.local:2379, false
936ce633ac273d75, started, etcd-1, http://etcd-1.etcd-headless.default.svc.cluster.local:2380, http://etcd-1.etcd-headless.default.svc.cluster.local:2379, false
d0dfbfbac07bfe6b, started, etcd-5, http://etcd-5.etcd-headless.default.svc.cluster.local:2380, http://etcd-5.etcd-headless.default.svc.cluster.local:2379, false
df3b2df95cd5fd29, started, etcd-0, http://etcd-0.etcd-headless.default.svc.cluster.local:2380, http://etcd-0.etcd-headless.default.svc.cluster.local:2379, false
ebbcdfb56ab0db90, started, etcd-6, http://etcd-6.etcd-headless.default.svc.cluster.local:2380, http://etcd-6.etcd-headless.default.svc.cluster.local:2379, false
$ kubectl get pod etcd-2 -w
NAME     READY   STATUS              RESTARTS   AGE
etcd-2   0/1     ContainerCreating   0          9s
etcd-2   0/1     Running             0          10s
etcd-2   1/1     Running             0          74s
$ kubectl exec -it etcd-client -- etcdctl member list
2bc27f0bc39445f7, started, etcd-4, http://etcd-4.etcd-headless.default.svc.cluster.local:2380, http://etcd-4.etcd-headless.default.svc.cluster.local:2379, false
37d45ca3d0f2410f, started, etcd-3, http://etcd-3.etcd-headless.default.svc.cluster.local:2380, http://etcd-3.etcd-headless.default.svc.cluster.local:2379, false
38c9726f082cf87d, started, etcd-2, http://etcd-2.etcd-headless.default.svc.cluster.local:2380, http://etcd-2.etcd-headless.default.svc.cluster.local:2379, false
936ce633ac273d75, started, etcd-1, http://etcd-1.etcd-headless.default.svc.cluster.local:2380, http://etcd-1.etcd-headless.default.svc.cluster.local:2379, false
d0dfbfbac07bfe6b, started, etcd-5, http://etcd-5.etcd-headless.default.svc.cluster.local:2380, http://etcd-5.etcd-headless.default.svc.cluster.local:2379, false
df3b2df95cd5fd29, started, etcd-0, http://etcd-0.etcd-headless.default.svc.cluster.local:2380, http://etcd-0.etcd-headless.default.svc.cluster.local:2379, false
ebbcdfb56ab0db90, started, etcd-6, http://etcd-6.etcd-headless.default.svc.cluster.local:2380, http://etcd-6.etcd-headless.default.svc.cluster.local:2379, false
current_replicas=7
desired_replicas=5
while [[ current_replicas -gt desired_replicas ]]; do
    kubectl scale --replicas=$((current_replicas - 1)) statefulset/etcd
    kubectl rollout status statefulset/etcd
    current_replicas=$((current_replicas - 1))
done
$ kubectl get pods -w
NAME          READY   STATUS        RESTARTS   AGE
etcd-0        1/1     Running       0          23m
etcd-1        1/1     Running       0          23m
etcd-2        1/1     Running       0          3m6s
etcd-3        1/1     Running       0          11m
etcd-4        1/1     Running       0          10m
etcd-6        0/1     Terminating   0          9m16s
...
etcd-5        0/1     Terminating   0          9m23s
$ kubectl exec -it etcd-client -- etcdctl member list
2bc27f0bc39445f7, started, etcd-4, http://etcd-4.etcd-headless.default.svc.cluster.local:2380, http://etcd-4.etcd-headless.default.svc.cluster.local:2379, false
37d45ca3d0f2410f, started, etcd-3, http://etcd-3.etcd-headless.default.svc.cluster.local:2380, http://etcd-3.etcd-headless.default.svc.cluster.local:2379, false
38c9726f082cf87d, started, etcd-2, http://etcd-2.etcd-headless.default.svc.cluster.local:2380, http://etcd-2.etcd-headless.default.svc.cluster.local:2379, false
936ce633ac273d75, started, etcd-1, http://etcd-1.etcd-headless.default.svc.cluster.local:2380, http://etcd-1.etcd-headless.default.svc.cluster.local:2379, false
df3b2df95cd5fd29, started, etcd-0, http://etcd-0.etcd-headless.default.svc.cluster.local:2380, http://etcd-0.etcd-headless.default.svc.cluster.local:2379, false
KagurazakaNyaa commented 3 years ago

Hi @juan131, I think I found the reason. After disabling RBAC, any re-created pods cannot join the cluster.

juan131 commented 3 years ago

Hi @KagurazakaNyaa

I wasn't able to reproduce that either...

$ helm install etcd bitnami/etcd --set replicaCount=3 --set auth.rbac.enabled=false
$ kubectl get pods -w
NAME     READY   STATUS    RESTARTS   AGE
etcd-0   0/1     Pending   0          0s
etcd-1   0/1     Pending   0          0s
etcd-2   0/1     Pending   0          0s
...
etcd-2   1/1     Running             0          75s
etcd-0   1/1     Running             0          78s
etcd-1   1/1     Running             0          84s
$ kubectl delete pod etcd-1
$ kubectl get pods -w
etcd-1        1/1     Terminating         0          2m14s
etcd-1        0/1     Terminating         0          2m20s
etcd-1        0/1     Terminating         0          2m25s
etcd-1        0/1     Terminating         0          2m25s
etcd-1        0/1     Pending             0          0s
etcd-1        0/1     Pending             0          0s
etcd-1        0/1     ContainerCreating   0          0s
etcd-1        0/1     Running             0          6s
etcd-1        1/1     Running             0          74s
$ kubectl logs etcd-1
...
etcd 14:39:18.92 INFO  ==> ** Starting etcd setup **
etcd 14:39:18.93 INFO  ==> Validating settings in ETCD_* env vars..
etcd 14:39:18.93 WARN  ==> You set the environment variable ALLOW_NONE_AUTHENTICATION=yes. For safety reasons, do not use this flag in a production environment.
etcd 14:39:18.94 INFO  ==> Initializing etcd
etcd 14:39:18.95 INFO  ==> Detected data from previous deployments
etcd 14:39:29.25 INFO  ==> Adding new member to existing cluster
etcd 14:39:34.50 INFO  ==> ** etcd setup finished! **

etcd 14:39:34.52 INFO  ==> ** Starting etcd **
...
$ kubectl exec -it etcd-client -- etcdctl member list
211c7cde9e20cbd9, started, etcd-1, http://etcd-1.etcd-headless.default.svc.cluster.local:2380, http://etcd-1.etcd-headless.default.svc.cluster.local:2379, false
45a18acb10aa275e, started, etcd-2, http://etcd-2.etcd-headless.default.svc.cluster.local:2380, http://etcd-2.etcd-headless.default.svc.cluster.local:2379, false
df3b2df95cd5fd29, started, etcd-0, http://etcd-0.etcd-headless.default.svc.cluster.local:2380, http://etcd-0.etcd-headless.default.svc.cluster.local:2379, false
abdennour commented 3 years ago

The 1st pod (etcd-prod-0) has been evicted, but it has not come back up since.

k logs -f etcd-prod-0
etcd 08:56:58.90
etcd 08:56:58.91 Welcome to the Bitnami etcd container
etcd 08:56:58.91 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-etcd
etcd 08:56:58.92 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-etcd/issues
etcd 08:56:58.92
etcd 08:56:58.92 INFO  ==> ** Starting etcd setup **
etcd 08:56:58.95 INFO  ==> Validating settings in ETCD_* env vars..
etcd 08:56:58.96 WARN  ==> You set the environment variable ALLOW_NONE_AUTHENTICATION=yes. For safety reasons, do not use this flag in a production environment.
etcd 08:56:58.97 INFO  ==> Initializing etcd
etcd 08:56:59.00 INFO  ==> Detected data from previous deployments
etcd 08:56:59.29 INFO  ==> Updating member in existing cluster
Error: bad member ID arg (strconv.ParseUint: parsing "": invalid syntax), expecting ID in Hex

Does this mean the 1st member's data is corrupted? If so, how do I recover?

Note that I enabled disasterRecovery in my Helm release:

disasterRecovery:
  enabled: true
  cronjob:
    schedule: "*/30 * * * *"
    historyLimit: 1
    ## @param disasterRecovery.cronjob.snapshotHistoryLimit Number of etcd snapshots to retain, tagged by date
    ##
    snapshotHistoryLimit: 3
    resources:
      limits:
         cpu: 500m
         memory: 1Gi
abdennour commented 3 years ago

I fixed the issue (above) by triggering a rolling update:

kubectl rollout restart statefulset/etcd-prod