etcd Back-off restarting failed container

huangzx3 commented 2 years ago


Name:         apisix-1640071150-etcd-0
Namespace:    default
Priority:     0
Node:         izbp17djxylxntd8nigwg0z/172.16.219.233
Start Time:   Tue, 21 Dec 2021 16:06:54 +0800
Labels:       app.kubernetes.io/instance=apisix-1640071150
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=etcd
              controller-revision-hash=apisix-1640071150-etcd-5f57bcbf6
              helm.sh/chart=etcd-6.2.6
              statefulset.kubernetes.io/pod-name=apisix-1640071150-etcd-0
Annotations:  cni.projectcalico.org/podIP: 100.79.164.131/32
Status:       Running
IP:           100.79.164.131
IPs:
  IP:           100.79.164.131
Controlled By:  StatefulSet/apisix-1640071150-etcd
Containers:
  etcd:
    Container ID:   containerd://3e08a916be53b5fcff17c0b7ef41679116e3f9176cf9c14e8f0653cedd579e0f
    Image:          docker.io/bitnami/etcd:3.4.16-debian-10-r14
    Image ID:       docker.io/bitnami/etcd@sha256:ef2d499749c634588f7d281dd70cc1fb2514d57f6d42308c0fb0f2c8ca55bea4
    Ports:          2379/TCP, 2380/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 21 Dec 2021 17:19:42 +0800
      Finished:     Tue, 21 Dec 2021 17:19:42 +0800
    Ready:          False
    Restart Count:  19
    Liveness:       exec [/opt/bitnami/scripts/etcd/healthcheck.sh] delay=60s timeout=5s period=30s #success=1 #failure=5
    Readiness:      exec [/opt/bitnami/scripts/etcd/healthcheck.sh] delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:
      BITNAMI_DEBUG:                     false
      MY_POD_IP:                          (v1:status.podIP)
      MY_POD_NAME:                       apisix-1640071150-etcd-0 (v1:metadata.name)
      ETCDCTL_API:                       3
      ETCD_ON_K8S:                       yes
      ETCD_START_FROM_SNAPSHOT:          no
      ETCD_DISASTER_RECOVERY:            no
      ETCD_NAME:                         $(MY_POD_NAME)
      ETCD_DATA_DIR:                     /bitnami/etcd/data
      ETCD_LOG_LEVEL:                    info
      ALLOW_NONE_AUTHENTICATION:         yes
      ETCD_ADVERTISE_CLIENT_URLS:        http://$(MY_POD_NAME).apisix-1640071150-etcd-headless.default.svc.cluster.local:2379
      ETCD_LISTEN_CLIENT_URLS:           http://0.0.0.0:2379
      ETCD_INITIAL_ADVERTISE_PEER_URLS:  http://$(MY_POD_NAME).apisix-1640071150-etcd-headless.default.svc.cluster.local:2380
      ETCD_LISTEN_PEER_URLS:             http://0.0.0.0:2380
      ETCD_INITIAL_CLUSTER_TOKEN:        etcd-cluster-k8s
      ETCD_INITIAL_CLUSTER_STATE:        new
      ETCD_INITIAL_CLUSTER:              apisix-1640071150-etcd-0=http://apisix-1640071150-etcd-0.apisix-1640071150-etcd-headless.default.svc.cluster.local:2380,apisix-1640071150-etcd-1=http://apisix-1640071150-etcd-1.apisix-1640071150-etcd-headless.default.svc.cluster.local:2380,apisix-1640071150-etcd-2=http://apisix-1640071150-etcd-2.apisix-1640071150-etcd-headless.default.svc.cluster.local:2380
      ETCD_CLUSTER_DOMAIN:               apisix-1640071150-etcd-headless.default.svc.cluster.local
    Mounts:
      /bitnami/etcd from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-kt7g4 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-apisix-1640071150-etcd-0
    ReadOnly:   false
  default-token-kt7g4:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-kt7g4
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age                    From     Message
  ----     ------   ----                   ----     -------
  Warning  BackOff  3m51s (x333 over 73m)  kubelet  Back-off restarting failed container

There are 3 etcd pods, but they always end up in "Back-off restarting failed container". Please help.

huangzx3 commented 2 years ago

The previous problem has been resolved, but a new problem has appeared:

2021-12-21 10:30:35.271025 I | pkg/flags: recognized and used environment variable ETCD_INITIAL_CLUSTER_STATE=existing
2021-12-21 10:30:35.271031 I | pkg/flags: recognized and used environment variable ETCD_INITIAL_CLUSTER_TOKEN=etcd-cluster-k8s
2021-12-21 10:30:35.271045 I | pkg/flags: recognized and used environment variable ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379
2021-12-21 10:30:35.271055 I | pkg/flags: recognized and used environment variable ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380
2021-12-21 10:30:35.271063 I | pkg/flags: recognized and used environment variable ETCD_LOG_LEVEL=info
2021-12-21 10:30:35.271081 I | pkg/flags: recognized and used environment variable ETCD_NAME=apisix-1640071150-etcd-0
2021-12-21 10:30:35.271087 I | pkg/flags: recognized and used environment variable ETCD_PEER_AUTO_TLS=false
2021-12-21 10:30:35.271130 W | pkg/flags: unrecognized environment variable ETCD_TRUSTED_CA_FILE=
2021-12-21 10:30:35.271150 W | pkg/flags: unrecognized environment variable ETCD_ON_K8S=yes
2021-12-21 10:30:35.271162 W | pkg/flags: unrecognized environment variable ETCD_SNAPSHOTS_DIR=/snapshots
2021-12-21 10:30:35.271169 W | pkg/flags: unrecognized environment variable ETCD_BIN_DIR=/opt/bitnami/etcd/sbin
2021-12-21 10:30:35.271179 W | pkg/flags: unrecognized environment variable ETCD_VOLUME_DIR=/bitnami/etcd
2021-12-21 10:30:35.271187 W | pkg/flags: unrecognized environment variable ETCD_ROOT_PASSWORD=
2021-12-21 10:30:35.271192 W | pkg/flags: unrecognized environment variable ETCD_CLUSTER_DOMAIN=apisix-1640071150-etcd-headless.default.svc.cluster.local
2021-12-21 10:30:35.271196 W | pkg/flags: unrecognized environment variable ETCD_DISASTER_RECOVERY=no
2021-12-21 10:30:35.271201 W | pkg/flags: unrecognized environment variable ETCD_KEY_FILE=
2021-12-21 10:30:35.271208 W | pkg/flags: unrecognized environment variable ETCD_DAEMON_GROUP=etcd
2021-12-21 10:30:35.271212 W | pkg/flags: unrecognized environment variable ETCD_START_FROM_SNAPSHOT=no
2021-12-21 10:30:35.271219 W | pkg/flags: unrecognized environment variable ETCD_INIT_SNAPSHOT_FILENAME=
2021-12-21 10:30:35.271224 W | pkg/flags: unrecognized environment variable ETCD_INIT_SNAPSHOTS_DIR=/init-snapshot
2021-12-21 10:30:35.271231 W | pkg/flags: unrecognized environment variable ETCD_BASE_DIR=/opt/bitnami/etcd
2021-12-21 10:30:35.271240 W | pkg/flags: unrecognized environment variable ETCD_CERT_FILE=
2021-12-21 10:30:35.271244 W | pkg/flags: unrecognized environment variable ETCD_NEW_MEMBERS_ENV_FILE=/bitnami/etcd/data/new_member_envs
2021-12-21 10:30:35.271248 W | pkg/flags: unrecognized environment variable ETCD_DAEMON_USER=etcd
[WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
2021-12-21 10:30:35.271306 I | etcdmain: etcd Version: 3.4.16
2021-12-21 10:30:35.271315 I | etcdmain: Git SHA: d19fbe541
2021-12-21 10:30:35.271319 I | etcdmain: Go Version: go1.12.17
2021-12-21 10:30:35.271323 I | etcdmain: Go OS/Arch: linux/amd64
2021-12-21 10:30:35.271328 I | etcdmain: setting maximum number of CPUs to 2, total number of available CPUs is 2
2021-12-21 10:30:35.271388 W | etcdmain: found invalid file/dir new_member_envs under data dir /bitnami/etcd/data (Ignore this if you are upgrading etcd)
2021-12-21 10:30:35.271402 N | etcdmain: the server is already initialized as member before, starting as etcd member...
[WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
2021-12-21 10:30:35.271638 I | embed: name = apisix-1640071150-etcd-0
2021-12-21 10:30:35.271650 I | embed: data dir = /bitnami/etcd/data
2021-12-21 10:30:35.271655 I | embed: member dir = /bitnami/etcd/data/member
2021-12-21 10:30:35.271659 I | embed: heartbeat = 100ms
2021-12-21 10:30:35.271665 I | embed: election = 1000ms
2021-12-21 10:30:35.271669 I | embed: snapshot count = 100000
2021-12-21 10:30:35.271681 I | embed: advertise client URLs = http://apisix-1640071150-etcd-0.apisix-1640071150-etcd-headless.default.svc.cluster.local:2379
2021-12-21 10:30:35.274069 W | etcdserver: could not get cluster response from http://apisix-1640071150-etcd-1.apisix-1640071150-etcd-headless.default.svc.cluster.local:2380: Get http://apisix-1640071150-etcd-1.apisix-1640071150-etcd-headless.default.svc.cluster.local:2380/members: dial tcp 100.89.204.208:2380: connect: connection refused
2021-12-21 10:30:35.274939 W | etcdserver: could not get cluster response from http://apisix-1640071150-etcd-2.apisix-1640071150-etcd-headless.default.svc.cluster.local:2380: Get http://apisix-1640071150-etcd-2.apisix-1640071150-etcd-headless.default.svc.cluster.local:2380/members: dial tcp 100.116.37.80:2380: connect: connection refused
2021-12-21 10:30:35.275628 C | etcdmain: cannot fetch cluster info from peer urls: could not retrieve cluster information from the given URLs
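
The lines that matter are easy to miss among the "unrecognized environment variable" warnings. A minimal Python sketch for filtering them, assuming the line layout seen in the log above (`<date> <time> <LEVEL> | <package>: <message>`); the function name `important_lines` is only illustrative:

```python
import re

# Layout assumed from the log above: "<date> <time> <LEVEL> | <package>: <message>"
LINE_RE = re.compile(r"^(\S+ \S+) ([A-Z]) \| ([^:]+): (.*)$")

def important_lines(log_text):
    """Return (level, package, message) for W/E/C lines, skipping env-var noise."""
    found = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # e.g. the "[WARNING] Deprecated ..." lines have another layout
        _, level, package, message = match.groups()
        if level in ("W", "E", "C") and "unrecognized environment variable" not in message:
            found.append((level, package, message))
    return found

sample = """\
2021-12-21 10:30:35.271150 W | pkg/flags: unrecognized environment variable ETCD_ON_K8S=yes
2021-12-21 10:30:35.271306 I | etcdmain: etcd Version: 3.4.16
2021-12-21 10:30:35.275628 C | etcdmain: cannot fetch cluster info from peer urls: could not retrieve cluster information from the given URLs
"""

for level, package, message in important_lines(sample):
    print(level, package, message)
```

Applied to the full log, this should leave the `new_member_envs` warning, the two "could not get cluster response" warnings and the final critical line.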

tokers commented 2 years ago

I don't think this problem is related to APISIX. According to the error log, it is caused by a networking issue. Could you troubleshoot that first?

Also, please provide more details about how you installed APISIX and the etcd cluster.
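
One way to narrow this down: in the log above the peer hostnames resolved to IPs and the connection was then refused, which suggests checking each peer port directly. A rough Python sketch, assuming it is run from a pod in the same namespace that has Python available and that `ETCD_INITIAL_CLUSTER` is set as in the pod description. The helper names `parse_initial_cluster` and `check_peer` are made up for illustration:

```python
import os
import socket
from urllib.parse import urlsplit

def parse_initial_cluster(value):
    """Split an ETCD_INITIAL_CLUSTER string into {member_name: (host, port)}."""
    peers = {}
    for entry in value.split(","):
        name, url = entry.strip().split("=", 1)
        parts = urlsplit(url)
        peers[name] = (parts.hostname, parts.port)
    return peers

def check_peer(host, port, timeout=3.0):
    """Return 'ok', or a short description of why the TCP connection failed."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.gaierror as exc:  # name could not be resolved
        return f"DNS lookup failed: {exc}"
    except OSError as exc:  # refused, timed out, unreachable, ...
        return f"connect failed: {exc}"

if __name__ == "__main__":
    cluster = os.environ.get("ETCD_INITIAL_CLUSTER", "")
    if cluster:
        for name, (host, port) in parse_initial_cluster(cluster).items():
            print(name, host, port, check_peer(host, port))
```

A "DNS lookup failed" result would point at the headless service or CoreDNS, while "connect failed" with a refused connection would mean the name resolves but nothing is listening on the peer port yet.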

huangzx3 commented 2 years ago

Quoting tokers: "I don't think this problem is related to APISIX. According to the error log, it is caused by a networking issue. Could you troubleshoot that first?"

kubectl logs --namespace=kube-system -l k8s-app=kube-dns


.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d

huangzx3 commented 2 years ago

I installed APISIX with just:

helm install apisix/apisix

I created the PVC manually:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-apisix-1640139056-etcd-0
  namespace: default
  uid: 2a64b5a6-193d-468d-8215-d69ff68a05c9
  resourceVersion: '7222'
  creationTimestamp: '2021-12-22T02:11:05Z'
  labels:
    app.kubernetes.io/instance: apisix-1640139056
    app.kubernetes.io/name: etcd
  finalizers:
    - kubernetes.io/pvc-protection
  managedFields:
    - manager: kube-controller-manager
      operation: Update
      apiVersion: v1
      time: '2021-12-22T02:11:05Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:labels:
            .: {}
            f:app.kubernetes.io/instance: {}
            f:app.kubernetes.io/name: {}
        f:spec:
          f:accessModes: {}
          f:resources:
            f:requests:
              .: {}
              f:storage: {}
          f:volumeMode: {}
        f:status:
          f:phase: {}
  selfLink: >-
    /api/v1/namespaces/default/persistentvolumeclaims/data-apisix-1640139056-etcd-0
status:
  phase: Pending
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  volumeMode: Filesystem
  storageClassName: XXXXXXXXXXXXXXXX

huangzx3 commented 2 years ago

This seems to be related to the startup order of the etcd members. They depend on each other, which prevents the cluster from starting. I can only get one member to start.

kubectl get pod

NAME                                                   READY   STATUS             RESTARTS   AGE
apisix-1640139056-7858dfffcf-8dc64                     1/1     Running            0          57m
apisix-1640139056-etcd-0                               1/1     Running            0          3m25s
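
On the members depending on each other: etcd is built on Raft, which needs a majority of members (a quorum) before the cluster can make progress. A small Python sketch of that arithmetic for the three-member cluster listed in `ETCD_INITIAL_CLUSTER`. This only illustrates the majority rule and is not a diagnosis of this particular failure:

```python
def quorum(cluster_size):
    """Smallest number of members that forms a majority."""
    return cluster_size // 2 + 1

def can_make_progress(cluster_size, members_up):
    """True if enough members are up to reach quorum."""
    return members_up >= quorum(cluster_size)

for up in range(4):
    print(f"{up}/3 members up -> quorum reached: {can_make_progress(3, up)}")
```

With three members the quorum is two, so a single running member is not enough for the cluster to serve writes.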

apache / apisix-helm-chart

etcd Back-off restarting failed container #198