apache / apisix-helm-chart

Apache APISIX Helm Chart
https://apisix.apache.org/
Apache License 2.0

etcd keeps crashing when upgrading chart to 0.6.0 #156

Open youngwookim opened 3 years ago

youngwookim commented 3 years ago

I've been running apisix chart version 0.4.0 and now I am upgrading the chart to 0.6.0.

I got the following error from the etcd pod:

etcd 03:39:22.52 
etcd 03:39:22.52 Welcome to the Bitnami etcd container
etcd 03:39:22.52 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-etcd
etcd 03:39:22.52 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-etcd/issues
etcd 03:39:22.52 
etcd 03:39:22.52 INFO  ==> ** Starting etcd setup **
etcd 03:39:22.53 INFO  ==> Validating settings in ETCD_* env vars..
etcd 03:39:22.53 WARN  ==> You set the environment variable ALLOW_NONE_AUTHENTICATION=yes. For safety reasons, do not use this flag in a production environment.
etcd 03:39:22.54 INFO  ==> Initializing etcd
etcd 03:39:22.54 INFO  ==> Detected data from previous deployments
etcd 03:39:32.72 INFO  ==> Updating member in existing cluster
Error: bad member ID arg (strconv.ParseUint: parsing "": invalid syntax), expecting ID in Hex

The etcd pod keeps crashing.

tokers commented 3 years ago

@youngwookim Could you provide the steps to reproduce? Also, it seems that the etcd data was corrupted; the member ID is invalid.
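
If it helps, here is a quick check (a sketch, assuming the Bitnami image's layout, where the scripts cache the local member ID in a member_id file under the data dir; the parse error above suggests that file is empty):

$ kubectl exec -n apisix apisix-etcd-0 -- cat /bitnami/etcd/data/member_id
$ kubectl exec -n apisix apisix-etcd-0 -- etcdctl member list
# an empty or missing member_id is what makes the "Updating member in
# existing cluster" step fail with: strconv.ParseUint: parsing ""
# (for the crashing pod the same file sits on its data-apisix-etcd-2 PVC)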

youngwookim commented 3 years ago

Thanks for the comment @tokers

The following is the command I ran to upgrade the chart from version 0.4.0 to 0.6.0:

$ helm upgrade --install apisix apisix/apisix  --namespace apisix --version 0.6.0 \
    --set apisix.replicaCount=1 \
    --set gateway.type=LoadBalancer \
    --set gateway.loadBalancerIP="......" \
    --set gateway.tls.enabled=true \
    --set dashboard.enabled=true \
    --set ingress-controller.enabled=true \
    --set allow.ipList=""

After upgrading, a pod of the 'apisix-etcd' StatefulSet keeps crashing:

$ kubectl describe -n apisix pod/apisix-etcd-2
Name:         apisix-etcd-2
Namespace:    apisix
Priority:     0
Node:         aks-defaultpool-17674265-vmss000002/10.240.0.16
Start Time:   Tue, 12 Oct 2021 12:31:52 +0900
Labels:       app.kubernetes.io/instance=apisix
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=etcd
              controller-revision-hash=apisix-etcd-6579c5cbc8
              helm.sh/chart=etcd-6.2.6
              statefulset.kubernetes.io/pod-name=apisix-etcd-2
Annotations:  cni.projectcalico.org/containerID: 25fa15ee0266d8c17d213f20bc16053477deab49f0e6e9ea82ed4867a14cbf03
              cni.projectcalico.org/podIP: 10.244.2.133/32
              cni.projectcalico.org/podIPs: 10.244.2.133/32
Status:       Running
IP:           10.244.2.133
IPs:
  IP:           10.244.2.133
Controlled By:  StatefulSet/apisix-etcd
Containers:
  etcd:
    Container ID:   containerd://e5cd7024f5a7158474800be5012ab149e6237086699e4512cc13af3326c1cb12
    Image:          docker.io/bitnami/etcd:3.4.16-debian-10-r14
    Image ID:       docker.io/bitnami/etcd@sha256:ef2d499749c634588f7d281dd70cc1fb2514d57f6d42308c0fb0f2c8ca55bea4
    Ports:          2379/TCP, 2380/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    128
      Started:      Tue, 12 Oct 2021 22:41:29 +0900
      Finished:     Tue, 12 Oct 2021 22:41:44 +0900
    Ready:          False
    Restart Count:  118
    Liveness:       exec [/opt/bitnami/scripts/etcd/healthcheck.sh] delay=60s timeout=5s period=30s #success=1 #failure=5
    Readiness:      exec [/opt/bitnami/scripts/etcd/healthcheck.sh] delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:
      BITNAMI_DEBUG:                     false
      MY_POD_IP:                          (v1:status.podIP)
      MY_POD_NAME:                       apisix-etcd-2 (v1:metadata.name)
      ETCDCTL_API:                       3
      ETCD_ON_K8S:                       yes
      ETCD_START_FROM_SNAPSHOT:          no
      ETCD_DISASTER_RECOVERY:            no
      ETCD_NAME:                         $(MY_POD_NAME)
      ETCD_DATA_DIR:                     /bitnami/etcd/data
      ETCD_LOG_LEVEL:                    info
      ALLOW_NONE_AUTHENTICATION:         yes
      ETCD_ADVERTISE_CLIENT_URLS:        http://$(MY_POD_NAME).apisix-etcd-headless.apisix.svc.cluster.local:2379
      ETCD_LISTEN_CLIENT_URLS:           http://0.0.0.0:2379
      ETCD_INITIAL_ADVERTISE_PEER_URLS:  http://$(MY_POD_NAME).apisix-etcd-headless.apisix.svc.cluster.local:2380
      ETCD_LISTEN_PEER_URLS:             http://0.0.0.0:2380
      ETCD_INITIAL_CLUSTER_TOKEN:        etcd-cluster-k8s
      ETCD_INITIAL_CLUSTER_STATE:        existing
      ETCD_INITIAL_CLUSTER:              apisix-etcd-0=http://apisix-etcd-0.apisix-etcd-headless.apisix.svc.cluster.local:2380,apisix-etcd-1=http://apisix-etcd-1.apisix-etcd-headless.apisix.svc.cluster.local:2380,apisix-etcd-2=http://apisix-etcd-2.apisix-etcd-headless.apisix.svc.cluster.local:2380
      ETCD_CLUSTER_DOMAIN:               apisix-etcd-headless.apisix.svc.cluster.local
    Mounts:
      /bitnami/etcd from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-rlxf5 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-apisix-etcd-2
    ReadOnly:   false
  kube-api-access-rlxf5:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age                     From     Message
  ----     ------   ----                    ----     -------
  Warning  BackOff  3m45s (x2732 over 10h)  kubelet  Back-off restarting failed container
tokers commented 3 years ago

@youngwookim Strange, could you try rolling back this upgrade and see whether the etcd cluster recovers from this fault? Also, we may have to get insight into the etcd data so that we can make sure it is intact.
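
One way to get that insight without touching the volumes (a sketch, assuming at least one member still answers on the headless service):

$ kubectl exec -n apisix apisix-etcd-0 -- etcdctl endpoint health --cluster
$ kubectl exec -n apisix apisix-etcd-0 -- etcdctl endpoint status --cluster -w table
# if the other members stay healthy while one keeps crashing, that points
# at stale state on the crashing member's own PVC rather than cluster-wide
# corruption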

youngwookim commented 3 years ago

@tokers

Rolled back to the 0.4.0 chart:

$ helm list -n apisix
NAME    NAMESPACE   REVISION    UPDATED                                 STATUS      CHART           APP VERSION
apisix  apisix      2           2021-10-12 12:26:19.228653 +0900 KST    deployed    apisix-0.6.0    2.10.0     
[ywkim: ~]$ helm rollback -n apisix apisix
Rollback was a success! Happy Helming!
[ywkim: ~]$ helm list -n apisix
NAME    NAMESPACE   REVISION    UPDATED                                 STATUS      CHART           APP VERSION
apisix  apisix      3           2021-10-13 22:18:23.102311 +0900 KST    deployed    apisix-0.4.0    2.7.0      

No difference; I got the same message from the pod:

etcd 13:23:13.58 
etcd 13:23:13.58 Welcome to the Bitnami etcd container
etcd 13:23:13.58 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-etcd
etcd 13:23:13.58 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-etcd/issues
etcd 13:23:13.58 
etcd 13:23:13.58 INFO  ==> ** Starting etcd setup **
etcd 13:23:13.59 INFO  ==> Validating settings in ETCD_* env vars..
etcd 13:23:13.60 WARN  ==> You set the environment variable ALLOW_NONE_AUTHENTICATION=yes. For safety reasons, do not use this flag in a production environment.
etcd 13:23:13.60 INFO  ==> Initializing etcd
etcd 13:23:13.60 INFO  ==> Detected data from previous deployments
etcd 13:23:23.86 INFO  ==> Updating member in existing cluster
Error: bad member ID arg (strconv.ParseUint: parsing "": invalid syntax), expecting ID in Hex
tokers commented 3 years ago

> (quotes the rollback attempt and etcd log from the previous comment in full)

OK, I think we can now conclude that this is unrelated to the Helm chart version; the problem is in the etcd data itself.

tokers commented 3 years ago

How many instances are in your etcd cluster? Maybe we can back up the data from a healthy node and use it in the bad one.
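
A sketch of that backup path, assuming apisix-etcd-0 is still healthy (pod and claim names are taken from the output above):

# snapshot the healthy member and copy it out
$ kubectl exec -n apisix apisix-etcd-0 -- etcdctl snapshot save /tmp/etcd-backup.db
$ kubectl cp -n apisix apisix-etcd-0:/tmp/etcd-backup.db ./etcd-backup.db
# for the bad member it is usually enough to delete its PVC
# (data-apisix-etcd-2) and the pod so it rejoins with a clean data dir;
# the snapshot is the safety net in case the whole cluster has to be
# rebuilt with 'etcdctl snapshot restore'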

youngwookim commented 3 years ago

Thanks for your advice @tokers, I'll try that.

jishaashokan commented 1 year ago

I have the same issue with etcd crashing in my EKS cluster. I have tried deleting the PVCs and reinstalling APISIX via Helm multiple times; however, etcd keeps crashing.
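
For reference, the cleanup I tried looks roughly like this (a sketch; release and namespace names match the describe output below, and the PVC names follow the StatefulSet's data-apisix-etcd-N pattern):

$ helm uninstall apisix -n ingress-apisix
# the StatefulSet's PVCs survive the uninstall and keep the old member data
$ kubectl delete pvc -n ingress-apisix data-apisix-etcd-0 data-apisix-etcd-1 data-apisix-etcd-2
$ helm install apisix apisix/apisix -n ingress-apisix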

$ mk describe pod/apisix-etcd-2
Name:         apisix-etcd-2
Namespace:    ingress-apisix
Priority:     0
Node:         ip-172-31-109-173.ap-south-1.compute.internal/172.31.109.173
Start Time:   Mon, 09 Jan 2023 09:14:01 +0530
Labels:       app.kubernetes.io/instance=apisix
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=etcd
              controller-revision-hash=apisix-etcd-54d7f4b448
              helm.sh/chart=etcd-8.3.4
              statefulset.kubernetes.io/pod-name=apisix-etcd-2
Annotations:  checksum/token-secret: 7f28bd39a9649b0425cff5da399d496cfadede8bb85303b4edf589da3bc2e751
              kubernetes.io/psp: eks.privileged
Status:       Running
IP:           172.31.111.200
IPs:
  IP:           172.31.111.200
Controlled By:  StatefulSet/apisix-etcd
Containers:
  etcd:
    Container ID:   docker://35e9a0fa4f62390a84a04e7c1a3761af83595a78e3dfdf9b008f944ce044d152
    Image:          docker.io/bitnami/etcd:3.5.4-debian-11-r14
    Image ID:       docker-pullable://bitnami/etcd@sha256:e5ef30fa508e6f3a028a4e26acc7ec2803eea1370dc9c1da692ee0405cdaf50d
    Ports:          2379/TCP, 2380/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Running
      Started:      Mon, 09 Jan 2023 09:14:07 +0530
    Ready:          True
    Restart Count:  0
    Liveness:       exec [/opt/bitnami/scripts/etcd/healthcheck.sh] delay=60s timeout=5s period=30s #success=1 #failure=5
    Readiness:      exec [/opt/bitnami/scripts/etcd/healthcheck.sh] delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:
      BITNAMI_DEBUG:                     false
      MY_POD_IP:                          (v1:status.podIP)
      MY_POD_NAME:                       apisix-etcd-2 (v1:metadata.name)
      MY_STS_NAME:                       apisix-etcd
      ETCDCTL_API:                       3
      ETCD_ON_K8S:                       yes
      ETCD_START_FROM_SNAPSHOT:          no
      ETCD_DISASTER_RECOVERY:            no
      ETCD_NAME:                         $(MY_POD_NAME)
      ETCD_DATA_DIR:                     /bitnami/etcd/data
      ETCD_LOG_LEVEL:                    info
      ALLOW_NONE_AUTHENTICATION:         yes
      ETCD_AUTH_TOKEN:                   jwt,priv-key=/opt/bitnami/etcd/certs/token/jwt-token.pem,sign-method=RS256,ttl=10m
      ETCD_ADVERTISE_CLIENT_URLS:        http://$(MY_POD_NAME).apisix-etcd-headless.ingress-apisix.svc.cluster.local:2379,http://apisix-etcd.ingress-apisix.svc.cluster.local:2379
      ETCD_LISTEN_CLIENT_URLS:           http://0.0.0.0:2379
      ETCD_INITIAL_ADVERTISE_PEER_URLS:  http://$(MY_POD_NAME).apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380
      ETCD_LISTEN_PEER_URLS:             http://0.0.0.0:2380
      ETCD_INITIAL_CLUSTER_TOKEN:        etcd-cluster-k8s
      ETCD_INITIAL_CLUSTER_STATE:        existing
      ETCD_INITIAL_CLUSTER:              apisix-etcd-0=http://apisix-etcd-0.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380,apisix-etcd-1=http://apisix-etcd-1.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380,apisix-etcd-2=http://apisix-etcd-2.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380
      ETCD_CLUSTER_DOMAIN:               apisix-etcd-headless.ingress-apisix.svc.cluster.local
    Mounts:
      /bitnami/etcd from data (rw)
      /opt/bitnami/etcd/certs/token/ from etcd-jwt-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9hfvq (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-apisix-etcd-2
    ReadOnly:   false
  etcd-jwt-token:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  apisix-etcd-jwt-token
    Optional:    false
  kube-api-access-9hfvq:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age  From               Message
  ----    ------     ---- ----               -------
  Normal  Scheduled  31m  default-scheduler  Successfully assigned ingress-apisix/apisix-etcd-2 to ip-172-31-109-173.ap-south-1.compute.internal
  Normal  Pulled     31m  kubelet            Container image "docker.io/bitnami/etcd:3.5.4-debian-11-r14" already present on machine
  Normal  Created    31m  kubelet            Created container etcd
  Normal  Started    31m  kubelet            Started container etcd

$ mk describe pod/apisix-etcd-1
Name:         apisix-etcd-1
Namespace:    ingress-apisix
Priority:     0
Node:         ip-172-31-102-32.ap-south-1.compute.internal/172.31.102.32
Start Time:   Mon, 09 Jan 2023 09:15:50 +0530
Labels:       app.kubernetes.io/instance=apisix
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=etcd
              controller-revision-hash=apisix-etcd-54d7f4b448
              helm.sh/chart=etcd-8.3.4
              statefulset.kubernet.io/pod-name=apisix-etcd-1
Annotations:  checksum/token-secret: 7f28bd39a9649b0425cff5da399d496cfadede8bb85303b4edf589da3bc2e751
              kubernetes.io/psp: eks.privileged
Status:       Running
IP:           172.31.100.113
IPs:
  IP:           172.31.100.113
Controlled By:  StatefulSet/apisix-etcd
Containers:
  etcd:
    Container ID:   docker://759dba6c28cbcba2e077201656aede69868b4562dfc44398091c3c47a6a6a47f
    Image:          docker.io/bitnami/etcd:3.5.4-debian-11-r14
    Image ID:       docker-pullable://bitnami/etcd@sha256:e5ef30fa508e6f3a028a4e26acc7ec2803eea1370dc9c1da692ee0405cdaf50d
    Ports:          2379/TCP, 2380/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Running
      Started:      Mon, 09 Jan 2023 09:43:55 +0530
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Mon, 09 Jan 2023 09:39:55 +0530
      Finished:     Mon, 09 Jan 2023 09:43:55 +0530
    Ready:          False
    Restart Count:  7
    Liveness:       exec [/opt/bitnami/scripts/etcd/healthcheck.sh] delay=60s timeout=5s period=30s #success=1 #failure=5
    Readiness:      exec [/opt/bitnami/scripts/etcd/healthcheck.sh] delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:
      BITNAMI_DEBUG:                     false
      MY_POD_IP:                          (v1:status.podIP)
      MY_POD_NAME:                       apisix-etcd-1 (v1:metadata.name)
      MY_STS_NAME:                       apisix-etcd
      ETCDCTL_API:                       3
      ETCD_ON_K8S:                       yes
      ETCD_START_FROM_SNAPSHOT:          no
      ETCD_DISASTER_RECOVERY:            no
      ETCD_NAME:                         $(MY_POD_NAME)
      ETCD_DATA_DIR:                     /bitnami/etcd/data
      ETCD_LOG_LEVEL:                    info
      ALLOW_NONE_AUTHENTICATION:         yes
      ETCD_AUTH_TOKEN:                   jwt,priv-key=/opt/bitnami/etcd/certs/token/jwt-token.pem,sign-method=RS256,ttl=10m
      ETCD_ADVERTISE_CLIENT_URLS:        http://$(MY_POD_NAME).apisix-etcd-headless.ingress-apisix.svc.cluster.local:2379,http://apisix-etcd.ingress-apisix.svc.cluster.local:2379
      ETCD_LISTEN_CLIENT_URLS:           http://0.0.0.0:2379
      ETCD_INITIAL_ADVERTISE_PEER_URLS:  http://$(MY_POD_NAME).apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380
      ETCD_LISTEN_PEER_URLS:             http://0.0.0.0:2380
      ETCD_INITIAL_CLUSTER_TOKEN:        etcd-cluster-k8s
      ETCD_INITIAL_CLUSTER_STATE:        existing
      ETCD_INITIAL_CLUSTER:              apisix-etcd-0=http://apisix-etcd-0.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380,apisix-etcd-1=http://apisix-etcd-1.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380,apisix-etcd-2=http://apisix-etcd-2.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380
      ETCD_CLUSTER_DOMAIN:               apisix-etcd-headless.ingress-apisix.svc.cluster.local
    Mounts:
      /bitnami/etcd from data (rw)
      /opt/bitnami/etcd/certs/token/ from etcd-jwt-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vcq7m (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-apisix-etcd-1
    ReadOnly:   false
  etcd-jwt-token:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  apisix-etcd-jwt-token
    Optional:    false
  kube-api-access-vcq7m:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                  From                     Message
  ----     ------                  ----                 ----                     -------
  Normal   Scheduled               30m                  default-scheduler        Successfully assigned ingress-apisix/apisix-etcd-1 to ip-172-31-102-32.ap-south-1.compute.internal
  Normal   SuccessfulAttachVolume  30m                  attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-cd13081f-9732-4e12-b8dc-a0d7c8fad6dd"
  Normal   Pulled                  30m                  kubelet                  Container image "docker.io/bitnami/etcd:3.5.4-debian-11-r14" already present on machine
  Normal   Created                 30m                  kubelet                  Created container etcd
  Normal   Started                 30m                  kubelet                  Started container etcd
  Warning  Unhealthy               27m (x5 over 29m)    kubelet                  Liveness probe failed:
  Normal   Killing                 27m                  kubelet                  Container etcd failed liveness probe, will be restarted
  Warning  FailedPreStopHook       27m                  kubelet                  Exec lifecycle hook ([/opt/bitnami/scripts/etcd/prestop.sh]) for Container "etcd" in Pod "apisix-etcd-1_ingress-apisix(724e8425-d8ed-4634-81fc-c4a88b1c67da)" failed - error: command '/opt/bitnami/scripts/etcd/prestop.sh' exited with 128: Error: bad member ID arg (strconv.ParseUint: parsing "": invalid syntax), expecting ID in Hex , message: "Error: bad member ID arg (strconv.ParseUint: parsing \"\": invalid syntax), expecting ID in Hex\n"
  Warning  Unhealthy               31s (x133 over 29m)  kubelet                  Readiness probe failed:

$ mk describe pod/apisix-etcd-0
Name:         apisix-etcd-0
Namespace:    ingress-apisix
Priority:     0
Node:         ip-172-31-110-110.ap-south-1.compute.internal/172.31.110.110
Start Time:   Mon, 09 Jan 2023 09:12:12 +0530
Labels:       app.kubernetes.io/instance=apisix
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=etcd
              controller-revision-hash=apisix-etcd-5c44c455f7
              helm.sh/chart=etcd-8.3.4
              statefulset.kubernetes.io/pod-name=apisix-etcd-0
Annotations:  checksum/token-secret: b04b8e4c76df91b28aa60b5607a9ced3974beacab02358b3c9278e707a332628
              kubernetes.io/psp: eks.privileged
Status:       Running
IP:           172.31.110.252
IPs:
  IP:           172.31.110.252
Controlled By:  StatefulSet/apisix-etcd
Containers:
  etcd:
    Container ID:   docker://13366b4b07151a564ebb77bb52a05993b3f8a197d16a583a73c573233fb45c1d
    Image:          docker.io/bitnami/etcd:3.5.4-debian-11-r14
    Image ID:       docker-pullable://bitnami/etcd@sha256:e5ef30fa508e6f3a028a4e26acc7ec2803eea1370dc9c1da692ee0405cdaf50d
    Ports:          2379/TCP, 2380/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Mon, 09 Jan 2023 09:40:17 +0530
      Finished:     Mon, 09 Jan 2023 09:44:17 +0530
    Ready:          False
    Restart Count:  7
    Liveness:       exec [/opt/bitnami/scripts/etcd/healthcheck.sh] delay=60s timeout=5s period=30s #success=1 #failure=5
    Readiness:      exec [/opt/bitnami/scripts/etcd/healthcheck.sh] delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:
      BITNAMI_DEBUG:                     false
      MY_POD_IP:                          (v1:status.podIP)
      MY_POD_NAME:                       apisix-etcd-0 (v1:metadata.name)
      MY_STS_NAME:                       apisix-etcd
      ETCDCTL_API:                       3
      ETCD_ON_K8S:                       yes
      ETCD_START_FROM_SNAPSHOT:          no
      ETCD_DISASTER_RECOVERY:            no
      ETCD_NAME:                         $(MY_POD_NAME)
      ETCD_DATA_DIR:                     /bitnami/etcd/data
      ETCD_LOG_LEVEL:                    info
      ALLOW_NONE_AUTHENTICATION:         yes
      ETCD_AUTH_TOKEN:                   jwt,priv-key=/opt/bitnami/etcd/certs/token/jwt-token.pem,sign-method=RS256,ttl=10m
      ETCD_ADVERTISE_CLIENT_URLS:        http://$(MY_POD_NAME).apisix-etcd-headless.ingress-apisix.svc.cluster.local:2379,http://apisix-etcd.ingress-apisix.svc.cluster.local:2379
      ETCD_LISTEN_CLIENT_URLS:           http://0.0.0.0:2379
      ETCD_INITIAL_ADVERTISE_PEER_URLS:  http://$(MY_POD_NAME).apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380
      ETCD_LISTEN_PEER_URLS:             http://0.0.0.0:2380
      ETCD_INITIAL_CLUSTER_TOKEN:        etcd-cluster-k8s
      ETCD_INITIAL_CLUSTER_STATE:        existing
      ETCD_INITIAL_CLUSTER:              apisix-etcd-0=http://apisix-etcd-0.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380,apisix-etcd-1=http://apisix-etcd-1.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380,apisix-etcd-2=http://apisix-etcd-2.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380
      ETCD_CLUSTER_DOMAIN:               apisix-etcd-headless.ingress-apisix.svc.cluster.local
    Mounts:
      /bitnami/etcd from data (rw)
      /opt/bitnami/etcd/certs/token/ from etcd-jwt-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-cwxz2 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-apisix-etcd-0
    ReadOnly:   false
  etcd-jwt-token:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  apisix-etcd-jwt-token
    Optional:    false
  kube-api-access-cwxz2:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                    From                     Message
  ----     ------                  ----                   ----                     -------
  Normal   Scheduled               34m                    default-scheduler        Successfully assigned ingress-apisix/apisix-etcd-0 to ip-172-31-110-110.ap-south-1.compute.internal
  Normal   SuccessfulAttachVolume  34m                    attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-09001332-85bf-405a-a4e2-8579d00d9a5b"
  Normal   Pulled                  34m                    kubelet                  Container image "docker.io/bitnami/etcd:3.5.4-debian-11-r14" already present on machine
  Normal   Created                 34m                    kubelet                  Created container etcd
  Normal   Started                 34m                    kubelet                  Started container etcd
  Warning  Unhealthy               30m (x5 over 32m)      kubelet                  Liveness probe failed:
  Normal   Killing                 30m                    kubelet                  Container etcd failed liveness probe, will be restarted
  Warning  FailedPreStopHook       30m                    kubelet                  Exec lifecycle hook ([/opt/bitnami/scripts/etcd/prestop.sh]) for Container "etcd" in Pod "apisix-etcd-0_ingress-apisix(237e1554-1e0c-438b-be98-761988ee70f0)" failed - error: command '/opt/bitnami/scripts/etcd/prestop.sh' exited with 128: Error: bad member ID arg (strconv.ParseUint: parsing "": invalid syntax), expecting ID in Hex , message: "Error: bad member ID arg (strconv.ParseUint: parsing \"\": invalid syntax), expecting ID in Hex\n"
  Warning  Unhealthy               4m16s (x133 over 33m)  kubelet                  Readiness probe failed:

tokers commented 1 year ago

Warning FailedPreStopHook 30m kubelet Exec lifecycle hook ([/opt/bitnami/scripts/etcd/prestop.sh]) for Container "etcd" in Pod "apisix-etcd-0_ingress-apisix(237e1554-1e0c-438b-be98-761988ee70f0)" failed - error: command '/opt/bitnami/scripts/etcd/prestop.sh' exited with 128: Error: bad member ID arg (strconv.ParseUint: parsing "": invalid syntax), expecting ID in Hex , message: "Error: bad member ID arg (strconv.ParseUint: parsing "": invalid syntax), expecting ID in Hex\n" Warning Unhealthy 4m16s (x133 over 33m) kubelet Readiness probe failed:

In my opinion, this is not related to APISIX. It might be a problem with the Bitnami image.

airtonzanon commented 1 month ago

I made a change in my values.yaml to the etcd replica count: I set it to 1, so there are no problems with other pods waiting, nor with the headless-service error.
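
The equivalent on the command line (a sketch; etcd.replicaCount is assumed to be the knob the bundled Bitnami etcd subchart exposes for this):

$ helm upgrade apisix apisix/apisix -n ingress-apisix --set etcd.replicaCount=1
# a single member never has to discover peers through the headless
# service, which sidesteps the bootstrap problem at the cost of etcd
# redundancy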

I think this issue might be related to how I'm using the NFS CSI driver, but I haven't tested other configurations yet.