Compaction or Snapshot Bug #15448

What happened?

Over the course of several minutes, etcd logs a series of errors of increasing severity, ending in an almost total crash.


The "switched to configuration voters" message is the first log entry in over an hour, so I assume this is where the problem starts. Another error shortly after it marks the beginning of the end.


A new leader candidate has emerged.

The final error marks one pod spinning up. Here is the order of events over the next 3 minutes, where A, B, and C are the existing pods and X, Y, and Z are the new ones:

1. A, B, and C are alive.
2. X comes up; A dies.
3. Y comes up; B dies.
4. Z comes up; C dies.
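
To make that sequence concrete, here is a minimal Python sketch that replays it against Raft's majority rule. It rests on hypothetical assumptions not stated in the report: each new pod joins as a voting member before the old one stops, and each stopped pod is then removed from the membership (the log discussed later in this thread shows the old leader being removed). `Cluster`, `replay`, and the join/die/remove steps are illustrative names, not etcd or Helm APIs.

```python
from dataclasses import dataclass, field


def quorum(voters: int) -> int:
    """Raft majority for a voting membership of `voters` members."""
    return voters // 2 + 1


@dataclass
class Cluster:
    members: set = field(default_factory=set)  # current voting membership
    alive: set = field(default_factory=set)    # members whose pod is running

    def apply(self, action: str, name: str) -> None:
        if action == "join":      # new pod starts and is added as a voter
            self.members.add(name)
            self.alive.add(name)
        elif action == "die":     # pod stops but is still listed as a member
            self.alive.discard(name)
        elif action == "remove":  # member is removed from the membership
            self.members.discard(name)
            self.alive.discard(name)
        else:
            raise ValueError(f"unknown action: {action}")

    def has_quorum(self) -> bool:
        live_voters = len(self.alive & self.members)
        return live_voters >= quorum(len(self.members))


# Order of events from the report; the "remove" steps are an assumption.
STEPS = [
    ("join", "X"), ("die", "A"), ("remove", "A"),
    ("join", "Y"), ("die", "B"), ("remove", "B"),
    ("join", "Z"), ("die", "C"), ("remove", "C"),
]


def replay(steps):
    """Start from A, B, C alive and record (step, size, quorum held) after each step."""
    cluster = Cluster(members={"A", "B", "C"}, alive={"A", "B", "C"})
    history = []
    for action, name in steps:
        cluster.apply(action, name)
        history.append((f"{action} {name}", len(cluster.members), cluster.has_quorum()))
    return cluster, history


if __name__ == "__main__":
    final, history = replay(STEPS)
    for step, size, ok in history:
        print(f"{step:<9} members={size} quorum held={ok}")
    print("final membership:", sorted(final.members))
```

Under these assumptions quorum is never lost: the 4-member intermediate states need 3 live voters, and they always have them.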

I don't know which one is the leader. As far as I understand, this is an extremely basic setup.
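
One way to identify the leader from complete logs rather than screenshots is to filter the structured output. The sketch below assumes etcd's default JSON log lines with `level`, `ts`, and `msg` fields. It also assumes leadership and membership changes contain the substrings "elected leader" and "switched to configuration": the second is the message quoted in this report, while the first is an assumed Raft message. The script name and pod name in the usage line are hypothetical placeholders.

```python
import json
import sys

# Substrings used to spot membership and leadership changes. The first is the
# message quoted in this report; "elected leader" is an assumed Raft message.
KEYWORDS = ("switched to configuration", "elected leader")
SEVERE_LEVELS = {"warn", "error", "dpanic", "panic", "fatal"}


def notable(line: str):
    """Return (ts, level, msg) for a notable JSON log line, otherwise None."""
    try:
        entry = json.loads(line)
    except json.JSONDecodeError:
        return None  # skip blank or non-JSON lines
    if not isinstance(entry, dict):
        return None
    level = str(entry.get("level", ""))
    msg = str(entry.get("msg", ""))
    if level in SEVERE_LEVELS or any(word in msg for word in KEYWORDS):
        return entry.get("ts"), level, msg
    return None


def scan(lines):
    """Collect the notable entries from an iterable of log lines."""
    return [hit for hit in map(notable, lines) if hit is not None]


if __name__ == "__main__":
    for ts, level, msg in scan(sys.stdin):
        print(ts, level, msg)
```

Usage, with a hypothetical pod name: `kubectl logs <pod-name> | python3 filter_etcd_logs.py`. If the pod still exists but its container restarted, `kubectl logs --previous` returns the previous container's log.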

What did you expect to happen?

The cluster should not crash.

How can we reproduce it (as minimally and precisely as possible)?

Helm chart information is provided below.

Anything else we need to know?

No response

Etcd version (please run commands below)

Deployed with the Helm chart; etcd version is 3.5.4.

Etcd configuration (command line flags or environment variables)

Helm chart values.


annotations:
  category: Database
apiVersion: v2
appVersion: 3.5.4
dependencies:
  - name: common
    tags:
      - bitnami-common
    version: 1.x.x
description: etcd is a distributed key-value store designed to securely store data
  across a cluster. etcd is widely used in production on account of its reliability,
  fault-tolerance and ease of use.
keywords:
  - etcd
  - cluster
  - database
  - cache
  - key-value
maintainers:
  - name: Bitnami
name: etcd
version: 8.3.4

Etcd debug information

Everything else seems standard.

## @param global.imageRegistry Global Docker image registry
## @param global.imagePullSecrets [array] Global Docker registry secret names as an array
## @param global.storageClass Global StorageClass for Persistent Volume(s)
global:
  imageRegistry: ""
## @param kubeVersion Force target Kubernetes version (using Helm capabilities if not set)
kubeVersion: ""
## @param nameOverride String to partially override common.names.fullname template (will maintain the release name)
nameOverride: ""
## @param fullnameOverride String to fully override common.names.fullname template
fullnameOverride: ""
## @param commonLabels [object] Labels to add to all deployed objects
commonLabels: {}
## @param commonAnnotations [object] Annotations to add to all deployed objects
commonAnnotations: {}
## @param clusterDomain Default Kubernetes cluster domain
clusterDomain: cluster.local
## @param extraDeploy [array] Array of extra objects to deploy with the release
extraDeploy: []

## Bitnami etcd image version
## ref:
## @param image.registry etcd image registry
## @param image.repository etcd image name
## @param image.tag etcd image tag
## @param image.pullPolicy etcd image pull policy
## @param image.pullSecrets [array] etcd image pull secrets
## @param image.debug Enable image debug mode
image:
  repository: bitnami/etcd
  tag: 3.5.4-debian-11-r14
## @param replicaCount Number of etcd replicas to deploy
replicaCount: 1
service:
  ## @param service.type Kubernetes Service type
  type: ClusterIP
  ## @param service.enabled create second service if equal true
  enabled: true
  ## @param service.clusterIP Kubernetes service Cluster IP
  ## e.g.:
  ## clusterIP: None
  clusterIP: ""
  ## @param service.ports.client etcd client port
  ## @param service.ports.peer etcd peer port
  ports:
    client: 2379
    peer: 2380
  ## @param service.nodePorts.client Specify the nodePort client value for the LoadBalancer and NodePort service types.
  ## @param service.nodePorts.peer Specify the nodePort peer value for the LoadBalancer and NodePort service types.
  ## ref:
  nodePorts:
    client: ""
    peer: ""
  ## @param service.clientPortNameOverride etcd client port name override
  clientPortNameOverride: ""
  ## @param service.peerPortNameOverride etcd peer port name override
  peerPortNameOverride: ""
  ## @param service.loadBalancerIP loadBalancerIP for the etcd service (optional, cloud specific)
  ## ref:
  loadBalancerIP: ""
  ## @param service.loadBalancerSourceRanges [array] Load Balancer source ranges
  ## ref:
  ## e.g:
  ## loadBalancerSourceRanges:
  ##   -
  loadBalancerSourceRanges: []
  ## @param service.externalIPs [array] External IPs
  ## ref:
  externalIPs: []
  ## @param service.externalTrafficPolicy etcd service external traffic policy
  ## ref:
  externalTrafficPolicy: Cluster
  ## @param service.extraPorts Extra ports to expose (normally used with the `sidecar` value)
  extraPorts: []
  ## @param service.annotations [object] Additional annotations for the etcd service
  annotations: {}
  ## @param service.sessionAffinity Session Affinity for Kubernetes service, can be "None" or "ClientIP"
  ## If "ClientIP", consecutive client requests will be directed to the same Pod
  ## ref:
  sessionAffinity: None
  ## @param service.sessionAffinityConfig Additional settings for the sessionAffinity
  ## sessionAffinityConfig:
  ##   clientIP:
  ##     timeoutSeconds: 300
  sessionAffinityConfig: {}

## Enable persistence using Persistent Volume Claims
## ref:
persistence:
  ## @param persistence.enabled If true, use a Persistent Volume Claim. If false, use emptyDir.
  enabled: true
  ## @param persistence.storageClass Persistent Volume Storage Class
  ## If defined, storageClassName: <storageClass>
  ## If set to "-", storageClassName: "", which disables dynamic provisioning
  ## If undefined (the default) or set to null, no storageClassName spec is
  ##   set, choosing the default provisioner.  (gp2 on AWS, standard on
  ##   GKE, AWS & OpenStack)
  storageClass: ""
  ## @param persistence.annotations [object] Annotations for the PVC
  annotations: {}
  ## @param persistence.accessModes Persistent Volume Access Modes
  accessModes:
    - ReadWriteOnce
  ## @param persistence.size PVC Storage Request for etcd data volume
  size: 8Gi
  ## @param persistence.selector [object] Selector to match an existing Persistent Volume
  ## ref:
  selector: {}

## Init containers parameters:
## volumePermissions: Change the owner and group of the persistent volume mountpoint to runAsUser:fsGroup values from the securityContext section.
volumePermissions:
  ## @param volumePermissions.enabled Enable init container that changes the owner and group of the persistent volume(s) mountpoint to `runAsUser:fsGroup`
  enabled: false
  ## @param volumePermissions.image.registry Init container volume-permissions image registry
  ## @param volumePermissions.image.repository Init container volume-permissions image name
  ## @param volumePermissions.image.tag Init container volume-permissions image tag
  ## @param volumePermissions.image.pullPolicy Init container volume-permissions image pull policy
  ## @param volumePermissions.image.pullSecrets [array] Specify docker-registry secret names as an array
  image:
    repository: bitnami/bitnami-shell
    tag: 11-debian-11-r14
    pullPolicy: IfNotPresent
    ## Optionally specify an array of imagePullSecrets.
    ## Secrets must be manually created in the namespace.
    ## ref:
    ## e.g:
    ## pullSecrets:
    ##   - myRegistryKeySecretName
    pullSecrets: []
  ## Init container resource requests and limits
  ## ref:
  ## We usually recommend not to specify default resources and to leave this as a conscious
  ## choice for the user. This also increases chances charts run on environments with little
  ## resources, such as Minikube. If you do want to specify resources, uncomment the following
  ## lines, adjust them as necessary, and remove the curly braces after 'resources:'.
  ## @param volumePermissions.resources.limits [object] Init container volume-permissions resource  limits
  ## @param volumePermissions.resources.requests [object] Init container volume-permissions resource  requests
  resources:
    ## Example:
    ## limits:
    ##    cpu: 500m
    ##    memory: 1Gi
    limits: {}
    requests: {}

metrics:
  ## @param metrics.enabled Expose etcd metrics
  enabled: false
  ## @param metrics.podAnnotations [object] Annotations for the Prometheus metrics on etcd pods
  podAnnotations: "true" "{{ .Values.containerPorts.client }}"
  ## Prometheus Service Monitor
  ## ref:
  podMonitor:
    ## @param metrics.podMonitor.enabled Create PodMonitor Resource for scraping metrics using PrometheusOperator
    enabled: false
    ## @param metrics.podMonitor.namespace Namespace in which Prometheus is running
    namespace: monitoring
    ## @param metrics.podMonitor.interval Specify the interval at which metrics should be scraped
    interval: 30s
    ## @param metrics.podMonitor.scrapeTimeout Specify the timeout after which the scrape is ended
    scrapeTimeout: 30s
    ## @param metrics.podMonitor.additionalLabels [object] Additional labels that can be used so PodMonitors will be discovered by Prometheus
    ## ref:
    additionalLabels: {}
    ## @param metrics.podMonitor.scheme Scheme to use for scraping
    scheme: http
    ## @param metrics.podMonitor.tlsConfig [object] TLS configuration used for scrape endpoints used by Prometheus
    ## ref:
    ## e.g:
    ## tlsConfig:
    ##   ca:
    ##     secret:
    ##       name: existingSecretName
    tlsConfig: {}
    ## @param metrics.podMonitor.relabelings [array] Prometheus relabeling rules
    relabelings: []

  ## Prometheus Operator PrometheusRule configuration
  prometheusRule:
    ## @param metrics.prometheusRule.enabled Create a Prometheus Operator PrometheusRule (also requires `metrics.enabled` to be `true` and `metrics.prometheusRule.rules`)
    enabled: false
    ## @param metrics.prometheusRule.namespace Namespace for the PrometheusRule Resource (defaults to the Release Namespace)
    namespace: ""
    ## @param metrics.prometheusRule.additionalLabels Additional labels that can be used so PrometheusRule will be discovered by Prometheus
    additionalLabels: {}
    ## @param metrics.prometheusRule.rules Prometheus Rule definitions
      # - alert: ETCD has no leader
      #   annotations:
      #     summary: "ETCD has no leader"
      #     description: "pod {{`{{`}} $labels.pod {{`}}`}} state error, can't connect leader"
      #   for: 1m
      #   expr: etcd_server_has_leader == 0
      #   labels:
      #     severity: critical
      #     group: PaaS
    rules: []

## Start a new etcd cluster recovering the data from an existing snapshot before bootstrapping
startFromSnapshot:
  ## @param startFromSnapshot.enabled Initialize new cluster recovering an existing snapshot
  enabled: false
  ## @param startFromSnapshot.existingClaim Existing PVC containing the etcd snapshot
  existingClaim: ""
  ## @param startFromSnapshot.snapshotFilename Snapshot filename
  snapshotFilename: ""
## Enable auto disaster recovery by periodically snapshotting the keyspace:
## - It creates a cronjob to periodically snapshot the keyspace
## - It also creates a ReadWriteMany PVC to store the snapshots
## If the cluster permanently loses more than (N-1)/2 members, it tries to
## recover itself from the last available snapshot.
disasterRecovery:
  ## @param disasterRecovery.enabled Enable auto disaster recovery by periodically snapshotting the keyspace
  enabled: false
  cronjob:
    ## @param disasterRecovery.cronjob.schedule Schedule in Cron format to save snapshots
    ## See
    schedule: "*/30 * * * *"
    ## @param disasterRecovery.cronjob.historyLimit Number of successful finished jobs to retain
    historyLimit: 1
    ## @param disasterRecovery.cronjob.snapshotHistoryLimit Number of etcd snapshots to retain, tagged by date
    snapshotHistoryLimit: 1
    ## @param disasterRecovery.cronjob.podAnnotations [object] Pod annotations for cronjob pods
    ## ref:
    podAnnotations: {}
    ## Configure resource requests and limits for snapshotter containers
    ## ref:
    ## We usually recommend not to specify default resources and to leave this as a conscious
    ## choice for the user. This also increases chances charts run on environments with little
    ## resources, such as Minikube. If you do want to specify resources, uncomment the following
    ## lines, adjust them as necessary, and remove the curly braces after 'resources:'.
    ## @param disasterRecovery.cronjob.resources.limits [object] Cronjob container resource limits
    ## @param disasterRecovery.cronjob.resources.requests [object] Cronjob container resource requests
    resources:
      ## Example:
      ## limits:
      ##    cpu: 500m
      ##    memory: 1Gi
      limits: {}
      requests: {}

    ## @param disasterRecovery.cronjob.nodeSelector Node labels for cronjob pods assignment
    ## Ref:
    nodeSelector: {}
    ## @param disasterRecovery.cronjob.tolerations Tolerations for cronjob pods assignment
    ## Ref:
    tolerations: []

  pvc:
    ## @param disasterRecovery.pvc.existingClaim A manually managed Persistent Volume and Claim
    ## If defined, PVC must be created manually before volume will be bound
    ## The value is evaluated as a template, so, for example, the name can depend on .Release or .Chart
    existingClaim: ""
    ## @param disasterRecovery.pvc.size PVC Storage Request
    size: 2Gi
    ## @param disasterRecovery.pvc.storageClassName Storage Class for snapshots volume
    storageClassName: nfs

serviceAccount:
  ## @param serviceAccount.create Enable/disable service account creation
  create: false
  ## @param serviceAccount.name Name of the service account to create or use
  name: ""
  ## @param serviceAccount.automountServiceAccountToken Enable/disable auto mounting of service account token
  ## ref:
  automountServiceAccountToken: true
  ## @param serviceAccount.annotations [object] Additional annotations to be included on the service account
  annotations: {}
  ## @param serviceAccount.labels [object] Additional labels to be included on the service account
  labels: {}

## etcd Pod Disruption Budget configuration
## ref:
pdb:
  ## @param pdb.create Enable/disable a Pod Disruption Budget creation
  create: true
  ## @param pdb.minAvailable Minimum number/percentage of pods that should remain scheduled
  minAvailable: 51%
  ## @param pdb.maxUnavailable Maximum number/percentage of pods that may be made unavailable
  maxUnavailable: ""
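
The PDB and disaster-recovery settings above both come down to quorum arithmetic. A minimal Python sketch, assuming Kubernetes rounds a percentage `minAvailable` up to a whole number of pods: with `minAvailable: 51%` the budget equals a Raft majority for small clusters, both for the chart default `replicaCount: 1` and for a 3-replica cluster, and `fault_tolerance` mirrors the "(N-1)/2" threshold in the disasterRecovery comment. The function names are illustrative, not chart or Kubernetes APIs.

```python
def quorum(replicas: int) -> int:
    """Raft majority for a cluster of `replicas` voting members."""
    return replicas // 2 + 1


def fault_tolerance(replicas: int) -> int:
    """Members that can be lost while keeping quorum: (N-1)/2, rounded down."""
    return (replicas - 1) // 2


def pdb_min_available(percent: int, replicas: int) -> int:
    """Pods a percentage `minAvailable` keeps running, rounded up.

    Integer arithmetic avoids float rounding surprises: ceil(a/b) == (a+b-1)//b.
    """
    return (replicas * percent + 99) // 100


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(f"replicas={n}: quorum={quorum(n)}, "
              f"tolerates={fault_tolerance(n)}, "
              f"pdb minAvailable 51% -> {pdb_min_available(51, n)}")
```

For 3 replicas this gives a quorum of 2, tolerance of 1 lost member, and a PDB that keeps 2 pods running during voluntary disruptions.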

Here are the overrides.

etcd:
  # -- install etcd(v3) by default, set false if do not want to install etcd(v3) together
  enabled: true
  # -- if etcd.enabled is false, use external etcd, support multiple address, if your etcd cluster enables TLS, please use https scheme, e.g.
    # host or ip e.g.
  # -- apisix configurations prefix
  prefix: "/apisix"
  # -- Set the timeout value in seconds for subsequent socket operations from apisix to etcd cluster
  timeout: 30

  # -- if etcd.enabled is true, set more values of bitnami/etcd helm chart
      # -- No authentication by default. Switch to enable RBAC authentication
      create: false
      # -- root username for etcd
      user: ""
      # -- root password for etcd
      password: ""
      # -- enable etcd client certificate
      enabled: false
      # -- name of the secret contains etcd client cert
      existingSecret: ""
      # -- etcd client cert filename using in etcd.auth.tls.existingSecret
      certFilename: ""
      # -- etcd client cert key filename using in etcd.auth.tls.existingSecret
      certKeyFilename: ""
      # -- whether to verify the etcd endpoint certificate when setup a TLS connection to etcd
      verify: true
      # -- specify the TLS Server Name Indication extension, the ETCD endpoint hostname will be used when this setting is unset.
      sni: ""

    port: 2379

  replicaCount: 3

Hi @coffeebe4code, thanks for the report.

Could you clarify why you think the error log points to a compaction or snapshot bug, or just describe the failure symptoms from your point of view?

From the pasted log, f9fd was the leader and was removed from the cluster membership; f06b and 7c24 then elected f06b as the new leader. This looks like expected behavior.
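
The election described here follows Raft's vote-counting rule: a candidate needs votes from a strict majority of the current voting members. A minimal Python sketch, using the truncated member IDs from the log as plain strings; `wins_election` is a hypothetical helper, not part of etcd.

```python
def wins_election(candidate: str, voters: set[str], granted: set[str]) -> bool:
    """Return True if `candidate` gets votes from a strict majority of `voters`.

    `granted` holds the members that voted for the candidate; a candidate
    always votes for itself, so it is added here.
    """
    if candidate not in voters:
        return False  # a removed member cannot lead the new configuration
    votes = (granted | {candidate}) & voters  # only current voters count
    return len(votes) > len(voters) // 2


before = {"f9fd", "f06b", "7c24"}
after = before - {"f9fd"}  # f9fd removed from the membership

if __name__ == "__main__":
    print(wins_election("f06b", after, {"7c24"}))          # True: 2 of 2 votes
    print(wins_election("f06b", after, set()))             # False: 1 of 2 is not a majority
    print(wins_election("f9fd", after, {"f06b", "7c24"}))  # False: no longer a member
```

With f9fd gone, the membership has two voters, so f06b needs both its own vote and 7c24's.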

ahrtr commented 1 year ago

I don't see an etcd issue here, and there is a lot of unrelated information.

I would suggest raising an issue in the Bitnami (?) community and triaging it there first.

Please feel free to open an issue with etcd-related logs and configuration if you do see an etcd issue.

coffeebe4code commented 1 year ago

I'm sorry, but I don't understand why this should go to Bitnami. Nothing should cause etcd to cycle itself every other day.

I believe I have adequately described the failure symptoms, and provided logs.

Here is the part that leads me to challenge the premise that this is not an etcd issue:

If there were a genuine loss of connectivity among the peers, or a configuration issue (which I would still want your help with), it wouldn't cycle all 3 pods so methodically, logging ample information and warnings for several minutes leading up to the cycle.

If you need more information, please let me know what it is and how I can obtain it.

ahrtr commented 1 year ago

Please feel free to open a new issue with complete etcd logs instead of just a couple of screenshots.