
[bitnami/cassandra] Readiness and Liveness checks fail after rollout restart with Connection Refused error #8202

Closed · willfindlay closed this 2 years ago

willfindlay commented 2 years ago

Which chart: cassandra-9.0.4

Describe the bug

After deploying Cassandra with replicaCount: 2 and running kubectl rollout restart statefulset cassandra, the pod cassandra-1 fails its readiness and liveness checks with Failed to connect to '127.0.0.1:7199' - ConnectException: 'Connection refused (Connection refused)'. (Port 7199 is the JMX port that nodetool, which both probes run, connects to.)

To Reproduce

Steps to reproduce the behavior:

Set up a local minikube cluster with 3 nodes:

minikube start -n 3

Run helm install cassandra bitnami/cassandra -f cassandra.yml using the following values:

persistence:
  size: 1Gi
  mountPath: /data
service:
  type: LoadBalancer
  clusterIP: 10.96.0.200
replicaCount: 2
podLabels:
  app: comp4000
  component: cassandra
extraEnvVars:
  - name: CASSANDRA_AUTHENTICATOR
    value: AllowAllAuthenticator
  - name: CASSANDRA_AUTHORIZER
    value: AllowAllAuthorizer

Observe that it works just fine:

kubectl exec -it cassandra-0 -- cqlsh
# and
kubectl exec -it cassandra-1 -- cqlsh

Then try to restart the statefulset:

kubectl rollout restart statefulset cassandra

Observe that the rollout gets stuck, with cassandra-1 restarting over and over. Use kubectl describe pod cassandra-1 to get more info; the output looks like the following:

Name:         cassandra-1
Namespace:    default
Priority:     0
Node:         minikube/192.168.49.2
Start Time:   Sun, 21 Nov 2021 13:29:41 -0500
Labels:       app=comp4000
              app.kubernetes.io/instance=cassandra
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=cassandra
              component=cassandra
              controller-revision-hash=cassandra-7674c87d84
              helm.sh/chart=cassandra-9.0.4
              statefulset.kubernetes.io/pod-name=cassandra-1
Annotations:  kubectl.kubernetes.io/restartedAt: 2021-11-21T13:29:10-05:00
Status:       Running
IP:           10.244.0.12
IPs:
  IP:           10.244.0.12
Controlled By:  StatefulSet/cassandra
Containers:
  cassandra:
    Container ID:  docker://da668c3f4f7fc02f5e2afd3bf6100c7fe5290fac5ab37f32aa3389e9ea5aca30
    Image:         docker.io/bitnami/cassandra:4.0.1-debian-10-r48
    Image ID:      docker-pullable://bitnami/cassandra@sha256:c35c8cd79dac48eb8f190d1f4fa1d38d4b5d691c9e498c43d35d1aeae163744e
    Ports:         7000/TCP, 7001/TCP, 7199/TCP, 9042/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP
    Command:
      bash
      -ec
      # Node 0 is the password seeder
      if [[ $POD_NAME =~ (.*)-0$ ]]; then
          echo "Setting node as password seeder"
          export CASSANDRA_PASSWORD_SEEDER=yes
      else
          # Only node 0 will execute the startup initdb scripts
          export CASSANDRA_IGNORE_INITDB_SCRIPTS=1
      fi
      /opt/bitnami/scripts/cassandra/entrypoint.sh /opt/bitnami/scripts/cassandra/run.sh

    State:          Running
      Started:      Sun, 21 Nov 2021 13:55:12 -0500
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Sun, 21 Nov 2021 13:51:12 -0500
      Finished:     Sun, 21 Nov 2021 13:55:12 -0500
    Ready:          False
    Restart Count:  7
    Liveness:       exec [/bin/bash -ec nodetool status
] delay=60s timeout=5s period=30s #success=1 #failure=5
    Readiness:  exec [/bin/bash -ec nodetool status | grep -E "^UN\\s+${POD_IP}"
] delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:
      BITNAMI_DEBUG:                    false
      CASSANDRA_CLUSTER_NAME:           cassandra
      CASSANDRA_SEEDS:                  cassandra-0.cassandra-headless.default.svc.cluster.local
      CASSANDRA_PASSWORD:               <set to the key 'cassandra-password' in secret 'cassandra'>  Optional: false
      POD_IP:                            (v1:status.podIP)
      POD_NAME:                         cassandra-1 (v1:metadata.name)
      CASSANDRA_USER:                   cassandra
      CASSANDRA_NUM_TOKENS:             256
      CASSANDRA_DATACENTER:             dc1
      CASSANDRA_ENDPOINT_SNITCH:        SimpleSnitch
      CASSANDRA_KEYSTORE_LOCATION:      /opt/bitnami/cassandra/certs/keystore
      CASSANDRA_TRUSTSTORE_LOCATION:    /opt/bitnami/cassandra/certs/truststore
      CASSANDRA_RACK:                   rack1
      CASSANDRA_TRANSPORT_PORT_NUMBER:  7000
      CASSANDRA_JMX_PORT_NUMBER:        7199
      CASSANDRA_CQL_PORT_NUMBER:        9042
      CASSANDRA_AUTHENTICATOR:          AllowAllAuthenticator
      CASSANDRA_AUTHORIZER:             AllowAllAuthorizer
    Mounts:
      /data from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jl7sf (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-cassandra-1
    ReadOnly:   false
  kube-api-access-jl7sf:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  27m                    default-scheduler  Successfully assigned default/cassandra-1 to minikube
  Normal   Pulled     27m                    kubelet            Container image "docker.io/bitnami/cassandra:4.0.1-debian-10-r48" already present on machine
  Normal   Created    27m                    kubelet            Created container cassandra
  Normal   Started    27m                    kubelet            Started container cassandra
  Warning  Unhealthy  24m (x5 over 26m)      kubelet            Liveness probe failed: nodetool: Failed to connect to '127.0.0.1:7199' - ConnectException: 'Connection refused (Connection refused)'.
  Normal   Killing    24m                    kubelet            Container cassandra failed liveness probe, will be restarted
  Warning  Unhealthy  2m33s (x123 over 26m)  kubelet            Readiness probe failed: nodetool: Failed to connect to '127.0.0.1:7199' - ConnectException: 'Connection refused (Connection refused)'.
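
The failing probe can also be reproduced by hand with standard kubectl commands (just a debugging sketch; the first command is the same nodetool call the probes run, the second shows the logs of the killed container):

kubectl exec -it cassandra-1 -- nodetool status
kubectl logs cassandra-1 --previous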

Expected behavior

It should restart both pods in the statefulset without issues.

Version of Helm and Kubernetes:

version.BuildInfo{Version:"v3.7.1", GitCommit:"1d11fcb5d3f3bf00dbe6fe31b8412839a96b3dc4", GitTreeState:"clean", GoVersion:"go1.17.1"}
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.4", GitCommit:"b695d79d4f967c403a96986f1750a35eb75e75f1", GitTreeState:"archive", BuildDate:"2021-11-18T21:20:32Z", GoVersion:"go1.17.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.3", GitCommit:"c92036820499fedefec0f847e2054d824aea6cd1", GitTreeState:"clean", BuildDate:"2021-10-27T18:35:25Z", GoVersion:"go1.16.9", Compiler:"gc", Platform:"linux/amd64"}
juan131 commented 2 years ago

Hi @willfindlay

Why did you change the data mount path to /data? Are you using a custom image? Please note that the Bitnami Cassandra image expects data to be persisted at /bitnami/cassandra, and data persistence won't work if you change the mount path.
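
For reference, a sketch of the persistence section with the override removed, keeping the rest of your values unchanged:

persistence:
  size: 1Gi
  # mountPath omitted: the chart default (/bitnami/cassandra) is what the image expects
service:
  type: LoadBalancer
  clusterIP: 10.96.0.200
replicaCount: 2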

willfindlay commented 2 years ago

> Hi @willfindlay
>
> Why did you change the data mount path to /data? Are you using a custom image?

I was getting a permission error and changing the mount point seemed to fix it.

> Please note that the Bitnami Cassandra image expects data to be persisted at /bitnami/cassandra, and data persistence won't work if you change the mount path.

I didn't know that, thanks for the tip. Do you think this could actually be the reason the second pod fails to start up? Perhaps it's getting hung up on the step where it tries to mount the PVC?

willfindlay commented 2 years ago

Here's the specific error I'm getting without the custom mount path:

cassandra 21:46:42.05
cassandra 21:46:42.06 Welcome to the Bitnami cassandra container
cassandra 21:46:42.06 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-cassandra
cassandra 21:46:42.06 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-cassandra/issues
cassandra 21:46:42.06
cassandra 21:46:42.06 INFO  ==> ** Starting Cassandra setup **
cassandra 21:46:42.10 INFO  ==> Validating settings in CASSANDRA_* env vars..
cassandra 21:46:42.15 INFO  ==> Initializing Cassandra database...
mkdir: cannot create directory '/bitnami/cassandra/data': Permission denied
willfindlay commented 2 years ago

Ah, I was able to fix the permission error by setting volumePermissions.enabled. I'll double-check to see if that also resolves my original issue.
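
For anyone hitting the same permission error, the values change is just:

volumePermissions:
  enabled: true

followed by helm upgrade cassandra bitnami/cassandra -f cassandra.yml to roll it out.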

juan131 commented 2 years ago

Great @willfindlay! Please keep us updated.

Setting volumePermissions.enabled is the right solution for StorageClasses that cannot adapt the ownership of the filesystem based on the pod's SecurityContext.
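
When the StorageClass does honor the pod's SecurityContext (fsGroup), the init container isn't needed. As a sketch, the equivalent security-context values would look something like this (the exact keys and the 1001 UID here are assumptions based on typical Bitnami chart defaults, so double-check the chart's values.yaml):

podSecurityContext:
  enabled: true
  fsGroup: 1001    # assumption: group the StorageClass chowns the volume to
containerSecurityContext:
  enabled: true
  runAsUser: 1001  # assumption: non-root UID the Bitnami image runs as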

github-actions[bot] commented 2 years ago

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

willfindlay commented 2 years ago

Oops, forgot to circle back here. Making those changes did indeed resolve my issue.