apache / couchdb-helm

Apache CouchDB Helm Chart
https://couchdb.apache.org/
Apache License 2.0

Container couchdb is going into restart loop right after deploy without any logs #121

Open ihatemodels opened 1 year ago

ihatemodels commented 1 year ago

Describe the bug

Upon executing helm install couchdb -n couchdb couchdb/couchdb -f values.yaml, the main container enters a continuous restart loop with no explanatory logs. The issue only surfaces when persistence is enabled; without it, the container starts successfully. The PVC and PV are properly created, mounted, and writable (I tested from another container).
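
A quick way to repeat that write test is a throwaway pod that mounts the same claim. This is only a sketch; the pod name is illustrative and the claimName assumes the PVC created by the chart for the first replica:

apiVersion: v1
kind: Pod
metadata:
  name: pvc-write-test          # illustrative name
  namespace: couchdb
spec:
  restartPolicy: Never
  containers:
    - name: tester
      image: busybox:latest
      # try to create a file on the mounted volume, then list the directory
      command: ["sh", "-c", "touch /data/.write-test && ls -la /data"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: database-storage-couchdb-couchdb-0   # assumption: PVC name from the chart's volumeClaimTemplate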

Experimenting with a custom Deployment resulted in the same behaviour. Consequently, the issue could originate from my storage configuration or permissions and how the Docker container or the software itself expects them to be set. It's noteworthy that other applications (Prometheus, RabbitMQ) operate without issues on the same storage, cluster, and Helm.

Any information or further steps will be appreciated. Thank you!

Version of Helm and Kubernetes:

Kubernetes

Provider: Amazon EKS, Kubernetes Version: v1.24.13-0a21954

Helm:

version.BuildInfo{Version:"v3.9.4", GitCommit:"dbc6d8e20fe1d58d50e6ed30f09a04a77e4c68db", GitTreeState:"clean", GoVersion:"go1.17.13"}

StorageClass:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-storage-class-wa
allowVolumeExpansion: true
parameters:
  basePath: /dynamic_provisioning
  directoryPerms: '700'
  fileSystemId: <fs>
  gidRangeEnd: '2000'
  gidRangeStart: '1000'
  provisioningMode: efs-ap
provisioner: efs.csi.aws.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

What happened:

The StatefulSet is unable to start with Amazon EFS persistent storage.

How to reproduce it (as minimally and precisely as possible):

Create EFS Storage on EKS and deploy following the guide in the README.
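
For context, the README steps boil down to roughly the following (the chart repository URL is the one documented in the project README; values.yaml is the file shown below):

helm repo add couchdb https://apache.github.io/couchdb-helm
helm repo update
kubectl create namespace couchdb
helm install couchdb couchdb/couchdb -n couchdb -f values.yaml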

Anything else we need to know:

values.yaml

# -- the initial number of nodes in the CouchDB cluster.
clusterSize: 1
persistentVolume:
  enabled: true
  storageClass: "efs-storage-class-wa"
  accessModes:
    - ReadWriteOnce
  size: 10Gi
networkPolicy:
  enabled: false
image:
  tag: 3.3.2
dns:
  clusterDomainSuffix: cluster.local
service:
  enabled: true
prometheusPort:
  enabled: true
  bind_address: "0.0.0.0"
  port: 8080
couchdbConfig:
  chttpd:
    bind_address: any
    require_valid_user: false
  couchdb:
    uuid: 4714aa87edb4be946671309fbec8941a

kubectl describe pod couchdb-qa-couchdb-0 -n couchdb-qa

Name:         couchdb-qa-couchdb-0
Namespace:    couchdb-qa
Priority:     0
Node:         ip-10-152-181-13.eu-west-1.compute.internal/10.152.181.13
Start Time:   Wed, 07 Jun 2023 12:34:11 +0300
Labels:       app=couchdb
              controller-revision-hash=couchdb-qa-couchdb-b6c8db589
              release=couchdb-qa
              statefulset.kubernetes.io/pod-name=couchdb-qa-couchdb-0
Status:       Running
Controlled By:  StatefulSet/couchdb-qa-couchdb
Init Containers:
  init-copy:
    Container ID:  containerd://de3c35142624b77f0c8abcca439f5b436ac0a23666e88cf0a5274f00e6558ca8
    Image:         busybox:latest
    Image ID:      docker.io/library/busybox@sha256:560af6915bfc8d7630e50e212e08242d37b63bd5c1ccf9bd4acccf116e262d5b
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
      cp /tmp/chart.ini /default.d; cp /tmp/seedlist.ini /default.d; cp /tmp/prometheus.ini /default.d; ls -lrt /default.d;
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 07 Jun 2023 12:34:13 +0300
      Finished:     Wed, 07 Jun 2023 12:34:13 +0300
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /default.d from config-storage (rw)
      /tmp/ from config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-w5kzb (ro)
Containers:
  couchdb:
    Container ID:   containerd://ff97bde75fd9ce3ea58d962bb8aa8e35902af2584bea4ac16ba0317d60b35a1f
    Image:          couchdb:3.3.2
    Image ID:       docker.io/library/couchdb@sha256:efd8eefd6e849ac88a5418bd4e633002e9f665fd6b16c3eb431656984203cfec
    Ports:          5984/TCP, 4369/TCP, 9100/TCP, 8080/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP, 0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 07 Jun 2023 12:34:14 +0300
      Finished:     Wed, 07 Jun 2023 12:34:14 +0300
    Ready:          False
    Restart Count:  1
    Liveness:       http-get http://:5984/_up delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://:5984/_up delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      COUCHDB_USER:           <set to the key 'adminUsername' in secret 'couchdb-qa-couchdb'>     Optional: false
      COUCHDB_PASSWORD:       <set to the key 'adminPassword' in secret 'couchdb-qa-couchdb'>     Optional: false
      COUCHDB_SECRET:         <set to the key 'cookieAuthSecret' in secret 'couchdb-qa-couchdb'>  Optional: false
      COUCHDB_ERLANG_COOKIE:  <set to the key 'erlangCookie' in secret 'couchdb-qa-couchdb'>      Optional: false
      ERL_FLAGS:               -name couchdb  -setcookie XXXXXXXXXXX
    Mounts:
      /opt/couchdb/data from database-storage (rw)
      /opt/couchdb/etc/default.d from config-storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-w5kzb (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  database-storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  database-storage-couchdb-qa-couchdb-0
    ReadOnly:   false
  config-storage:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      couchdb-qa-couchdb
    Optional:  false

Events:
  Type     Reason     Age                     From               Message
  ----     ------     ----                    ----               -------
  Normal   Scheduled  4s                      default-scheduler  Successfully assigned couchdb-qa/couchdb-qa-couchdb-0 to node1
  Normal   Pulling    3s                      kubelet            Pulling image "busybox:latest"
  Normal   Pulled     2s                      kubelet            Successfully pulled image "busybox:latest" in 593.012622ms
  Normal   Created    2s                      kubelet            Created container init-copy
  Normal   Started    2s                      kubelet            Started container init-copy
  Normal   Created    1s (x2 over 2s)         kubelet            Created container couchdb
  Normal   Started    1s (x2 over 2s)         kubelet            Started container couchdb
  Warning  BackOff    <invalid> (x4 over 0s)  kubelet            Back-off restarting failed container

kubectl logs couchdb-qa-couchdb-0 -n couchdb-qa

Defaulted container "couchdb" out of: couchdb, init-copy (init)
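
For completeness, logs of the last terminated instance of the couchdb container can be requested explicitly with the --previous flag; it returns nothing here either, matching the issue title:

kubectl logs couchdb-qa-couchdb-0 --container couchdb --previous -n couchdb-qa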

kubectl logs couchdb-qa-couchdb-0 --container init-copy -n couchdb-qa

total 12
-rw-r--r--    1 root     root            98 Jun  7 09:34 seedlist.ini
-rw-r--r--    1 root     root            71 Jun  7 09:34 prometheus.ini
-rw-r--r--    1 root     root           106 Jun  7 09:34 chart.ini
connorM43 commented 1 year ago

We are also experiencing this same issue when trying to go to 3.3.2. We've been able to successfully go to 3.2.1 for the time being.

ihatemodels commented 1 year ago

> We are also experiencing this same issue when trying to go to 3.3.2. We've been able to successfully go to 3.2.1 for the time being.

That does not help in my case. I get the same behaviour with different versions, even 2.x.x.

lolszowy commented 1 year ago

I have the same problem as described here: https://github.com/apache/couchdb-helm/issues/123. Rolling the version back to 3.2.1 did not solve the problem.

willholley commented 1 year ago

My guess is that this is a permissions issue. If you can reproduce in a test environment, I would see whether you can get the container running using a custom command e.g. update the deployment to set:

    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]

and then exec into the container. The standard container entrypoint is defined at https://github.com/apache/couchdb-docker/blob/main/3.3.2/docker-entrypoint.sh, so you could try running that manually from a shell and see whether any commands fail.

lolszowy commented 1 year ago

I am not sure how to change the entrypoint of a Docker image that is deployed via the Helm chart. values.yaml doesn't give me such a possibility...

willholley commented 1 year ago

@lolszowy I would just kubectl edit the deployment manifest directly after deploying with Helm.
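
A non-interactive sketch of the same idea, using the StatefulSet and namespace names from the describe output above (the liveness probe is removed so the sleeping container is not restarted while you debug; the in-image entrypoint path and its default argument are assumptions based on the couchdb-docker repository linked above):

kubectl -n couchdb-qa patch statefulset couchdb-qa-couchdb --type=json -p='[
  {"op": "add", "path": "/spec/template/spec/containers/0/command",
   "value": ["/bin/bash", "-c", "--"]},
  {"op": "add", "path": "/spec/template/spec/containers/0/args",
   "value": ["while true; do sleep 30; done;"]},
  {"op": "remove", "path": "/spec/template/spec/containers/0/livenessProbe"}
]'

# delete the pod so the StatefulSet recreates it from the patched template
kubectl -n couchdb-qa delete pod couchdb-qa-couchdb-0

# exec in and run the standard entrypoint by hand to see which step fails
kubectl -n couchdb-qa exec -it couchdb-qa-couchdb-0 -- /bin/bash
/docker-entrypoint.sh /opt/couchdb/bin/couchdb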

ihatemodels commented 1 year ago

As the author of the issue I am sorry, but I currently don't have much time to invest in it. As soon as I can, I will proceed with further testing. I tested with different storage classes (Amazon EBS and Longhorn) and it worked as expected.
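
A minimal sketch of that working test, assuming the cluster's default EBS-backed StorageClass is named gp2 (the Longhorn test would use that provisioner's class name instead):

persistentVolume:
  enabled: true
  storageClass: "gp2"   # assumption: name of the EBS StorageClass on the cluster
  accessModes:
    - ReadWriteOnce
  size: 10Gi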

lolszowy commented 1 year ago

The problem is definitely with mounting the PV. There is no problem running CouchDB without a PV, but even when I try to create a local PV it crashes the same way.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: couchdb-statefulset
spec:
  selector:
    matchLabels:
      app: couchdb
  serviceName: couchdb-service
  replicas: 1
  template:
    metadata:
      labels:
        app: couchdb
    spec:
      containers:
        - name: couchdb
          image: couchdb:3.3.1
          ports:
            - containerPort: 5984
          volumeMounts:
            - name: couchdb-data
              mountPath: /opt/couchdb/
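              # note: this mounts the (initially empty) volume over the entire
              # /opt/couchdb install tree; the Helm chart mounts its data volume
              # at /opt/couchdb/data instead (see the pod description above)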
      volumes:
        - name: couchdb-data
          persistentVolumeClaim:
            claimName: couchdb-pvc-local
---

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: couchdb-pvc-local
spec:
  storageClassName: local-storage
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-local-pv
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage
  local:
    path: /mnt 
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - ip-10-3-1-81.eu-west-1.compute.internal
                - ip-10-3-2-247.eu-west-1.compute.internal 
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer

k describe pod couchdb-statefulset-0

Events:
  Type     Reason            Age              From               Message
  ----     ------            ----             ----               -------
  Warning  FailedScheduling  8s               default-scheduler  0/3 nodes are available: persistentvolumeclaim "couchdb-pvc-local" not found. preemption: 0/3 nodes are available: 3 No preemption victims found for incoming pod..
  Normal   Scheduled         5s               default-scheduler  Successfully assigned tpos-sync/couchdb-statefulset-0 to ip-10-3-2-247.eu-west-1.compute.internal
  Normal   Pulled            3s (x2 over 4s)  kubelet            Container image "couchdb:3.3.1" already present on machine
  Normal   Created           3s (x2 over 4s)  kubelet            Created container couchdb
  Normal   Started           3s (x2 over 4s)  kubelet            Started container couchdb
  Warning  BackOff           2s               kubelet            Back-off restarting failed container couchdb in pod couchdb-statefulset-0_tpos-sync(92a8c35d-c8e1-4dee-b745-a8f4be50c106)

While using the Helm chart I had this error: Warning FailedScheduling 62s default-scheduler 0/3 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/3 nodes are available: 3 No preemption victims found for incoming pod..