apache / couchdb-helm

Apache CouchDB Helm Chart
https://couchdb.apache.org/
Apache License 2.0

Couchdb pods perpetually crashing under OpenShift #13

Open blsaws opened 4 years ago

blsaws commented 4 years ago

Describe the bug: CouchDB pods are continuously crashing under OpenShift.

Version of Helm and Kubernetes:

Helm
$ helm version
Client: &version.Version{SemVer:"v2.12.3", GitCommit:"eecf22f77df5f65c823aacd2dbd30ae6c65f186e", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.12.3", GitCommit:"eecf22f77df5f65c823aacd2dbd30ae6c65f186e", GitTreeState:"clean"}

OpenShift
$ oc version
oc v3.11.0+0cbc58b
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://127.0.0.1:8443
kubernetes v1.11.0+d4cacc0

What happened: Deployed the CouchDB Helm chart, and the pods are continually crashing.

Deployment commands:

helm repo add couchdb https://apache.github.io/couchdb-helm
helm install --name acumos-couchdb --namespace acumos \
  --set service.type=NodePort --set allowAdminParty=true couchdb/couchdb

What you expected to happen: CouchDB pods should become ready. This happens as expected under generic Kubernetes.

How to reproduce it (as minimally and precisely as possible):
1) Install OpenShift Origin 3.11
2) Set up the other cluster/namespace prerequisites, e.g. create the namespace used in the example above.
3) Install the CouchDB Helm chart, as above (see the consolidated sketch below).
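A minimal end-to-end reproduction sketch, assuming the acumos namespace from the commands above and using oc new-project to create it:

oc new-project acumos
helm repo add couchdb https://apache.github.io/couchdb-helm
helm install --name acumos-couchdb --namespace acumos \
  --set service.type=NodePort --set allowAdminParty=true couchdb/couchdb
oc get pods -n acumos -w   # watch the couchdb pods enter CrashLoopBackOff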

Anything else we need to know:

willholley commented 4 years ago

I don't think this chart has been tested under OpenShift; it's difficult to speculate on the cause of the problem without more detail from the pod logs.

That said, I'd recommend using the CouchDB Operator instead of the Helm chart for OpenShift / OKD deployments.

blsaws commented 4 years ago

Here are the logs from the init-copy containers (they are crashing), and output of describe pods: couchdb-openshift-crash.txt

My goal is, where possible, to use a consistent set of upstream tools to deploy supplemental components (e.g. MariaDB, Nexus, ELK, JupyterHub, NiFi, Jenkins, ...). This reduces the maintenance effort and UX variation across k8s environments. But I will take a look at the Operator. In the meantime, if you have any suggestions on the reason for the crash, I would appreciate them, since the logs really don't tell me anything.

willholley commented 4 years ago

@blsaws those logs look to be from the init-copy container, which succeeded. Can you get the logs from the couchdb container: oc logs acumos-couchdb-couchdb-0 -c couchdb?

blsaws commented 4 years ago

Nothing is returned from the logs:

root@77f48ec29783:/# oc logs acumos-couchdb-couchdb-0 -c couchdb
root@77f48ec29783:/#

willholley commented 4 years ago

@blsaws you might need to use the --previous flag to get the logs of the crashed container. See https://kubernetes.io/docs/tasks/debug-application-cluster/debug-pod-replication-controller/#my-pod-is-crashing-or-otherwise-unhealthy. At the moment I don't have enough information to provide any guidance as to why it might be failing I'm afraid.
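For reference, requesting the previous (crashed) container's logs uses the same command with the extra flag (pod and container names as above):

oc logs acumos-couchdb-couchdb-0 -c couchdb --previous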

alwinmark commented 4 years ago

No, in my case it is just silently failing (exiting with code 1) on Rancher with PSPs enabled as well. I guess this chart, or the default container image, does not work well without certain privileges.

  - containerID: docker://41e114505ff6963276d07ae001be4cb4794e1b79532930c1aec8b51107304263
    image: couchdb:2.3.1
    imageID: docker-pullable://couchdb@sha256:da2d31cc06455d6fc12767c4947c6b58e97e8cda419ecbe054cc89ab48420afa
    lastState:
      terminated:
        containerID: docker://41e114505ff6963276d07ae001be4cb4794e1b79532930c1aec8b51107304263
        exitCode: 1
        finishedAt: 2020-01-30T12:09:42Z
        reason: Error
        startedAt: 2020-01-30T12:09:41Z
    name: couchdb
    ready: false
    restartCount: 2
    started: false
    state:
      waiting:
        message: back-off 20s restarting failed container=couchdb pod=couchdb-tischi-test-couchdb-0_connect(7af5e9ca-38b1-493b-9170-5a58da8c4b5c)
        reason: CrashLoopBackOff
  hostIP: 172.21.1.113
  initContainerStatuses:
  - containerID: docker://3be2b192ab8e92628082527f39aa7db417708c55fac2cb0cdf1823078a0e0988
    image: busybox:latest
    imageID: docker-pullable://busybox@sha256:6915be4043561d64e0ab0f8f098dc2ac48e077fe23f488ac24b665166898115a
    lastState: {}
    name: init-copy
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: docker://3be2b192ab8e92628082527f39aa7db417708c55fac2cb0cdf1823078a0e0988
        exitCode: 0
        finishedAt: 2020-01-30T12:09:29Z
        reason: Completed
        startedAt: 2020-01-30T12:09:29Z

Logs are empty even with --previous.
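Even when the logs are empty, the exit code and timestamps of the last crash can still be read from the pod status. A small sketch, assuming the pod name from the status above and that the namespace is connect (the suffix after the underscore in the waiting message):

kubectl get pod couchdb-tischi-test-couchdb-0 -n connect \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'   # [0] is the single couchdb app container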

In order to reproduce, run a K8s cluster with the following PSP:

apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted-psp
spec:
  allowPrivilegeEscalation: false
  fsGroup:
    ranges:
    - max: 65535
      min: 1
    rule: MustRunAs
  requiredDropCapabilities:
  - ALL
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    ranges:
    - max: 65535
      min: 1
    rule: MustRunAs
  volumes:
  - configMap
  - emptyDir
  - projected
  - secret
  - downwardAPI
  - persistentVolumeClaim

as this is the default in Rancher and similar to what OKD applies when PSPs/SecurityContextConstraints are enabled.
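To reproduce with that policy, it also needs to be applied and authorized for the service account that runs the CouchDB pods. A minimal sketch, assuming the manifest above is saved as restricted-psp.yaml and the chart is installed into the acumos namespace under the default service account:

kubectl apply -f restricted-psp.yaml
kubectl create clusterrole restricted-psp-user \
  --verb=use --resource=podsecuritypolicies --resource-name=restricted-psp
kubectl create rolebinding restricted-psp-user -n acumos \
  --clusterrole=restricted-psp-user --serviceaccount=acumos:default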

bondar-pavel commented 4 years ago

Looks like I have the same issue; pods cannot be created because of the PSP:

$ sudo kubectl describe statefulset -n couchdb
...
Volume Claims:  <none>
Events:
  Type     Reason        Age                   From                    Message
  ----     ------        ----                  ----                    -------
  Warning  FailedCreate  8m23s (x19 over 30m)  statefulset-controller  create Pod vociferous-garfish-couchdb-0 in StatefulSet vociferous-garfish-couchdb failed error: pods "vociferous-garfish-couchdb-0" is forbidden: unable to validate against any pod security policy: []
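A quick way to check which policies exist on the cluster and whether the StatefulSet's service account may use one of them (the policy name here is illustrative, and the couchdb namespace plus default service account are assumptions based on the output above):

kubectl get psp
kubectl auth can-i use podsecuritypolicy/restricted-psp \
  --as=system:serviceaccount:couchdb:default
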
bondar-pavel commented 4 years ago

PR #30 resolves my issues with pod security policies:

create Pod vociferous-garfish-couchdb-0 in StatefulSet vociferous-garfish-couchdb failed error: pods "vociferous-garfish-couchdb-0" is forbidden: unable to validate against any pod security policy: []

@blsaws Could you please check if it resolves your issue as well?
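One way to try an unreleased change like that is to install the chart from a local checkout of the PR branch; a rough sketch (the ./couchdb chart path is an assumption about the repository layout):

git clone https://github.com/apache/couchdb-helm
cd couchdb-helm
# check out the PR branch here, then install the chart from the working tree
helm install --name acumos-couchdb --namespace acumos ./couchdb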

bondar-pavel commented 4 years ago

Looks like my issue is different from the original one, since in my case the pods were not even created because they did not satisfy the policies on the cluster.