DB185344 opened 3 years ago
Did you ever find a fix for the pod not rejoining the cluster properly? I'm encountering that now.
@jftanner can you share the logs from the pod that isn't joined? If the admin hash is not specified in the helm chart then you may be encountering https://github.com/apache/couchdb-helm/issues/7.
Hi @willholley. It might be #7, but it doesn't happen on pod restart; it only happens when there's a new pod after a helm upgrade. It seems that whenever the helm chart is run, it generates new credentials. (I noticed that the auto-generated admin password changes every time I install or upgrade the helm deployment.) New pods pick up the new credential, but old ones don't. So the workaround I found was to kill all the existing pods after scaling. (Obviously not ideal, but I don't have to do it very often.)
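For anyone hitting the same thing, a minimal sketch of that workaround, assuming the chart's default StatefulSet naming and labels (the release name `my-release` is a placeholder):

```sh
# Rolling restart: every node comes back up with the regenerated
# admin credential, so the whole cluster agrees on one password again.
kubectl rollout restart statefulset/my-release-couchdb

# Or, more bluntly, delete the pods and let the StatefulSet recreate them:
kubectl delete pod -l app=couchdb,release=my-release
```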
Perhaps #89 will fix it?
Alternatively, I could just define my own admin credentials manually and not have a problem anymore.
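A sketch of what that would look like in values.yaml, assuming the chart's `adminUsername`/`adminPassword` values (the password here is a placeholder; in practice it would come from a secret store):

```yaml
# Pin the admin credentials so `helm upgrade` does not regenerate them.
createAdminSecret: true
adminUsername: admin
adminPassword: a-fixed-password   # placeholder
```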
Yes, this sounds just like #78, and #89 would likely fix it / is intended to fix it 😄
Describe the bug
After restarting a pod, the node fails to rejoin the cluster properly, and Fauxton shows a 'this database failed to load' error on some databases. When refreshing the browser, a different database comes online and a different database shows the error. The error only stops after running a curl request with 'finish_cluster'.
Version of Helm and Kubernetes: Helm: 3.5.4, Kubernetes: 1.19
What happened: After restarting a pod, the node fails to join the cluster properly, and only after running:
```sh
curl -X POST http://$adminUser:$adminPassword@:5984/_cluster_setup \
  -H "Accept: application/json" -H "Content-Type: application/json" \
  -d '{"action": "finish_cluster"}'
```
does the pod rejoin the cluster.
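For reference, cluster state can be checked before and after with CouchDB's `/_membership` endpoint (a diagnostic sketch; the hostname is a placeholder):

```sh
# all_nodes: nodes this node knows about; cluster_nodes: nodes that are
# supposed to be part of the cluster. A node that failed to rejoin
# typically shows a mismatch between the two lists.
curl -s http://$adminUser:$adminPassword@localhost:5984/_membership
```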
What you expected to happen: After restart of the pod, the node automatically joins the cluster.
How to reproduce it (as minimally and precisely as possible): restart 1 pod in the cluster.
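For example, assuming a 3-node StatefulSet named `my-release-couchdb` (the name is a placeholder):

```sh
# Delete one pod; the StatefulSet recreates it, but the new pod fails
# to rejoin the cluster until finish_cluster is re-run.
kubectl delete pod my-release-couchdb-0
```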
Anything else we need to know:
Attaching a screenshot from Fauxton showing the 'this database failed to load' error:
Also added the values.yaml:
```yaml
clusterSize: 3
allowAdminParty: false
createAdminSecret: false
adminUsername: admin
networkPolicy:
  enabled: true
serviceAccount:
  enabled: true
  create: true
persistentVolume:
  enabled: true
  accessModes:
image:
  repository:
  tag: latest
  pullPolicy: Always
searchImage:
  repository: kocolosk/couchdb-search
  tag: 0.2.0
  pullPolicy: IfNotPresent
enableSearch: false
initImage:
  repository: busybox
  tag: latest
  pullPolicy: Always
podManagementPolicy: Parallel
affinity: {}
annotations: {}
tolerations: []
service:
  annotations:
  enabled: true
  type: LoadBalancer
  externalPort: 5984
  sidecarsPort: 8080
  LoadBalancerIP:
ingress:
  enabled: false
  hosts:
erlangFlags:
  name: couchdb
  setcookie: monster
couchdbConfig:
  chttpd:
    bind_address: any
    require_valid_user: false
dns:
  clusterDomainSuffix: cluster.local
livenessProbe:
  enabled: true
  failureThreshold: 3
  initialDelaySeconds: 0
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
readinessProbe:
  enabled: true
  failureThreshold: 3
  initialDelaySeconds: 0
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
sidecars:
  image: ""
  imagePullPolicy: Always
```
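For completeness, these values would be applied with something like the following (the chart repo alias and release name are placeholders):

```sh
helm repo add couchdb https://apache.github.io/couchdb-helm
helm upgrade --install my-release couchdb/couchdb -f values.yaml
```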