elastic / helm-charts

You know, for Kubernetes

[elasticsearch] New readiness probes causing full cluster-restart #631

Closed · ckotzbauer closed this 4 years ago

ckotzbauer commented 4 years ago

Chart version: 7.7.0

Kubernetes version: 1.17.1

Kubernetes provider: On-prem

Helm Version: 3.2.0

Output of `helm get release`:

```
USER VALUES

esConfig:
  elasticsearch.yml: |
    action.auto_create_index: "-hd-*"
esJavaOpts: -Xmx5g -Xms5g
esMajorVersion: 6
image: "private-image-based-on-official-oss:6.4.2"
imagePullPolicy: IfNotPresent
imagePullSecrets:
  - name: some-credentials
imageTag: 6.4.2
ingress:
  annotations:
    ingress.kubernetes.io/ssl-redirect: "true"
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/auth-realm: Authentication Required
    nginx.ingress.kubernetes.io/auth-secret: dev-elasticsearch-ingress-auth
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/proxy-body-size: 60m
  enabled: true
  hosts:
    - ""
  path: /
  tls:
    - hosts:
        - ""
lifecycle:
  postStart:
    exec:
      command:
        - bash
        - -c
        - |
          #!/bin/bash
          cd /usr/share/elasticsearch/plugins/xxx
          /opt/jdk-10.0.2/bin/jar -cf config.jar config.cfg
          chmod 777 config.jar
persistence:
  enabled: true
podSecurityPolicy:
  create: false
rbac:
  create: true
resources:
  limits:
    cpu: 2000m
    memory: 8Gi
  requests:
    cpu: 200m
    memory: 8Gi
sysctlInitContainer:
  enabled: false
volumeClaimTemplate:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  storageClassName: rook-ceph-cephfs
```

Describe the bug: I updated the chart to the newest version 7.7.0 and expected the three Elasticsearch nodes to be updated one after another, with Kubernetes waiting until the cluster is green again before moving on. (In the past, the most recently restarted pod stayed not-ready until the cluster was green again.) Now each pod became ready after a few minutes and Kubernetes moved on too quickly, so the cluster went red.

Steps to reproduce:

  1. Update the chart installation to 7.7.0

Expected behavior: The readiness probe works as expected and marks the pod as not ready until the cluster is green again.
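
For reference, the green-wait gate I'm describing boils down to a cluster health call like the following (a minimal sketch, not the chart's exact probe script):

```bash
# Ask Elasticsearch to hold the request until the cluster is green, or
# until the 1s timeout expires, in which case the API returns an error
# status and 'curl --fail' exits non-zero. Note the quotes around the
# URL: without them the shell would interpret the '&' itself.
curl -s --fail "http://127.0.0.1:9200/_cluster/health?wait_for_status=green&timeout=1s"
```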

Any additional context: I have done this kind of release update many times in the past without any such problems; it only broke today with the new version.

fatmcgav commented 4 years ago

So I suspect this behaviour is related to the change to the readinessProbe as part of #586 .

Would you be able to provide the output of `kubectl get events` and `kubectl describe <elasticsearch master pod>`?
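
For example (namespace and pod name are assumptions based on a default install of this chart; adjust to your release):

```bash
# Recent events, oldest first, plus the full description of one master pod.
kubectl get events -n dev-elasticsearch --sort-by=.lastTimestamp
kubectl describe pod elasticsearch-master-0 -n dev-elasticsearch
```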

fatmcgav commented 4 years ago

Also, as a general point of interest, if you're pinning to a custom 6.4.2 image, what do you aim to accomplish by bumping the chart version?

The only real reason to bump the chart version is to pick up changes related to new versions and bug fixes.

ckotzbauer commented 4 years ago

Yes, I also think that #586 is the reason for this.

The events are not visible anymore, and I can't restart the cluster right now. If you really need them, I can do it this evening...

I did not explicitly upgrade the chart because of the new release. I changed some chart settings yesterday and thereby implicitly upgraded the chart to the newest version, since I don't pin the chart version unless it's really needed.

Output of `kubectl describe` on the master pod:

```
Name:           elasticsearch-master-0
Namespace:      dev-elasticsearch
Priority:       0
Node:           managed-k8s-worker-5/xx.xx.xx.xx
Start Time:     Mon, 18 May 2020 15:02:22 +0200
Labels:         app=elasticsearch-master
                chart=elasticsearch
                controller-revision-hash=elasticsearch-master-648f485fbf
                heritage=Helm
                release=dev-elasticsearch
                statefulset.kubernetes.io/pod-name=elasticsearch-master-0
Annotations:    backup.velero.io/backup-volumes: dev-backup
                configchecksum: 72b33fd5cf47279b81ec67f0bc0ccadfb3cc9eef20d3a88ecbf5274cc1a58f5
                kubernetes.io/psp: sag-default-psp
Status:         Running
IP:             xx.xx.xx.xx
Controlled By:  StatefulSet/elasticsearch-master
Containers:
  elasticsearch:
    Container ID:   docker://85abf100f7b3324dcadaac0fc50e8df8bf8bdc929e0e715932db3a0510d41344
    Image:          private-image-based-on-official-oss:6.4.2
    Image ID:       docker-pullable://xxxx
    Ports:          9200/TCP, 9300/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Running
      Started:      Mon, 18 May 2020 15:03:09 +0200
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     2
      memory:  8Gi
    Requests:
      cpu:     200m
      memory:  8Gi
    Readiness:  exec [sh -c #!/usr/bin/env bash -e
      # If the node is starting up wait for the cluster to be ready (request params: 'wait_for_status=green&timeout=1s' )
      # Once it has started only check that the node itself is responding
      START_FILE=/tmp/.es_start_file

      if [ -n "${ELASTIC_USERNAME}" ] && [ -n "${ELASTIC_PASSWORD}" ]; then
        BASIC_AUTH="-u ${ELASTIC_USERNAME}:${ELASTIC_PASSWORD}"
      else
        BASIC_AUTH=''
      fi

      if [ -f "${START_FILE}" ]; then
        echo 'Elasticsearch is already running, lets check the node is healthy'
        HTTP_CODE=$(curl -XGET -s -k ${BASIC_AUTH} -o /dev/null -w '%{http_code}' http://127.0.0.1:9200/)
        RC=$?
        if [[ ${RC} -ne 0 ]]; then
          echo "curl -XGET -s -k \${BASIC_AUTH} -o /dev/null -w '%{http_code}' http://127.0.0.1:9200/ failed with RC ${RC}"
          exit ${RC}
        fi
        # ready if HTTP code 200, 503 is tolerable if ES version is 6.x
        if [[ ${HTTP_CODE} == "200" ]]; then
          exit 0
        elif [[ ${HTTP_CODE} == "503" && "6" == "6" ]]; then
          exit 0
        else
          echo "curl -XGET -s -k \${BASIC_AUTH} -o /dev/null -w '%{http_code}' http://127.0.0.1:9200/ failed with HTTP code ${HTTP_CODE}"
          exit 1
        fi
      else
        echo 'Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" )'
        if curl -XGET -s -k --fail ${BASIC_AUTH} http://127.0.0.1:9200/_cluster/health?wait_for_status=green&timeout=1s ; then
          touch ${START_FILE}
          exit 0
        else
          echo 'Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )'
          exit 1
        fi
      fi
    ] delay=10s timeout=5s period=10s #success=3 #failure=3
    Environment:
      node.name:                           elasticsearch-master-0 (v1:metadata.name)
      discovery.zen.minimum_master_nodes:  2
      discovery.zen.ping.unicast.hosts:    elasticsearch-master-headless
      cluster.name:                        elasticsearch
      network.host:                        0.0.0.0
      ES_JAVA_OPTS:                        -Xmx5g -Xms5g
      node.data:                           true
      node.ingest:                         true
      node.master:                         true
    Mounts:
      /usr/share/elasticsearch/backup from dev-backup (rw)
      /usr/share/elasticsearch/backup-script from dev-backup-script (ro)
      /usr/share/elasticsearch/config/elasticsearch.yml from esconfig (rw,path="elasticsearch.yml")
      /usr/share/elasticsearch/data from elasticsearch-master (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from elasticsearch-master-token-xg869 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  elasticsearch-master:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  elasticsearch-master-elasticsearch-master-0
    ReadOnly:   false
  esconfig:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      elasticsearch-master-config
    Optional:  false
  dev-backup:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  dev-elasticsearch-backup
    ReadOnly:   false
  dev-backup-script:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      dev-elasticsearch-backup
    Optional:  false
  elasticsearch-master-token-xg869:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  elasticsearch-master-token-xg869
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>
```
fatmcgav commented 4 years ago

> The events are not visible anymore, and I can't restart the cluster right now. If you really need them, I can do it this evening...

Ok, I'll see if I can replicate locally, and I'll come back if I need the events...

> I changed some chart settings yesterday and thereby implicitly upgraded the chart to the newest version, since I don't pin the chart version unless it's really needed.

Ah, ok... So for now, I'd suggest pinning the chart to a previous version so that future config updates don't result in a full cluster restart.

ckotzbauer commented 4 years ago

Yes, that's the plan. If you need more info, I'll try to get it 😉

fatmcgav commented 4 years ago

@ckotzbauer OK, I've done a bit more testing locally, and I don't think the change in readinessProbe is necessarily the issue, as I was able to reproduce the loss of availability on the elasticsearch-master service just by deleting the current master pod.

That got me looking through the config and history, and I came across this issue: https://github.com/elastic/helm-charts/issues/63.

By setting `masterTerminationFix: true`, I didn't observe any service disruption when either deleting the master pod or doing a helm upgrade to a new chart version.
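
For reference, that's just a top-level value in this chart; a minimal excerpt (the file name is illustrative):

```yaml
# values.yaml (excerpt): enable the chart's workaround for the master
# re-election outage described in #63, so that terminating the currently
# elected master doesn't leave the cluster masterless while the election
# catches up.
masterTerminationFix: true
```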

Do you want to try applying that config item?

ckotzbauer commented 4 years ago

Hm, okay, that's an interesting point. I know this behaviour too: the election takes too long and the cluster goes down when only the master is deleted... I can try this.

But I (personally) don't think that's what causes the problem of K8s rolling out too fast. The new pods are marked as ready, so K8s thinks it can move on. The intention of the readinessProbe implementation in the statefulset is to prevent exactly that, by delaying the ready state until the cluster has recovered. And that seems to be the part that no longer works. Or did I miss something here...? :thinking:

fatmcgav commented 4 years ago

@ckotzbauer So a quick update from me.

We ran some more tests internally, and were able to reproduce the "too fast" rollout on an internal 7.7.0 cluster which has a non-zero amount of data.

After a bit of digging, it looks like a set of quotes was missing around the URL in the initial curl command, so the shell interpreted the `&` in the query string itself and effectively cut the command off early...
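
To illustrate the failure mode (a sketch with an illustrative URL, not the probe's exact text):

```bash
# Unquoted URL: the shell parses '&' as a control operator, so curl runs
# in the background without the timeout parameter, and what the 'if'
# actually tests is the trailing 'timeout=1s' variable assignment, which
# always exits 0, so the probe reports ready immediately.
if curl -s --fail http://127.0.0.1:9200/_cluster/health?wait_for_status=green&timeout=1s; then
  echo "always reached, green or not"
fi

# Quoted URL: the full query string reaches curl, so the check only
# succeeds once the cluster actually reports green within the timeout.
if curl -s --fail "http://127.0.0.1:9200/_cluster/health?wait_for_status=green&timeout=1s"; then
  echo "cluster is green"
fi
```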

I've opened #638 which re-works that behaviour, and I'm running some more tests internally...

We'll probably be looking to do a patch release for this pretty quickly.

However, since you're running a 6.4.2 image, my recommendation would be to pin to the 6.8.x version of the chart.
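
Pinning is just a matter of passing an explicit chart version, e.g. (release name, values file, and the exact 6.8.x patch version shown here are illustrative):

```bash
# Pin the chart so future value-only changes don't pull in a newer chart.
helm upgrade dev-elasticsearch elastic/elasticsearch \
  --version 6.8.8 \
  -f values.yaml
```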

ckotzbauer commented 4 years ago

Thank you so much for digging in, @fatmcgav. I really appreciate it! This sounds promising. I'll pin my chart to the 6.8.x version and wait for the patch release. Thanks again for your work!

fatmcgav commented 4 years ago

@ckotzbauer The fix in #638 has been merged and back-ported to the 6.8 and 7.7 branches for inclusion in the next minor release.