melissajenner22 closed this issue 3 years ago
I deleted 7.7.1 and installed 7.8.1. I got the same error.
Error: Readiness probe failed: Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" )
I use the chart from https://github.com/elastic/helm-charts, version 7.8.1, to install Elasticsearch into a Kubernetes cluster.
Helm version: 2.16.10
Kubernetes version: 1.16 (EKS)
I use the default settings, except that I changed the number of replicas from 3 to 1 and minimumMasterNodes from 2 to 1 due to resource limitations.
git diff
--- a/elasticsearch/values.yaml
+++ b/elasticsearch/values.yaml
-replicas: 3
-minimumMasterNodes: 2
+replicas: 1
+minimumMasterNodes: 1
helm install --name elasticsearch ./elasticsearch --namespace elk
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
elasticsearch-master-0 0/1 Running 0 8m22s
kubectl describe pod elasticsearch-master-0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 45s default-scheduler Successfully assigned elk/elasticsearch-master-0 to ip-11-111-1-111.us-west-2.compute.internal
Normal SuccessfulAttachVolume 42s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-71b0adf3-3731-4891-97ef-83f4a192a929"
Normal Pulled 35s kubelet, ip-11-111-1-111.us-west-2.compute.internal Container image "docker.elastic.co/elasticsearch/elasticsearch:7.8.1" already present on machine
Normal Created 35s kubelet, ip-11-111-1-111.us-west-2.compute.internal Created container configure-sysctl
Normal Started 35s kubelet, ip-11-111-1-111.us-west-2.compute.internal Started container configure-sysctl
Normal Pulled 35s kubelet, ip-11-111-1-111.us-west-2.compute.internal Container image "docker.elastic.co/elasticsearch/elasticsearch:7.8.1" already present on machine
Normal Created 35s kubelet, ip-11-111-1-111.us-west-2.compute.internal Created container elasticsearch
Normal Started 34s kubelet, ip-11-111-1-111.us-west-2.compute.internal Started container elasticsearch
Warning Unhealthy 5s (x2 over 17s) kubelet, ip-11-111-1-111.us-west-2.compute.internal Readiness probe failed: Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" )
Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )
Is this happening because you edited something in the helm-charts/elasticsearch/values.yaml file?
Instead of using the local source code cloned from github.com/elastic/helm-charts, I used the commands below to install Elasticsearch:
$ helm repo add elastic https://helm.elastic.co
"elastic" has been added to your repositories
$ kubectl create namespace elk
$ helm install --name elasticsearch --version 7.8.1 elastic/elasticsearch --namespace elk
$ kubectl describe pod elasticsearch-master-0
Warning Unhealthy 3m10s (x2 over 3m20s) kubelet, ip-10-117-56-142.us-west-2.compute.internal Readiness probe failed: Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" )
Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )
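When the probe keeps failing like this, it can help to ask Elasticsearch directly what state it thinks the cluster is in, rather than relying on the probe message. A minimal diagnostic sketch, assuming the default pod name and the `elk` namespace used in this thread:

```shell
# Query cluster health from inside the pod (pod and namespace names are
# assumed from this thread; adjust them to your release).
kubectl exec -n elk elasticsearch-master-0 -- \
  curl -s 'http://localhost:9200/_cluster/health?pretty'
```

A single-node cluster with replicated indices will report "status" : "yellow" here, which the default wait_for_status=green probe treats as not ready.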
I have encountered the same problem, is there any fix yet?
I'm experiencing the same.
This is my config, which overrides some defaults:
replicas: 1
persistence:
  enabled: false
resources:
  requests:
    cpu: "2"
    memory: "1Gi"
  limits:
    cpu: "2"
    memory: "2Gi"
# Openshift overrides
# https://github.com/elastic/helm-charts/tree/master/elasticsearch/examples/openshift
securityContext:
  runAsUser: null
podSecurityContext:
  fsGroup: null
  runAsUser: null
sysctlInitContainer:
  enabled: false
How is the cluster supposed to go green with these charts on initial standup?
The headless service doesn't resolve in DNS until at least one node is up, but the readiness check doesn't return ready until the cluster is green. I didn't try lowering the minimum master nodes, but even with that, it's unclear how the cluster is supposed to start when none of the pods are up and running -- both during an initial deployment and later if, for example, you need to fully shut down the cluster and bring it back up with the same PVs.
On a handcrafted deployment, I ended up disabling the readiness check until the cluster was operational, and then rolled out an update re-enabling it.
Maybe I'm missing something and it's the same thing affecting the deployment in this issue?
If this is completely unrelated to this issue, please disregard, just seemed to have a likely overlap.
If you're running a single-replica cluster, add the following helm value:
clusterHealthCheckParams: "wait_for_status=yellow&timeout=1s"
Your status will never go green with a single replica cluster.
The following values should work:
replicas: 1
minimumMasterNodes: 1
clusterHealthCheckParams: 'wait_for_status=yellow&timeout=1s'
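To confirm the probe will now pass, you can run the same health query the chart uses by hand. A sketch, assuming the default service name and the `elk` namespace used earlier in this thread:

```shell
# Port-forward the client service, then run the exact probe query by hand.
kubectl -n elk port-forward svc/elasticsearch-master 9200:9200 &
PF_PID=$!
sleep 2
curl -s 'http://localhost:9200/_cluster/health?wait_for_status=yellow&timeout=1s'
kill "$PF_PID"
```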
We faced the same issue when restoring a StatefulSet with the same PVC (we use this for test environments). It works fine if we create the Elasticsearch pod from scratch, but not after the StatefulSet is restored. Along with the error Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" ), we get the following message:
{"type": "server", "timestamp": "2020-10-01T14:09:37,285Z", "level": "INFO", "component": "o.e.c.r.a.AllocationService", "cluster.name": "**", "node.name": "**-0", "message": "Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[**_**][0]]]).", "cluster.uuid": "**", "node.id": "****" }
The issue doesn't seem to be in the Elasticsearch pods -- we assume it's in the index itself. Even if we create the index from the app with the number_of_replicas parameter set to 0, after the Elasticsearch pod is recreated with the same PVC the value is set back to 1. This command helps us bring Elasticsearch back to life:
curl -H 'Content-Type: application/json' -XPUT 'elastic_host:9200/index_name/_settings' -d '{ "index": {"number_of_replicas": 0 }}'
Maybe there is a default value for the number of replicas that gets applied when Elasticsearch restarts?
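One way to stop the replica count from coming back after a restart is to set the default at index-creation time with a template, so any index that is (re)created matches a single-node setup. A hedged sketch using the legacy `_template` API (available but deprecated in ES 7.x); the host and template name are placeholders:

```shell
# Default new indices to zero replica shards.
# 'elastic_host' and the template name are placeholders; adjust to your setup.
curl -H 'Content-Type: application/json' -XPUT \
  'http://elastic_host:9200/_template/zero-replicas' \
  -d '{"index_patterns": ["*"], "settings": {"number_of_replicas": 0}}'
```

Note this only affects indices created after the template exists; existing indices still need the `_settings` call shown above.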
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I'm having the same issue running on AWS Fargate. Since we cannot run containers in privileged mode there, I had to disable these settings in values.yaml:
sysctlInitContainer:
  enabled: false
But then I end up with vm.max_map_count being too low:
[2]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
Any suggestions for running it on Fargate?
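Since the sysctl init container is what raises vm.max_map_count, one option when you can't run privileged containers is to tell Elasticsearch not to use mmap at all, which removes that requirement (node.store.allow_mmap is a standard Elasticsearch setting for exactly this situation). A sketch of the values override via the chart's esConfig mechanism -- worth testing the performance impact before relying on it:

```yaml
# Avoid the vm.max_map_count requirement by disabling mmap entirely.
esConfig:
  elasticsearch.yml: |
    node.store.allow_mmap: false
sysctlInitContainer:
  enabled: false
```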
I was having the same issue. As our developers are coming from docker-compose, I wanted to make the transition as smooth as possible, so I enabled the discovery.type=single-node option via a singleNode: true property in values.yaml.
https://github.com/elastic/helm-charts/pull/1027
Please test whether this works for you as well and let me know any feedback. I've not written Python in quite some time!
I am also running into the same issue, though in my scenario I am using the multi approach from their examples, with 2 replicas on ES version 6.1.4. I tried updating clusterHealthCheckParams: "wait_for_status=[yellow,green]&timeout=200s" to no avail. Nothing else in the events or logs raises any concern.
I was having the same issue. Earlier I had set a password shorter than 20 characters, but after setting a password with a 20-character length, the pod turned ready within 100s.
I can confirm adding a password completes the initialization. I didn't know of the 20-character limit and only met it by coincidence -- good to know.
Thank you; this could be added as a comment for the variable in values.yaml.
Could not reproduce the initial issue from @melissajenner22 on an EKS cluster with the same config (except using version 7.13.2 for the chart/Elasticsearch).
There seem to be a lot of different issues related to deploying a single-node cluster.
Note that using replicas: 1 should be the only value required to deploy a single-node cluster.
Most issues where the cluster status never goes green with replicas: 1 are because a cluster with more nodes was previously deployed using the same PVC. Indeed, helm delete doesn't delete the PVC, so if you deployed a default 3-node cluster, then deleted it and redeployed a single-node cluster with the same name, it will reuse the disk of the first node of the 3-node cluster and will expect to find 2 other nodes using the other disks.
For test clusters, if you don't need to keep the data, the best solution is to check the PVCs (kubectl get pvc) and remove those that match the Elasticsearch PVCs (elasticsearch-master-elasticsearch-master-0, elasticsearch-master-elasticsearch-master-1 and elasticsearch-master-elasticsearch-master-2 with the default config) to ensure they are not reused by the new single-node cluster. You can also name your single-node cluster differently (using cluster.name for example) to ensure that the existing PVCs will not be reused and new PVCs will be created without having to delete the old ones.
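The cleanup described above can be scripted; a sketch assuming the default release name and the `elk` namespace used in this thread (this permanently destroys the data on those volumes):

```shell
# List the Elasticsearch PVCs, then delete the ones left over from the
# previous 3-node deployment. THIS DESTROYS THE DATA on those volumes.
kubectl get pvc -n elk
kubectl delete pvc -n elk \
  elasticsearch-master-elasticsearch-master-0 \
  elasticsearch-master-elasticsearch-master-1 \
  elasticsearch-master-elasticsearch-master-2
```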
Finally, if you need to keep the previous data while moving from a 3-node cluster to a single-node cluster, you may need to start your cluster with the 3 nodes, update all indices to have 0 replicas, and migrate them to the first node before restarting with replicas: 1.
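For that data-preserving path, the "update all indices to 0 replicas" step can be done in one call. A sketch, assuming the chart's default client service name:

```shell
# Drop replica shards on every index so all data can live on one node.
# 'elasticsearch-master' is the chart's default service name; adjust as needed.
curl -H 'Content-Type: application/json' -XPUT \
  'http://elasticsearch-master:9200/_all/_settings' \
  -d '{"index": {"number_of_replicas": 0}}'
```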
I'm closing this issue as I can't reproduce the original error and I think that https://github.com/elastic/helm-charts/issues/783#issuecomment-874316380 should solve most of the other errors.
If you still have Readiness probe failed: Waiting for elasticsearch cluster to become ready errors with a single-node cluster that are not related to a previously existing PVC, please open a new bug report with all the details of your environment.
If you're running a single-replica cluster, add the following helm value:
clusterHealthCheckParams: "wait_for_status=yellow&timeout=1s"
Your status will never go green with a single replica cluster.
The following values should work:
replicas: 1
minimumMasterNodes: 1
clusterHealthCheckParams: 'wait_for_status=yellow&timeout=1s'
This works for me! Thanks! Just make sure to replace the existing clusterHealthCheckParams setting instead of copy-pasting from the post; otherwise it will be overridden by the original setting in the yaml file.
Try changing the permission settings on the persistent volume folder to 777. I was able to reproduce this with:
ElasticsearchException[failed to bind service]; nested: AccessDeniedException[/usr/share/elasticsearch/data/nodes];
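Rather than opening the volume up to 777, a narrower option is an init container that chowns the data directory to the Elasticsearch user. A sketch using the chart's extraInitContainers value; UID 1000 and the volume name match the chart defaults shown later in this thread, but verify them against your release:

```yaml
extraInitContainers:
  - name: fix-data-permissions
    image: busybox
    # The chart runs Elasticsearch as UID 1000 by default.
    command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
    securityContext:
      runAsUser: 0
    volumeMounts:
      - name: elasticsearch-master
        mountPath: /usr/share/elasticsearch/data
```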
Chart version: 7.7.1
Kubernetes version: 1.16
Kubernetes provider: EKS
Helm Version: 2.16.10
Output of helm get elasticsearch:
``` $ helm get elasticsearch REVISION: 1 RELEASED: Fri Aug 14 14:15:21 2020 CHART: elasticsearch-7.7.1 USER-SUPPLIED VALUES: {} COMPUTED VALUES: antiAffinity: hard antiAffinityTopologyKey: kubernetes.io/hostname clusterHealthCheckParams: wait_for_status=green&timeout=1s clusterName: elasticsearch envFrom: [] esConfig: {} esJavaOpts: -Xmx1g -Xms1g esMajorVersion: "" extraContainers: [] extraEnvs: [] extraInitContainers: [] extraVolumeMounts: [] extraVolumes: [] fsGroup: "" fullnameOverride: "" httpPort: 9200 image: docker.elastic.co/elasticsearch/elasticsearch imagePullPolicy: IfNotPresent imagePullSecrets: [] imageTag: 7.7.1 ingress: annotations: {} enabled: false hosts: - chart-example.local path: / tls: [] initResources: {} keystore: [] labels: {} lifecycle: {} masterService: "" masterTerminationFix: false maxUnavailable: 1 minimumMasterNodes: 1 nameOverride: "" networkHost: 0.0.0.0 nodeAffinity: {} nodeGroup: master nodeSelector: {} persistence: annotations: {} enabled: true podAnnotations: {} podManagementPolicy: Parallel podSecurityContext: fsGroup: 1000 runAsUser: 1000 podSecurityPolicy: create: false name: "" spec: fsGroup: rule: RunAsAny privileged: true runAsUser: rule: RunAsAny seLinux: rule: RunAsAny supplementalGroups: rule: RunAsAny volumes: - secret - configMap - persistentVolumeClaim priorityClassName: "" protocol: http rbac: create: false serviceAccountName: "" readinessProbe: failureThreshold: 3 initialDelaySeconds: 10 periodSeconds: 10 successThreshold: 3 timeoutSeconds: 5 replicas: 1 resources: limits: cpu: 1000m memory: 2Gi requests: cpu: 1000m memory: 2Gi roles: data: "true" ingest: "true" master: "true" schedulerName: "" secretMounts: [] securityContext: capabilities: drop: - ALL runAsNonRoot: true runAsUser: 1000 service: annotations: {} httpPortName: http labels: {} labelsHeadless: {} loadBalancerIP: "" loadBalancerSourceRanges: [] nodePort: "" transportPortName: transport type: ClusterIP sidecarResources: {} sysctlInitContainer: enabled: 
true sysctlVmMaxMapCount: 262144 terminationGracePeriod: 120 tolerations: [] transportPort: 9300 updateStrategy: RollingUpdate volumeClaimTemplate: accessModes: - ReadWriteOnce resources: requests: storage: 30Gi HOOKS: --- # elasticsearch-lgztd-test apiVersion: v1 kind: Pod metadata: name: "elasticsearch-lgztd-test" annotations: "helm.sh/hook": test-success spec: securityContext: fsGroup: 1000 runAsUser: 1000 containers: - name: "elasticsearch-ctvif-test" image: "docker.elastic.co/elasticsearch/elasticsearch:7.7.1" command: - "sh" - "-c" - | #!/usr/bin/env bash -e curl -XGET --fail 'elasticsearch-master:9200/_cluster/health?wait_for_status=green&timeout=1s' restartPolicy: Never MANIFEST: --- # Source: elasticsearch/templates/poddisruptionbudget.yaml apiVersion: policy/v1beta1 kind: PodDisruptionBudget metadata: name: "elasticsearch-master-pdb" spec: maxUnavailable: 1 selector: matchLabels: app: "elasticsearch-master" --- # Source: elasticsearch/templates/service.yaml kind: Service apiVersion: v1 metadata: name: elasticsearch-master labels: heritage: "Tiller" release: "elasticsearch" chart: "elasticsearch" app: "elasticsearch-master" annotations: {} spec: type: ClusterIP selector: heritage: "Tiller" release: "elasticsearch" chart: "elasticsearch" app: "elasticsearch-master" ports: - name: http protocol: TCP port: 9200 - name: transport protocol: TCP port: 9300 --- # Source: elasticsearch/templates/service.yaml kind: Service apiVersion: v1 metadata: name: elasticsearch-master-headless labels: heritage: "Tiller" release: "elasticsearch" chart: "elasticsearch" app: "elasticsearch-master" annotations: service.alpha.kubernetes.io/tolerate-unready-endpoints: "true" spec: clusterIP: None # This is needed for statefulset hostnames like elasticsearch-0 to resolve # Create endpoints also if the related pod isn't ready publishNotReadyAddresses: true selector: app: "elasticsearch-master" ports: - name: http port: 9200 - name: transport port: 9300 --- # Source: 
elasticsearch/templates/statefulset.yaml apiVersion: apps/v1 kind: StatefulSet metadata: name: elasticsearch-master labels: heritage: "Tiller" release: "elasticsearch" chart: "elasticsearch" app: "elasticsearch-master" annotations: esMajorVersion: "7" spec: serviceName: elasticsearch-master-headless selector: matchLabels: app: "elasticsearch-master" replicas: 1 podManagementPolicy: Parallel updateStrategy: type: RollingUpdate volumeClaimTemplates: - metadata: name: elasticsearch-master spec: accessModes: - ReadWriteOnce resources: requests: storage: 30Gi template: metadata: name: "elasticsearch-master" labels: heritage: "Tiller" release: "elasticsearch" chart: "elasticsearch" app: "elasticsearch-master" annotations: spec: securityContext: fsGroup: 1000 runAsUser: 1000 affinity: podAntiAffinity: requiredDuringSchedulingIgnoredDuringExecution: - labelSelector: matchExpressions: - key: app operator: In values: - "elasticsearch-master" topologyKey: kubernetes.io/hostname terminationGracePeriodSeconds: 120 volumes: initContainers: - name: configure-sysctl securityContext: runAsUser: 0 privileged: true image: "docker.elastic.co/elasticsearch/elasticsearch:7.7.1" imagePullPolicy: "IfNotPresent" command: ["sysctl", "-w", "vm.max_map_count=262144"] resources: {} containers: - name: "elasticsearch" securityContext: capabilities: drop: - ALL runAsNonRoot: true runAsUser: 1000 image: "docker.elastic.co/elasticsearch/elasticsearch:7.7.1" imagePullPolicy: "IfNotPresent" readinessProbe: exec: command: - sh - -c - | #!/usr/bin/env bash -e # If the node is starting up wait for the cluster to be ready (request params: "wait_for_status=green&timeout=1s" ) # Once it has started only check that the node itself is responding START_FILE=/tmp/.es_start_file http () { local path="${1}" local args="${2}" set -- -XGET -s if [ "$args" != "" ]; then set -- "$@" $args fi if [ -n "${ELASTIC_USERNAME}" ] && [ -n "${ELASTIC_PASSWORD}" ]; then set -- "$@" -u 
"${ELASTIC_USERNAME}:${ELASTIC_PASSWORD}" fi curl --output /dev/null -k "$@" "http://127.0.0.1:9200${path}" } if [ -f "${START_FILE}" ]; then echo 'Elasticsearch is already running, lets check the node is healthy' HTTP_CODE=$(http "/" "-w %{http_code}") RC=$? if [[ ${RC} -ne 0 ]]; then echo "curl --output /dev/null -k -XGET -s -w '%{http_code}' \${BASIC_AUTH} http://127.0.0.1:9200/ failed with RC ${RC}" exit ${RC} fi # ready if HTTP code 200, 503 is tolerable if ES version is 6.x if [[ ${HTTP_CODE} == "200" ]]; then exit 0 elif [[ ${HTTP_CODE} == "503" && "7" == "6" ]]; then exit 0 else echo "curl --output /dev/null -k -XGET -s -w '%{http_code}' \${BASIC_AUTH} http://127.0.0.1:9200/ failed with HTTP code ${HTTP_CODE}" exit 1 fi else echo 'Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" )' if http "/_cluster/health?wait_for_status=green&timeout=1s" "--fail" ; then touch ${START_FILE} exit 0 else echo 'Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )' exit 1 fi fi failureThreshold: 3 initialDelaySeconds: 10 periodSeconds: 10 successThreshold: 3 timeoutSeconds: 5 ports: - name: http containerPort: 9200 - name: transport containerPort: 9300 resources: limits: cpu: 1000m memory: 2Gi requests: cpu: 1000m memory: 2Gi env: - name: node.name valueFrom: fieldRef: fieldPath: metadata.name - name: cluster.initial_master_nodes value: "elasticsearch-master-0," - name: discovery.seed_hosts value: "elasticsearch-master-headless" - name: cluster.name value: "elasticsearch" - name: network.host value: "0.0.0.0" - name: ES_JAVA_OPTS value: "-Xmx1g -Xms1g" - name: node.data value: "true" - name: node.ingest value: "true" - name: node.master value: "true" volumeMounts: - name: "elasticsearch-master" mountPath: /usr/share/elasticsearch/data ```Describe the bug: