elastic / cloud-on-k8s

Elastic Cloud on Kubernetes
Other
2.58k stars 702 forks source link

Non-Graceful Cluster Rollout on `Version` Change on ECK 2.10 #7981

Closed SiorMeir closed 2 months ago

SiorMeir commented 2 months ago

Non-Graceful Cluster Rollout on Version Change on ECK 2.10

What did you do? Upgraded the version of Elasticsearch (& Kibana) from 8.12.2 to 8.14.1

What did you expect to see? Graceful rollout of cluster, where nodes are replaced one at a time.

What did you see instead? Under which circumstances? All nodes went down at the same time, Operator changed status to UNKNOWN, resulting in downtime.

However, Manually changing the image of the different node types, adding environment variables or settings caused a graceful rollout of the cluster.

We have ruled out the possibility of non-HA cluster behavior, as the cluster is around 30 nodes. In Addition, the cluster has designated master (around 3) to create orchestrating redundancy.

Environment

SiorMeir commented 2 months ago

@pebrc As you requested, I re-opened the issue as the cluster is more than 3 nodes.

This behavior was observed on multiple clusters & multiple k8s clusters (representing dev, staging etc..)

pebrc commented 2 months ago

You opened a new issue instead of re-opening the existing one.