elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Provide clarifications on voting configuration changes timeouts #99230

Closed etki closed 1 year ago

etki commented 1 year ago

Description

There is a lot of documentation about automatic cluster changes, but it all comes down to this:

After a node has joined or left the cluster the elected master must issue a cluster-state update that adjusts the voting configuration to match, and this can take a short time to complete. It is important to wait for this adjustment to complete before removing more nodes from the cluster.

The docs provide essentially no guidance on how the end user can tell when it is safe to proceed. Waiting for some fixed timeout cannot be assumed to be enough either. In the naive scenario, the end user removes a node and expects it to be pulled out of the voting configuration automatically. But if there is trouble with the master election at exactly that moment, or a tight GC loop on the masters because of the memory configuration, or some other kind of disaster, then the actual removal of the node from the cluster is delayed until the cluster has re-formed. Pulling out a node does not necessarily happen in a healthy environment; it may be part of disaster recovery. For a concrete example, imagine that ES was deployed on VMs that have become unhealthy, and the end user needs to recycle all the masters one by one to spin up fresh, unaffected VMs. I assume that it is possible to watch the cluster configuration, but doing so in an automated way requires some tooling and doing it manually requires some insight, and neither approach is evident from the corresponding documentation.

So this is a request for a more thorough explanation of the processes under the hood, and for clarification on how a user can detect that the automatic cluster state change has taken effect so that they can safely proceed.
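To make the "watch the cluster configuration" idea concrete, here is a rough sketch of the kind of polling I have in mind. It reads the last committed voting configuration from the cluster state API (`GET /_cluster/state?filter_path=metadata.cluster_coordination.last_committed_config`) and waits until a given node ID no longer appears in it. The function names, the default intervals, and the unauthenticated plain-HTTP endpoint are all my own assumptions for illustration, not an officially recommended procedure; a real cluster would likely need TLS and credentials.

```python
import json
import time
import urllib.request
from typing import Callable, List


def fetch_voting_config(base_url: str) -> List[str]:
    """Return the node IDs in the last committed voting configuration.

    Assumes an unauthenticated HTTP endpoint such as http://localhost:9200.
    """
    url = (
        base_url.rstrip("/")
        + "/_cluster/state?filter_path="
        + "metadata.cluster_coordination.last_committed_config"
    )
    with urllib.request.urlopen(url, timeout=10) as resp:
        body = json.load(resp)
    return body["metadata"]["cluster_coordination"]["last_committed_config"]


def wait_until_removed(
    node_id: str,
    fetch: Callable[[], List[str]],
    poll_seconds: float = 2.0,
    max_wait_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until node_id is absent from the voting configuration.

    Returns True once the node is gone, False if max_wait_seconds elapses.
    Errors while fetching (cluster unreachable, no usable response) are
    treated as "not yet known" and retried rather than as success.
    """
    deadline = time.monotonic() + max_wait_seconds
    while True:
        try:
            if node_id not in fetch():
                return True
        except (OSError, KeyError, ValueError):
            pass  # cluster not answering or response incomplete; keep waiting
        if time.monotonic() >= deadline:
            return False
        sleep(poll_seconds)
```

Usage would be something like `wait_until_removed("<node-id>", lambda: fetch_voting_config("http://localhost:9200"))`, where the node ID (not the node name) comes from something like `GET /_nodes`. One caveat, as far as I understand it: Elasticsearch may deliberately keep a departed node in the voting configuration, for instance because it does not shrink the configuration below three nodes, so a `False` result here does not by itself mean something is broken.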

DaveCTurner commented 1 year ago

Thanks very much for your interest in Elasticsearch.

This appears to be a user question, and we'd like to direct these kinds of things to the Elasticsearch forum. If you can stop by there, we'd appreciate it. This allows us to use GitHub for verified bug reports, feature requests, and pull requests.

There's an active community in the forum that should be able to help get an answer to your question. As such, I hope you don't mind that I close this.