Closed etki closed 1 year ago
Thanks very much for your interest in Elasticsearch.
This appears to be a user question, and we'd like to direct these kinds of things to the Elasticsearch forum. If you can stop by there, we'd appreciate it. This allows us to use GitHub for verified bug reports, feature requests, and pull requests.
There's an active community in the forum that should be able to help get an answer to your question. As such, I hope you don't mind that I close this.
Description
There are a lot of docs about automatic cluster changes, but they all go by this:
The docs provide basically no clarification on how the end user should know when it is safe to proceed. It also can't be assumed that waiting for some kind of timeout is enough. In the naive scenario, the end user removes a node and expects it to be pulled out of the configuration automatically. But if there is any trouble with the master election at exactly that moment, or a tight GC loop on the masters because of the memory configuration, or some other kind of disaster, then the actual removal of the node from the cluster will be delayed until the cluster has re-formed. Pulling out a node also does not necessarily happen in a healthy environment; it may be part of disaster recovery. For a concrete example, imagine that ES was deployed on VMs that became unhealthy on their own, and the end user needs to recycle all the masters one by one to spin up fresh, unaffected VMs. I assume that it's possible to watch the cluster configuration, but that requires some tooling to do it in an automated way and some insight to do it manually, and neither approach is apparent from the corresponding documentation.
So this is a request for a more thorough explanation of the processes under the hood, and for clarification on how the user can detect that the automatic cluster state change has kicked in so that they can safely proceed.
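
To illustrate what "watching the cluster configuration" could look like in practice, here is a minimal Python sketch. It polls the cluster state for the last committed voting configuration (`GET /_cluster/state?filter_path=metadata.cluster_coordination.last_committed_config`) and waits until a given node ID is no longer listed, rather than waiting a fixed amount of time. The helper names `fetch_voting_config` and `wait_until_removed`, the timeout values, and the assumption that the endpoint is reachable without authentication are all hypothetical and for illustration only. This is not an officially recommended procedure.

```python
import json
import time
import urllib.request
from typing import Callable, List

# Cluster state filtered down to the last committed voting configuration.
CONFIG_PATH = (
    "/_cluster/state"
    "?filter_path=metadata.cluster_coordination.last_committed_config"
)


def fetch_voting_config(base_url: str) -> List[str]:
    """Return the node IDs in the last committed voting configuration.

    Assumes an unauthenticated HTTP endpoint such as http://localhost:9200.
    """
    with urllib.request.urlopen(base_url + CONFIG_PATH, timeout=10) as resp:
        body = json.load(resp)
    return body["metadata"]["cluster_coordination"]["last_committed_config"]


def wait_until_removed(
    node_id: str,
    fetch: Callable[[], List[str]],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until node_id is absent from the voting configuration.

    Returns True once the node is gone and False if the deadline passes
    first. Failed requests (for example while no master is elected) are
    treated as "not yet known" and retried, never as success.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if node_id not in fetch():
                return True
        except (OSError, KeyError, ValueError):
            # Cluster unreachable or response incomplete: keep waiting.
            pass
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

A call might look like `wait_until_removed("<node id>", lambda: fetch_voting_config("http://localhost:9200"))`. Note that the voting configuration lists node IDs, not node names, so the ID has to be looked up before the node is shut down. The important design choice is that an unreachable or master-less cluster never counts as confirmation, which is exactly the failure mode described above.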