elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Setting discovery.zen.minimum_master_nodes too high via REST can cause unstartable cluster #24435

Closed centic9 closed 7 years ago

centic9 commented 7 years ago

Describe the feature:

Moved over here from https://discuss.elastic.co/t/setting-discovery-zen-minimum-master-nodes-via-rest-can-cause-unstartable-cluster/84003 upon request:

I have a small Elasticsearch cluster that is in a state where we cannot start it anymore.

Sequence of events:

I don't have node2 available any more at all.

So now when we try to restart node1, it does not start because no master is elected: it still thinks it needs to see 2 nodes. Naturally, it also cannot see that the persistent setting is wrong and should be replaced by the newer yml setting.

But as this is a setting in the persistent cluster state, I cannot change it unless a master is elected, and I cannot elect a master until the setting is changed, which means I am in a deadlock!
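To make the deadlock concrete, here is a toy model in Python. It is only a sketch of the behaviour described above, not Elasticsearch code: the function names are hypothetical, and it assumes that a persisted cluster-state value takes precedence over the yml value and that settings updates need an elected master.

```python
from typing import Optional


def effective_min_master_nodes(persistent: Optional[int],
                               yml: Optional[int],
                               default: int = 1) -> int:
    """Pick the value the node acts on (assumed precedence: persistent, then yml)."""
    if persistent is not None:
        return persistent
    if yml is not None:
        return yml
    return default


def can_elect_master(visible_master_eligible: int, min_master_nodes: int) -> bool:
    """A master can only be elected if enough master-eligible nodes are visible."""
    return visible_master_eligible >= min_master_nodes


def can_update_cluster_settings(master_elected: bool) -> bool:
    """Assumption: cluster settings can only be changed once a master exists."""
    return master_elected


# The situation from this report: persisted value 2, yml lowered to 1, only node1 left.
required = effective_min_master_nodes(persistent=2, yml=1)
elected = can_elect_master(visible_master_eligible=1, min_master_nodes=required)
print(required, elected, can_update_cluster_settings(elected))
```

In this model the yml value of 1 is never consulted, so node1 alone can neither elect a master nor reach the point where the persisted value could be corrected.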

It would be good to have a way to clear or adjust this cluster-state setting other than adding a new node just for that purpose or removing the cluster-state file (not sure if this is a good idea?).

Either the setting does not make sense as a persistent setting at all, or there should be a way to adjust this setting even if the cluster is not fully up due to missing master election.

clintongormley commented 7 years ago

Related to https://github.com/elastic/elasticsearch/issues/18573

javanna commented 7 years ago

Relates to #22108

bleskes commented 7 years ago

> cluster-state file (not sure if this is a good idea?)

If you are 100% sure you only need your index data (no cluster-level settings), you can do this, but it is a risky operation.

> Either the setting does not make sense as a persistent setting at all, or there should be a way to adjust this setting even if the cluster is not fully up due to missing master election.

I agree that this is messy. The reason why it is a persistent setting now is that it protects people when they update the min master nodes via the API and forget to update the yml files first.
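For illustration, a client-side guard could refuse to send a value that the current cluster could never satisfy. The sketch below only builds the JSON body for a persistent update via `PUT /_cluster/settings`; the helper name `build_settings_update` and the `master_eligible_nodes` argument are hypothetical, and nothing like this check is claimed to exist in Elasticsearch itself.

```python
import json

SETTING = "discovery.zen.minimum_master_nodes"


def build_settings_update(new_value: int, master_eligible_nodes: int) -> str:
    """Return a request body for PUT /_cluster/settings, or raise if the value is unsafe.

    master_eligible_nodes is supplied by the caller (hypothetical input);
    this sketch does not query the cluster.
    """
    if new_value < 1:
        raise ValueError("minimum_master_nodes must be at least 1")
    if new_value > master_eligible_nodes:
        raise ValueError(
            f"{SETTING}={new_value} exceeds the {master_eligible_nodes} "
            "master-eligible node(s) and would block master election"
        )
    return json.dumps({"persistent": {SETTING: new_value}})


print(build_settings_update(2, master_eligible_nodes=3))
```

Such a check does not help once nodes are removed after the update, which is the case in this issue; it only catches values that are too high at the time of the request.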

We are in the process of re-evaluating and redesigning this area of the code. I suggest we hold off on trying to solve this until that redesign has happened. Any solution I can think of now is messy (and you always have the option to start a second node as a workaround for now).

PS - transitioning from a two-node cluster to a one-node cluster in a safe way is a tricky thing on its own. 2 is just a hard and unforgiving number when it comes to distributed systems.
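A quick way to see why 2 is unforgiving is to compute the majority quorum, using the commonly recommended formula (master-eligible nodes / 2) + 1, and the number of node losses each cluster size survives. This is a small arithmetic sketch with hypothetical helper names:

```python
def quorum(master_eligible_nodes: int) -> int:
    """Majority quorum: (N / 2) + 1, using integer division."""
    return master_eligible_nodes // 2 + 1


def tolerated_failures(master_eligible_nodes: int) -> int:
    """How many master-eligible nodes can be lost while a quorum remains."""
    return master_eligible_nodes - quorum(master_eligible_nodes)


for n in (1, 2, 3):
    print(n, quorum(n), tolerated_failures(n))
```

With two nodes the quorum is 2, so losing either node leaves the cluster without a master; three nodes is the smallest size that tolerates one failure.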

bleskes commented 7 years ago

Closing as it seems the discussion has died. Bottom line: once a cluster state has been published with a given min master nodes value, it is unsafe to re-form the cluster with fewer than that number of nodes. We may have a future tool to acknowledge the risk and allow starting the cluster anyway, but for now there are bigger fish to fry.