elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

minimum_master_nodes does not prevent split-brain if splits are intersecting #2488

Closed saj closed 10 years ago

saj commented 11 years ago

G'day,

I'm using ElasticSearch 0.19.11 with the unicast Zen discovery protocol.

With this setup, I can easily split a 3-node cluster into two 'hemispheres' (continuing with the brain metaphor) with one node acting as a participant in both hemispheres. I believe this to be a significant problem, because now minimum_master_nodes is incapable of preventing certain split-brain scenarios.

Here's what my 3-node test cluster looked like before I broke it:

Here's what the cluster looked like after simulating a communications failure between nodes (2) and (3):

Here's what seems to have happened immediately after the split:

  1. Node (2) and (3) lose contact with one another. (zen-disco-node_failed ... reason failed to ping)
  2. Node (2), still master of the left hemisphere, notes the disappearance of node (3) and broadcasts an advisory message to all of its followers. Node (1) takes note of the advisory.
  3. Node (3) has now lost contact with its old master and decides to hold an election. It declares itself winner of the election and, upon doing so, assumes the master role of the right hemisphere, then broadcasts an advisory message to all of its followers. Node (1) takes note of this advisory, too.

At this point, I can't say I know what to expect to find on node (1). If I query both masters for a list of nodes, I see node (1) in both clusters.
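
To make that concrete, here is a toy model of the reachability after the failure (plain Python, not Elasticsearch code; the node numbers match my description above):

```python
# Toy model of the intersecting split described above (not Elasticsearch code).
# The link between nodes (2) and (3) is down; node (1) can still reach both.
LINKS_UP = {frozenset({1, 2}), frozenset({1, 3})}
NODES = (1, 2, 3)

def reachable(node):
    """Nodes that `node` can still ping, including itself."""
    return {n for n in NODES if n == node or frozenset({node, n}) in LINKS_UP}

print(reachable(2))  # {1, 2} -- the left hemisphere, with node (2) as master
print(reachable(3))  # {1, 3} -- the right hemisphere, with node (3) as master
print(reachable(1))  # {1, 2, 3} -- node (1) appears in both masters' node lists
```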

Let's look at minimum_master_nodes as it applies to this test cluster. Assume I had set minimum_master_nodes to 2. Had node (3) been completely isolated from nodes (1) and (2), I would not have run into this problem. The left hemisphere would have enough nodes to satisfy the constraint; the right hemisphere would not. This would continue to work for larger clusters (with an appropriately larger value for minimum_master_nodes).

The problem with minimum_master_nodes is that it does not work when the split brains are intersecting, as in my example above. Even on a larger cluster of, say, 7 nodes with minimum_master_nodes set to 4, all that needs to happen is for the 'right' two nodes to lose contact with one another (one of them must be the current master, so that an election takes place) for the cluster to split.
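
To spell out the arithmetic, here is a rough sketch of a naive "count the master-eligible nodes I can see" check (plain Python; a simplification, not Elasticsearch's actual election code, and the node numbers in the 7-node case are made up for illustration). Both sides of an intersecting split pass it:

```python
# Naive check: a would-be master counts the master-eligible nodes it can see
# (including itself) and compares against minimum_master_nodes.
# This is a simplification, not Elasticsearch's implementation.

def passes_check(visible_nodes, minimum_master_nodes):
    return len(visible_nodes) >= minimum_master_nodes

# 3-node cluster, minimum_master_nodes = 2, link (2)<->(3) down:
print(passes_check({1, 2}, 2))  # True  -- node (2) carries on as master
print(passes_check({1, 3}, 2))  # True  -- node (3) elects itself as well

# 7-node cluster, minimum_master_nodes = 4, only the link between the current
# master (say node 1) and node 2 is down:
print(passes_check({1, 3, 4, 5, 6, 7}, 4))  # True -- node 1 stays master
print(passes_check({2, 3, 4, 5, 6, 7}, 4))  # True -- node 2 wins its own election

# Contrast with node (3) being fully isolated in the 3-node case:
print(passes_check({3}, 2))  # False -- node (3) cannot claim mastership
```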

Is there anything that can be done to detect the intersecting split on node (1)?

Would #1057 help?

Am I missing something obvious? :)

speedplane commented 8 years ago

> What evidence do you have that they were simultaneously acting as master?

In the Big Desk plugin, the little star next to the node name kept bouncing back and forth between my two nodes (see screenshot).


> How do you know that they were in communication with each other?

I don't think I explicitly tested whether one could contact the other, but I was able to ssh into both, they were on the same network, and there did not appear to be any network issues.

> What version of Elasticsearch?

1.7.3

jasontedor commented 8 years ago

> In the Big Desk plugin, the little star next to the node name kept bouncing back and forth between my two nodes (see screenshot).

@speedplane I'm not familiar with the Big Desk plugin, sorry. Let's just assume that it's correct and that things are as you say. Have you checked the logs or any other monitoring for repeated long-running garbage collection pauses on both of these nodes?

> I don't think I explicitly tested whether one could contact the other, but I was able to ssh into both, they were on the same network, and there did not appear to be any network issues.

Networks are fickle things, but I do suspect something else here.

> 1.7.3

Thanks.

XANi commented 8 years ago

@speedplane "2 node situation" is inherently hard to deal with because there is no one metric iy could be decided which one should be shot down.

"Most written to" or "last written to" doesnt really mean much and in most cases alerting that something is wrong is preferable to "just throw away whatever other node had".

That is why a lot of distributed software recommends at least 3 nodes: with 3 there is always a majority, so you can set it up to only allow requests if at least n/2+1 nodes are up.
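
A trivial sketch of that arithmetic (plain Python), just to make the 2-node vs. 3-node difference explicit:

```python
# Strict-majority quorum: the smallest node count that is more than half.
def quorum(n):
    return n // 2 + 1

for n in (2, 3, 5, 7):
    print(f"cluster of {n}: quorum {quorum(n)}, tolerates {n - quorum(n)} failure(s)")
# cluster of 2: quorum 2, tolerates 0 failures -- any split or failure either
#   stops the cluster or (without a quorum rule) risks split brain
# cluster of 3: quorum 2, tolerates 1 failure -- there is always a clear majority
```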