colinmollenhour / mariadb-galera-swarm

MariaDB Galera Cluster container based on the official mariadb image, which can auto-bootstrap and recover cluster state.
https://hub.docker.com/r/colinmollenhour/mariadb-galera-swarm
Apache License 2.0

Galera startup scenarios #42

Closed smidge84 closed 5 years ago

smidge84 commented 6 years ago

Hi Guys,

This isn't really a bug or an issue, but I'm looking for help from people who might have more user experience and knowledge of Galera, and specifically of this Docker implementation. Firstly, I have to say that this project is really good and I've enjoyed working with it so far. I will have some contributions to make once I've cleared this up and tidied up my branch a bit.

I've been playing around with the recovery sequence of a 3 node Galera cluster. The scenario I'm investigating is how it recovers following a complete power cut to all three nodes.

I can't decide which would be the most appropriate recovery strategy in the following scenario: (I have used numbered nodes here just to help illustrate the scenario, but in reality it's arbitrary which nodes are in these states.)

I already have some prototype solutions that ensure only one of these conditions is executed, as I feel that having one node try to create a new cluster while 2 other nodes try to reform the previous cluster is not ideal. Because of my limited knowledge, I'm struggling to decide which of the two should win.

So in summary:

  1. Should a node (with the lowest IP address) that is not part of the view information create a new cluster?
  2. Should the 2 nodes reform the previous cluster (primary component), with the singled-out node then attempting to join it?

Cheers for the help in advance. I don't mean for this to be too difficult.

After further investigation I have discovered that the easiest to implement is option 1, preferring the node with the highest sequence number and lowest IP, which may not necessarily be part of the 2 node view.

I now think option 2 isn't possible. From the point of view of node 2 (in my example), based solely on the data it received during the state data exchange phase, it cannot guarantee that the other nodes would choose to reform the primary component, because it doesn't know how many members there should be. I came to this conclusion by extrapolating to a 4 node cluster (which you wouldn't normally do), where three nodes are part of the view but one of them doesn't turn back on after the power outage. So even though node 2 can see view state from 2 other nodes, in this situation they wouldn't reform the primary component, because those nodes are expecting a third member.
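To make the 4 node reasoning concrete, here is a minimal sketch. It assumes, as a simplification, that the primary component is only reformed when every member of the saved view is reachable again. The function name `can_restore_primary` and the node labels are hypothetical and are not taken from the project's scripts.

```python
def can_restore_primary(saved_view, reachable):
    """Return True only if every member of the saved view is reachable.

    Simplifying assumption: the nodes holding view state refuse to reform
    the primary component until all members of that view are back.
    """
    return set(saved_view) <= set(reachable)


# 3 node cluster: the saved view holds node1 and node3, and both came back.
three_node = can_restore_primary({"node1", "node3"},
                                 {"node1", "node2", "node3"})

# 4 node cluster: the saved view holds node1, node3 and node4,
# but node4 never powered back on.
four_node = can_restore_primary({"node1", "node3", "node4"},
                                {"node1", "node2", "node3"})

print(three_node, four_node)
```

In both cases node 2 sees view state from exactly two other nodes, so without knowing the full membership of the saved view it cannot tell the two situations apart.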

Cheers

Rich

colinmollenhour commented 6 years ago

Hi Rich. In simplified terms, the current script does the following:

You suggest "highest sequence number and lowest IP", which is what it already does when multiple nodes share the same highest seqno. If only one node has the highest seqno, that one wins regardless of IP. IMHO the cluster should not be recovered unless all nodes agree; otherwise, one node must be chosen to start a new cluster, so it is just a matter of choosing the best one. Trying to recover part of the cluster seems overly complicated, and I think IST will be used, so the other nodes should join very quickly. I'm not seeing where you are suggesting an improvement, but if you want to submit a PR I'd be happy to evaluate it and discuss it further.
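The selection rule described here (highest seqno wins, lowest IP breaks ties) can be sketched roughly as follows. This is an illustration only, not the project's actual script: the `NodeState` type, the `choose_bootstrap_node` function, and the sample addresses are all made up, and it assumes every node reports an IPv4 address.

```python
import ipaddress
from typing import NamedTuple


class NodeState(NamedTuple):
    """Hypothetical record of what each node reports after a restart."""
    ip: str
    seqno: int


def choose_bootstrap_node(nodes):
    """Pick the node with the highest seqno; break ties by lowest IP.

    IPs are compared numerically (via ipaddress), not as strings,
    so 10.0.0.9 sorts before 10.0.0.10.
    """
    return min(nodes, key=lambda n: (-n.seqno, ipaddress.ip_address(n.ip)))


# Two nodes share the highest seqno, so the lower IP wins.
tied = [NodeState("10.0.0.10", 120),
        NodeState("10.0.0.9", 120),
        NodeState("10.0.0.1", 100)]
print(choose_bootstrap_node(tied).ip)

# Only one node has the highest seqno, so it wins regardless of IP.
single = [NodeState("10.0.0.1", 100), NodeState("10.0.0.9", 150)]
print(choose_bootstrap_node(single).ip)
```

Comparing parsed addresses rather than raw strings matters for the tie-break, since a plain string comparison would rank "10.0.0.10" below "10.0.0.9".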