Is data center awareness in primary election important?

misterbisson commented 8 years ago

I've been working on how to operate Autopilot Pattern apps across multiple data centers (geographically distinct data centers connected over a WAN). For Consul, that raised a data center naming question (https://github.com/autopilotpattern/consul/issues/23), among others.

As I explore how to do this in MySQL (using https://github.com/autopilotpattern/wordpress/issues/27 as the scenario), I'm trying to determine how important data center awareness is when electing a primary. On the one hand, it's important to have a solid strategy for recovering from a complete data center failure. On the other, the risk of split-brain scenarios grows dramatically over a WAN.

For the purpose of this question and the scenario in https://github.com/autopilotpattern/wordpress/issues/27, let's assume a standard master-replica replication topology (not multi-master, not sharded).

From a data center that's remote from the primary, how can we determine the difference between a failure of the primary, the failure of the entire data center the primary is in, or a network partition of the two data centers?
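One way to frame the question is to list what a replica in the remote data center can actually observe in each case. The sketch below is a hypothetical model, not Autopilot Pattern code: it assumes the remote side can probe the MySQL primary and at least one other host in the primary's data center (for example a Consul server there), and it treats reachability of those peers as a proxy for the WAN link being up. The names `Diagnosis`, `OBSERVED_FROM_REMOTE_DC`, and `diagnose` are invented for illustration.

```python
from enum import Enum


class Diagnosis(Enum):
    """What a remote data center can conclude from its own WAN probes."""
    PRIMARY_HEALTHY = "primary reachable"
    PRIMARY_FAILED = "primary down, but its data center is reachable"
    AMBIGUOUS = "primary's data center is down OR the WAN is partitioned"


# Hypothetical model: for each ground-truth scenario, what a replica in a
# remote data center would see when probing (a) the MySQL primary and
# (b) some other host in the primary's data center, e.g. a Consul server.
OBSERVED_FROM_REMOTE_DC = {
    "primary_failed":    {"primary_up": False, "primary_dc_peers_up": True},
    "primary_dc_failed": {"primary_up": False, "primary_dc_peers_up": False},
    "wan_partition":     {"primary_up": False, "primary_dc_peers_up": False},
}


def diagnose(primary_up: bool, primary_dc_peers_up: bool) -> Diagnosis:
    """Classify using only what the remote data center can observe."""
    if primary_up:
        return Diagnosis.PRIMARY_HEALTHY
    if primary_dc_peers_up:
        # The WAN link works and the rest of the primary's DC answers,
        # so the failure appears to be local to the primary itself.
        return Diagnosis.PRIMARY_FAILED
    # Nothing in the primary's DC answers. A dead DC and a severed WAN
    # look identical from here; in the partition case the primary may
    # still be alive and accepting writes, so promoting a local replica
    # risks split brain.
    return Diagnosis.AMBIGUOUS


if __name__ == "__main__":
    for scenario, seen in OBSERVED_FROM_REMOTE_DC.items():
        print(f"{scenario:18} -> {diagnose(**seen).name}")
```

Running it maps `primary_dc_failed` and `wan_partition` to the same `AMBIGUOUS` result, because their observations are identical: under these assumptions a single remote data center can tell a failed primary from a data-center-level event, but not a data center failure from a network partition.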

tgross commented 8 years ago

Before we can even start to answer that question, we need to know what kind of replication we're talking about here. Are we talking master-master (which is super dodgy with MySQL), sharded-master, or master-replica over the WAN?

misterbisson commented 8 years ago

> need to know what kind of replication we're talking about

Fair point. I'm assuming master-replica only for now. We're not really making much effort to support other topologies in this implementation, but we should at least ask the question.

The scenario I'm using to explore this is further addressed in https://github.com/autopilotpattern/wordpress/issues/27. I'll update the story at the top to clarify the intended topology.

autopilotpattern/mysql #53