cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

testing: test/document behavior when one locality loses connectivity to one locality but not the other #21232

Closed robert-s-lee closed 3 years ago

robert-s-lee commented 6 years ago

QUESTION

Consider the following 3 localities with the default 3-way replication:

A---C
 \ / 
  B

The link between A and C suffers a network outage, but B is still connected to both A and C:

A-x-C
 \ / 
  B

A-x-C
|\ /|
|/ \|
B---D

bdarnell commented 6 years ago

Ranges with their leaseholder at B are fine. Ranges with their leaseholder at A or C will remain readable (for clients who can reach them). They will remain writeable for a time, but may become read-only to avoid allowing the disconnected replica to fall too far behind.

If the liveness range has its leaseholder in A, nodes in C will be unable to update their heartbeats, so they will appear to be down and everything will move towards A and B (or vice versa if the liveness range is in C). If the liveness range has its leaseholder in B, the broken state could persist for a long time. Failures of the liveness range (and to a lesser extent the meta ranges) can cascade into larger-scale cluster problems.
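The liveness argument above can be illustrated with a toy model. This is only a sketch under a strong simplifying assumption: a locality "appears live" exactly when it can talk directly to the locality holding the liveness range's lease. The names `LINKS_UP`, `can_reach`, and `apparent_liveness` are hypothetical and are not part of CockroachDB.

```python
# Toy model (not CockroachDB code): links that are still up after the A-C break.
LINKS_UP = {frozenset("AB"), frozenset("BC")}  # the A-C link is cut


def can_reach(src, dst, links=LINKS_UP):
    """True if src can talk to dst directly, with no rerouting through a third locality."""
    return src == dst or frozenset((src, dst)) in links


def apparent_liveness(liveness_leaseholder, localities="ABC", links=LINKS_UP):
    """Map each locality to whether its heartbeats can reach the liveness leaseholder."""
    return {loc: can_reach(loc, liveness_leaseholder, links) for loc in localities}


for holder in "ABC":
    print(holder, apparent_liveness(holder))
```

With the leaseholder in A, C appears down (and vice versa), which matches the "everything moves towards A and B" case. With the leaseholder in B, every locality still appears live, so nothing in this model would push the cluster out of the broken state.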

When connectivity is restored, everything should come back up. We've done some testing of this (jepsen and otherwise); we've had bugs in the past where we would not come back from this kind of network failure without manual intervention.

I don't think the 4-locality version really changes anything currently: ranges with their leader in A or C will still have problems, and there is nothing in particular that will pull replicas out of those regions into a connected subset of nodes.

Currently, this is best addressed as a network routing problem. When the A-C break is detected, packets addressed from A to C should be rerouted on the A-B-C path.
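As a rough illustration of the rerouting idea, the sketch below runs a breadth-first search over the locality graph to find a path from A to C once the direct link is gone. It only models path selection. How a real network detects the break and reprograms routes is outside its scope, and `shortest_path` is a hypothetical helper.

```python
from collections import deque


def shortest_path(links, src, dst):
    """Breadth-first search over undirected links; returns a list of hops or None."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in sorted(neighbors.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


healthy = [("A", "B"), ("B", "C"), ("A", "C")]
# Drop the A-C link to model the outage.
broken = [link for link in healthy if set(link) != {"A", "C"}]

print(shortest_path(healthy, "A", "C"))
print(shortest_path(broken, "A", "C"))
```

In the healthy graph the search returns the direct hop. With the A-C link removed it returns the path through B.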

petermattis commented 6 years ago

@m-schneider This issue was spawned from a private issue. Let's add this scenario to our geo-distributed testing and discover what the cluster actually does.

m-schneider commented 6 years ago

Will do!


robert-s-lee commented 6 years ago

@m-schneider Below is a draft test plan. The attached scripts are Docker demo scripts to perform these tests on a laptop.

W = West, C = Central, E = East

| Symbol | Description |
|---|---|
| `*E*` | either the node or the database is down |
| `--x--` | bi-directional network block |
| `-10ms-` | 10ms bi-directional latency between the nodes |
| `-10ms>-` | left side can initiate with delay and the other side can respond, but the other side cannot initiate |
| `-<10ms-` | right side can initiate with delay and the other side can respond, but the other side cannot initiate |

Additional esoteric edge-condition tests are possible, as supported by Linux `tc`.
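To make the symbols concrete, here is a minimal sketch that maps three of them to the kind of Linux commands that could emulate them on a single node. It is not taken from the attached scripts. The interface name `eth0`, the peer address `10.0.0.2`, and the helper `commands_for` are assumptions for illustration. Note that the `netem` rule shown delays all egress traffic on the interface, not just traffic to one peer.

```python
def commands_for(symbol, peer_ip, dev="eth0", delay="10ms"):
    """Return shell commands (as strings) to run on ONE node to emulate a link symbol.

    symbol is one of:
      "block"       -> --x--   (drop traffic to and from the peer)
      "latency"     -> -10ms-  (delay all egress traffic on dev)
      "no_initiate" -> this node cannot open new TCP connections to the peer,
                       but can still answer connections the peer opens
    """
    if symbol == "block":
        return [
            f"iptables -A INPUT -s {peer_ip} -j DROP",
            f"iptables -A OUTPUT -d {peer_ip} -j DROP",
        ]
    if symbol == "latency":
        return [f"tc qdisc add dev {dev} root netem delay {delay}"]
    if symbol == "no_initiate":
        # --syn matches only the initial SYN, so SYN-ACK replies still pass.
        return [f"iptables -A OUTPUT -d {peer_ip} -p tcp --syn -j DROP"]
    raise ValueError(f"unknown symbol: {symbol}")


for sym in ("block", "latency", "no_initiate"):
    for cmd in commands_for(sym, "10.0.0.2"):  # hypothetical peer address
        print(cmd)
```

The function only builds command strings. Running them requires root on the node or container, and the rules need to be removed again to restore connectivity.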

## typical failure scenarios


- Any one node is down, or one node is network-isolated while the node and its process are still running.  
Example of the network to node E being down is shown below.

W---x---E
 \     /
  \   x
   \ /
    C


## unusual failure scenarios

- Network link is down between two nodes.  
Example of nodes W and E link down is shown below.

W---x---E
 \     /
  \   /
   \ /
    C


- Network link only works in one direction between two nodes.  
Example of node W being able to communicate with E, but E cannot initiate communication with W:

W--->---E
 \     /
  \   /
   \ /
    C

Example of node E being able to communicate with W, but W cannot initiate communication with E:

W---<---E
 \     /
  \   /
   \ /
    C

## failure scenarios where system will not be available

- Any two nodes are down  
Example of nodes A and C down is shown below.

W-------E
 \     /
  \   /
   \ /
    C

- network partition

W---x---E
 \     /
  x   x
   \ /
    C
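The available/unavailable split in the plan above follows from majority quorum with three replicas. The sketch below is a simplified model, not CockroachDB's actual logic: it treats a range as available when some live node can directly reach a majority of the replicas, itself included. One-directional links are not modeled, the choice of W and E as the two down nodes is arbitrary, and `available` is a hypothetical helper.

```python
def available(replicas, down=(), blocked=()):
    """True if some live replica can directly reach a majority of replicas (itself included)."""
    live = [r for r in replicas if r not in down]
    cut = {frozenset(link) for link in blocked}
    majority = len(replicas) // 2 + 1
    for leader in live:
        reachable = [
            r for r in live
            if r == leader or frozenset((leader, r)) not in cut
        ]
        if len(reachable) >= majority:
            return True
    return False


NODES = ("W", "C", "E")
scenarios = {
    "E isolated": dict(blocked=[("W", "E"), ("C", "E")]),
    "W-E link down": dict(blocked=[("W", "E")]),
    "two nodes down": dict(down=("W", "E")),  # arbitrary pair for illustration
    "full partition": dict(blocked=[("W", "E"), ("W", "C"), ("C", "E")]),
}

for name, kwargs in scenarios.items():
    print(name, available(NODES, **kwargs))
```

Under this model the first two scenarios keep a quorum and the last two do not, which matches the grouping in the test plan.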



[Archive.zip](https://github.com/cockroachdb/cockroach/files/1612877/Archive.zip)

tbg commented 6 years ago

Let's model these in the context of the network partitioning roachtests: #23141

knz commented 5 years ago

cc @tbg for re-triage - is there anything actionable here?

tbg commented 5 years ago

Yes, test these in CI and fix any unexpected gotchas.