basho / riak_core

Distributed systems infrastructure used by Riak.
Apache License 2.0

Extending location awareness for general join/leave support #1001

Open martinsumner opened 1 year ago

martinsumner commented 1 year ago

Background information:

When joining a node, the following algorithms are attempted:

1 - A basic attempt to satisfy wants (vnodes required by the joining node) by asking node-by-node which vnodes can be passed on without breaking target_n_val (the claim_v2 algorithm).

2 - If Step 1 is unsuccessful, attempt to stripe all vnodes across all nodes (the sequential_claim algorithm).

3 - If Step 2 creates tail violations (i.e. if 0 < RingSize rem NodeCount < TargetNVal), resolve them through the solve_tail_violations algorithm.
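To make the tail-violation condition in Step 3 concrete, here is a minimal Python sketch. It is not the riak_core implementation: `stripe`, `has_tail_violation` and `violations` are hypothetical helper names, and the striping is a plain round-robin that assumes NodeCount >= TargetNVal.

```python
def stripe(ring_size, nodes):
    """Naive round-robin: partition i goes to nodes[i % len(nodes)]."""
    return [nodes[i % len(nodes)] for i in range(ring_size)]


def has_tail_violation(ring_size, node_count, target_n_val):
    """The condition quoted above: 0 < RingSize rem NodeCount < TargetNVal."""
    return 0 < ring_size % node_count < target_n_val


def violations(ring, target_n_val):
    """Start indices of wrapping windows of target_n_val consecutive
    partitions in which some owner appears more than once."""
    n = len(ring)
    bad = []
    for start in range(n):
        window = [ring[(start + k) % n] for k in range(target_n_val)]
        if len(set(window)) < len(window):
            bad.append(start)
    return bad


# 16 rem 5 = 1, which is below a target of 4, so the ring wraps badly.
print(has_tail_violation(16, 5, 4), violations(stripe(16, list("abcde")), 4))
# 64 rem 5 = 4, which is not below 4, so the plain stripe is clean.
print(has_tail_violation(64, 5, 4), violations(stripe(64, list("abcde")), 4))
```

With a remainder of 1 the last partition and the first partition share an owner, so every window that spans the wrap-around point contains a repeat.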

When leaving a node, the following algorithms are attempted:

1 - A basic attempt to perform a simple_transfer (vnodes are passed in turn to nodes that would not break target_n_val).

2 - Use sequential_claim as in join.

3 - Use the solve_tail_violations extension to sequential_claim as in join.
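The idea behind Step 1 of the leave case can be sketched as follows. This is an illustration only, not the riak_core simple_transfer code: the function names are hypothetical, and picking the first safe candidate in sorted order is an assumption made to keep the example short.

```python
def safe_owner(ring, idx, candidate, target_n_val):
    """True if `candidate` owns nothing within target_n_val - 1
    positions either side of idx (wrapping around the ring)."""
    n = len(ring)
    for offset in range(1, target_n_val):
        if ring[(idx - offset) % n] == candidate:
            return False
        if ring[(idx + offset) % n] == candidate:
            return False
    return True


def simple_transfer_sketch(ring, leaving, target_n_val):
    """Hand each partition owned by `leaving` to the first remaining node
    that keeps the spacing. Returns None if some partition has no safe
    recipient, i.e. the point where a fallback would be needed."""
    new_ring = list(ring)
    remaining = sorted(set(ring) - {leaving})
    for idx, owner in enumerate(ring):
        if owner != leaving:
            continue
        for candidate in remaining:
            if safe_owner(new_ring, idx, candidate, target_n_val):
                new_ring[idx] = candidate
                break
        else:
            return None  # no node can take this partition safely
    return new_ring


# Six nodes, target of 3: every partition of "f" finds a safe recipient.
print(simple_transfer_sketch(list("abcdefabcdef"), "f", 3))
# Three nodes, target of 3: removing "c" cannot preserve the spacing.
print(simple_transfer_sketch(list("abcabc"), "c", 3))
```

Only the leaving node's partitions move in the successful case, which is why this step is preferable to a full re-stripe.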

Ideally, in both cases Step 1 should succeed, as Step 2 will inevitably lead to a full cluster reorganisation (and hence a large volume of transfers).

As part of https://github.com/basho/riak_core/pull/967 location awareness was added to the sequential_claim algorithm (Step 2).

This issue is to document an ongoing investigation into these three problems:

martinsumner commented 1 year ago

The initial condition to be tested is, will sequential_claim and solve_tail_violations consistently work if:

To explain the last condition, if there are L locations, where L > TargetNVal, then at least TargetNVal locations must have M nodes, and the remaining (L - TargetNVal) locations must have =< M nodes.
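The condition in the previous paragraph can be written as a small check. This is a sketch under assumptions: `meets_location_precondition` is a hypothetical name, the input is a plain mapping from node name to location, and M is taken to be the largest per-location node count.

```python
from collections import Counter


def meets_location_precondition(node_locations, target_n_val):
    """True if at least target_n_val locations hold M nodes, where M is
    the largest per-location node count. Every other location then holds
    =< M nodes by definition of M."""
    counts = Counter(node_locations.values())
    m = max(counts.values())
    locations_with_m = sum(1 for c in counts.values() if c == m)
    return locations_with_m >= target_n_val


# 10 nodes split evenly across 5 locations: all five locations hold M = 2.
even = {f"node{i}": f"loc{i % 5}" for i in range(10)}
print(meets_location_precondition(even, 4))

# Per-location counts of 3, 3, 2, 2, 2: only two locations hold M = 3.
uneven = {f"node{i}": loc for i, loc in enumerate(
    ["a", "a", "a", "b", "b", "b", "c", "c", "d", "d", "e", "e"])}
print(meets_location_precondition(uneven, 4))
```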

martinsumner commented 1 year ago

The hypothesis above is incorrect. Even with these pre-conditions, sequential_claim can still fail to satisfy target_n_val.

e.g. a ring size (RS) of 128 or 256, with 10 nodes split evenly across 5 locations.
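One way to see why these ring sizes are awkward is to stripe them naively and look at the wrap-around point. The sketch below is an illustration under stated assumptions and does not reproduce sequential_claim or solve_tail_violations: it assumes a target of 4, assumes the same target applies to locations, and orders nodes so that consecutive nodes sit in different locations. The thread does not state that this is the cause of the observed failures.

```python
def location_stripe(ring_size, location_count, nodes_per_location):
    """Round-robin over nodes ordered loc0, loc1, ..., loc0, loc1, ...
    Each ring entry is a (location, node_index_within_location) pair."""
    nodes = [(loc, k)
             for k in range(nodes_per_location)
             for loc in range(location_count)]
    return [nodes[i % len(nodes)] for i in range(ring_size)]


def location_violations(ring, target):
    """Start indices of wrapping windows of `target` consecutive
    partitions that do not cover `target` distinct locations."""
    n = len(ring)
    bad = []
    for start in range(n):
        locs = [ring[(start + k) % n][0] for k in range(target)]
        if len(set(locs)) < target:
            bad.append(start)
    return bad


for ring_size in (128, 256):
    ring = location_stripe(ring_size, 5, 2)
    # By node count the remainder is not below 4 ...
    print(ring_size, "rem 10 =", ring_size % 10)
    # ... but by location count it is, and the tail windows repeat locations.
    print(ring_size, "rem 5 =", ring_size % 5, location_violations(ring, 4))
```

Under these assumptions the remainder by node count (8 and 6) looks safe, while the remainder by location count (3 and 1) falls inside the tail-violation range, so only the windows that cross the end of the ring repeat a location.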