cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

move replicas from decommissioning nodes even if nodes are in unavailable state #99064

Open aliher1911 opened 1 year ago

aliher1911 commented 1 year ago

Dead and live nodes should be treated equally by the allocator when they are marked as decommissioning. Currently, the allocator only starts moving voters off a decommissioning node if that node is live; it ignores the node if its status is livenesspb.NodeLivenessStatus_UNAVAILABLE.
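A minimal Go sketch of the current behaviour, assuming a simplified model. `LivenessStatus`, `combinedStatus`, and `allocatorMovesVoters` are hypothetical names, not CockroachDB's actual code, and the constants only loosely mirror `livenesspb.NodeLivenessStatus`. It shows how a decommissioning node that stops heartbeating ends up reported as UNAVAILABLE and is skipped.

```go
package main

// LivenessStatus is a hypothetical stand-in for livenesspb.NodeLivenessStatus:
// one enum carrying both liveness and decommissioning information.
type LivenessStatus int

const (
	StatusUnknown LivenessStatus = iota
	StatusDead
	StatusUnavailable
	StatusLive
	StatusDecommissioning
)

// combinedStatus collapses liveness and decommissioning into a single value.
// Liveness problems take precedence, so a decommissioning node that is not
// live (but not yet dead) is reported as UNAVAILABLE, hiding its
// decommissioning flag.
func combinedStatus(live, dead, decommissioning bool) LivenessStatus {
	switch {
	case dead:
		return StatusDead
	case !live:
		return StatusUnavailable
	case decommissioning:
		return StatusDecommissioning
	default:
		return StatusLive
	}
}

// allocatorMovesVoters reports whether an allocator that only sees the
// combined status would start moving voters off the node.
func allocatorMovesVoters(s LivenessStatus) bool {
	return s == StatusDecommissioning || s == StatusDead
}
```

In this model, a live decommissioning node is drained immediately, while the same node in the UNAVAILABLE window is left alone until it becomes DEAD.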

As a result, draining waits for server.time_until_store_dead (5 minutes) to elapse until the node is declared dead; only then does the dead-node rule move voters away from it.

Ideally, the allocator should drain everything regardless of liveness. However, liveness and decommissioning states are both represented internally by a single enum, and because liveness must take precedence, it masks the decommissioning state. Handling the two states separately would make allocator behaviour more consistent.
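One way to picture handling the states separately is a struct with independent liveness and membership fields. This is a hypothetical sketch under that assumption; `nodeStatus`, `shouldDrainReplicas`, and the constant names are illustrative, not the actual fix.

```go
package main

// liveness and membership are hypothetical, independent dimensions that the
// single enum currently merges.
type liveness int

const (
	livenessLive liveness = iota
	livenessUnavailable
	livenessDead
)

type membership int

const (
	membershipActive membership = iota
	membershipDecommissioning
	membershipDecommissioned
)

// nodeStatus keeps both dimensions, so neither can mask the other.
type nodeStatus struct {
	Liveness   liveness
	Membership membership
}

// shouldDrainReplicas drains a decommissioning node regardless of its
// liveness, and still drains dead nodes via the dead-node rule.
func shouldDrainReplicas(s nodeStatus) bool {
	if s.Membership == membershipDecommissioning {
		return true
	}
	return s.Liveness == livenessDead
}
```

With this shape, an unavailable decommissioning node is drained right away instead of waiting for the dead-node timeout.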

Jira issue: CRDB-25686

aliher1911 commented 1 year ago

Test that exposed the issue: https://github.com/cockroachdb/cockroach/pull/99020

blathers-crl[bot] commented 2 weeks ago

Hi @kvoli, please add branch-* labels to identify which branch(es) this C-bug affects.

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.