cockroachdb / cockroach

kvserver: exclude rejoined nodes as lease transfer targets for a longer suspect duration than failed heartbeats #132796

Open kvoli opened 5 days ago

kvoli commented 5 days ago

We previously added protection (#96521) to reduce the QoS impact when a node rejoins after being down for a short period and needs to be caught up on the messages it missed.

That protection was known to reduce the QoS impact, not to prevent it entirely.

RACv2 operating in apply_to_all would prevent the impact, but it isn't planned to be enabled by default until v25.1 (v24.3 is currently being cut).

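This issue tracks enhancing the existing protection to further reduce the QoS impact: exclude rejoined nodes as lease transfer targets for a longer suspect duration than the one applied after failed heartbeats.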

See investigation in #132615.

Jira issue: CRDB-43281

kvoli commented 4 days ago

See https://github.com/cockroachdb/cockroach/issues/132879. This may be neither necessary nor a good idea without first addressing the divergences in the status used by the allocator.