cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

kv: improve observability into under-replicated ranges #129102

Open nicktrav opened 1 month ago

nicktrav commented 1 month ago

Adapted from CRDB-40230.

Currently, our logs only indicate that there are under-replicated ranges; they don't say why these ranges are under-replicated.

Determining the why requires familiarity with, and manual analysis of, log files (or files in a debug.zip), which are often difficult to parse.

Improve our logging and how we display and report on under-replicated ranges in the DB console.

Jira issue: CRDB-41374

tbg commented 1 week ago

In a similar vein, I had looked at a few issues where ranges were a little more than under-replicated: they were unavailable because no quorum could be formed for appending log entries, i.e. only a minority was up to date on the log. Often this involved a few down nodes, with one of the replicas in the remaining required quorum then either not receiving a snapshot that it needed or just being behind on the log for opaque reasons.
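
To make the distinction above concrete, here is a rough sketch of how a per-range report could separate "under-replicated" from "unavailable" and attach a reason for each problem replica. Everything in it is hypothetical: `ReplicaState`, `RangeHealth`, and `Classify` are illustrative names, not CockroachDB APIs, and the rules are a deliberate simplification (a replica counts toward quorum only if its node is live, it is not waiting on a snapshot, and it is not behind on the log).

```go
package main

import "fmt"

// ReplicaState is a hypothetical summary of one voting replica.
// It is not a CockroachDB type.
type ReplicaState struct {
	NodeLive      bool // the node holding the replica is up
	NeedsSnapshot bool // the replica is waiting for a snapshot
	BehindOnLog   bool // the replica is missing log entries
}

// RangeHealth is a hypothetical, simplified health classification.
type RangeHealth string

const (
	Healthy         RangeHealth = "healthy"
	UnderReplicated RangeHealth = "under-replicated"
	Unavailable     RangeHealth = "unavailable"
)

// Classify returns a simplified health status for a range, plus one
// human-readable reason per replica that cannot contribute to quorum.
func Classify(replicas []ReplicaState) (RangeHealth, []string) {
	quorum := len(replicas)/2 + 1
	live, upToDate := 0, 0
	var reasons []string
	for i, r := range replicas {
		if r.NodeLive {
			live++
		}
		switch {
		case !r.NodeLive:
			reasons = append(reasons, fmt.Sprintf("replica %d: node down", i))
		case r.NeedsSnapshot:
			reasons = append(reasons, fmt.Sprintf("replica %d: waiting for snapshot", i))
		case r.BehindOnLog:
			reasons = append(reasons, fmt.Sprintf("replica %d: behind on log", i))
		default:
			upToDate++
		}
	}
	switch {
	case upToDate < quorum:
		// Only a minority is up to date: no quorum for appending entries.
		return Unavailable, reasons
	case live < len(replicas):
		// Quorum holds, but fewer live replicas than configured.
		return UnderReplicated, reasons
	default:
		return Healthy, reasons
	}
}

func main() {
	// Five replicas: two nodes down, one live replica waiting on a snapshot.
	replicas := []ReplicaState{
		{NodeLive: true},
		{NodeLive: true},
		{NodeLive: true, NeedsSnapshot: true},
		{NodeLive: false},
		{NodeLive: false},
	}
	health, reasons := Classify(replicas)
	fmt.Println(health)
	for _, reason := range reasons {
		fmt.Println("  " + reason)
	}
}
```

In the example in `main`, three of five nodes are live, which would nominally be a quorum, but only two replicas are up to date, so the range is classified as unavailable rather than merely under-replicated. Surfacing the per-replica reasons next to the status is the kind of "why" this issue asks for.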