cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

Observability into issues with range upreplication due to zone constraints #129689

Open inata4 opened 3 weeks ago

inata4 commented 3 weeks ago

Is your feature request related to a problem? Please describe.

When a range cannot upreplicate, CRDB does not present any reason why. We are looking for better observability, and possibly guardrails, in multi-region scenarios where the customer has chosen specific constraints (e.g. pinning voters to one region) that narrow the options for upreplication. For example, when a node is lost and a number of ranges cannot upreplicate because they are constrained, the DB Console should show a pop-up message and an option to reset the zone configuration so that the customer can fix this quickly.
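To make the scenario above concrete, here is a minimal toy model of why a region pin can leave a range with no upreplication target, and of the kind of explanatory message this request is asking for. It is only a sketch: `Node`, `upreplication_candidates`, and `explain_upreplication` are hypothetical names invented for illustration, and the logic is a simplification, not CockroachDB's actual allocator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a cluster node (not a CockroachDB type)."""
    node_id: int
    region: str
    live: bool = True


def upreplication_candidates(nodes, replica_ids, voter_region):
    """Live nodes in the pinned region that do not already hold a replica."""
    return [
        n for n in nodes
        if n.live and n.region == voter_region and n.node_id not in replica_ids
    ]


def explain_upreplication(nodes, replica_ids, voter_region, target_replicas):
    """Return a human-readable reason for the range's replication state."""
    live_ids = {n.node_id for n in nodes if n.live}
    live_replicas = set(replica_ids) & live_ids
    missing = target_replicas - len(live_replicas)
    if missing <= 0:
        return "range is fully replicated"
    candidates = upreplication_candidates(nodes, live_replicas, voter_region)
    if len(candidates) >= missing:
        chosen = [n.node_id for n in candidates[:missing]]
        return f"can upreplicate onto nodes {chosen}"
    return (
        f"cannot upreplicate: need {missing} more replica(s) in region "
        f"{voter_region!r}, but only {len(candidates)} eligible live node(s) exist"
    )


# Assumed example cluster: three nodes per region, voters pinned to us-east,
# and node 3 (a us-east replica holder) has been lost.
nodes = [
    Node(1, "us-east"), Node(2, "us-east"), Node(3, "us-east", live=False),
    Node(4, "us-west"), Node(5, "us-west"), Node(6, "us-west"),
]
message = explain_upreplication(nodes, {1, 2, 3}, "us-east", target_replicas=3)
print(message)
```

In this toy cluster the three live us-west nodes are healthy but ineligible because of the pin, so the range stays under-replicated until either a us-east node is added or the constraint is relaxed. Surfacing that reason is the observability gap described here.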

Describe the solution you'd like

The DB Console should show a pop-up message and an option to reset the zone configuration so that the customer can fix this quickly.

Describe alternatives you've considered

We considered adding a message to warn or alert on a bad zone configuration, but this would not be as effective, since clusters can be resized later in a way that leaves them with a bad zone configuration. Additionally, without this observability, customers suspect software defects rather than configuration issues.

Additional context

Jira issue: CRDB-41680

inata4 commented 3 weeks ago

Additionally, can we give customers more messages and alerting on exactly why ranges have been lost, along with options to self-resolve (e.g. resetting the zone config, or even something like a LoQ reset)?