cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

kvserver: improve quota pool metrics #75978

Open erikgrinaker opened 2 years ago

erikgrinaker commented 2 years ago

We often see that a "bad node" tends to affect performance throughout the cluster. Could this be caused by the quota pool, where the follower replicas on that bad node struggle to replicate log entries, thus slowing down the leaseholders on other nodes that are otherwise fine?

We should also get better visibility into whether the quota pool is delaying anything, via e.g. better metrics or logging.

Jira issue: CRDB-12901

Epic CRDB-39898

joshimhoff commented 2 years ago

> We should also get better visibility into whether the quota pool is delaying anything, via e.g. better metrics or logging.

The prototype for https://github.com/cockroachdb/cockroach/issues/71169 that @tbg wrote in https://github.com/cockroachdb/cockroach/pull/72092 provides a path forward. We could clearly add a quota-pool-specific metric without the linked PR, but I do think the generality of Tobias's approach is a strength!

erikgrinaker commented 2 years ago

Related to #77251.

tbg commented 1 year ago

This issue is obsolete if we disable/remove the quota pool; x-ref https://github.com/cockroachdb/cockroach/issues/106063