Open erikgrinaker opened 2 years ago
The prototype of https://github.com/cockroachdb/cockroach/issues/71169 that @tbg wrote here provides a path forward: https://github.com/cockroachdb/cockroach/pull/72092. Clearly, we could add a quotapool specific metric without the linked PR, but I do think the generality of Tobias's approach is a strength!
Related to #77251.
This issue is obsolete if we disable/remove the quota pool, x-ref https://github.com/cockroachdb/cockroach/issues/106063
We often see that a "bad node" tends to affect performance throughout the cluster. Could this be caused by the quota pool, where the follower replicas on that bad node struggle to replicate log entries, thus slowing down the leaseholders on other nodes that are otherwise fine?
We should also get better visibility into whether the quota pool is delaying anything, via e.g. better metrics or logging.
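To make the hypothesis concrete, here is a toy, deterministic model of the effect, not CockroachDB's actual implementation. It assumes a leader that must acquire one unit of quota per proposal, with quota returned only once every follower has acknowledged the entry. `QuotaPool`, `simulate`, and the `blocked_ticks` counter are hypothetical names introduced for illustration; `blocked_ticks` stands in for the kind of metric that would show whether the pool is delaying proposals.

```python
from dataclasses import dataclass, field


@dataclass
class QuotaPool:
    """Toy proposal quota pool (illustrative only, not the real quotapool API)."""
    capacity: int
    available: int = field(init=False)
    blocked_ticks: int = 0  # hypothetical metric: ticks a proposal waited for quota

    def __post_init__(self):
        self.available = self.capacity

    def try_acquire(self, size):
        # Record a blocked tick whenever the leader cannot get quota.
        if size > self.available:
            self.blocked_ticks += 1
            return False
        self.available -= size
        return True

    def release(self, size):
        self.available = min(self.capacity, self.available + size)


def simulate(ticks, capacity, follower_rates):
    """Leader tries to propose one entry per tick; each follower acknowledges
    `rate` entries per tick. Returns (entries proposed, blocked ticks)."""
    pool = QuotaPool(capacity)
    proposed = 0
    acked = [0.0] * len(follower_rates)
    released = 0
    for _ in range(ticks):
        if pool.try_acquire(1):
            proposed += 1
        for i, rate in enumerate(follower_rates):
            acked[i] = min(proposed, acked[i] + rate)
        # Quota comes back only for entries acknowledged by the slowest follower.
        fully_acked = int(min(acked))
        pool.release(fully_acked - released)
        released = fully_acked
    return proposed, pool.blocked_ticks


healthy = simulate(100, capacity=4, follower_rates=[1.0, 1.0])
one_slow = simulate(100, capacity=4, follower_rates=[1.0, 0.25])
print("healthy followers:", healthy)
print("one slow follower:", one_slow)
```

In this model a single slow follower drags the leader's proposal rate down to that follower's acknowledgement rate once the pool is exhausted, and the blocked-tick counter is the only place that cause becomes visible.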
Jira issue: CRDB-12901
Epic CRDB-39898