cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

goschedstats: reinvestigate values of runnable goroutines that indicate overload #132694

Open RaduBerinde opened 1 month ago

RaduBerinde commented 1 month ago

Back when we added this metric as a way of determining overload, in practice we would see values of 10 or even more in non-overloaded clusters. We only saw degradation once this metric exceeded 30-50 or so.

Recently there was an observation that a customer looking at this metric was seeing values of only 2 on overloaded clusters. I checked a DRT cluster that was running TPCC at 100% CPU usage (with fairly high query latencies), and this value was between 2 and 6, which I found very surprising.

It's possible that some change inside the Go scheduler altered when goroutines become runnable. We should investigate whether there is a difference here between recent releases. If we find a difference, we may need to update the values we use for admission control.

CC @sumeerbhola @aadityasondhi

Jira issue: CRDB-43227

aadityasondhi commented 1 month ago

We could run kv95 on a single node and observe the runnable counts, then compare them with older builds (close to the Go version bump boundaries). Additionally, there might have been changes in the Go runtime.