cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

tenantrate: use measured on-cpu time for rate limiting #77041

Open irfansharif opened 2 years ago

irfansharif commented 2 years ago

Is your feature request related to a problem? Please describe.

We use a per-node rate limiter to control tenant CPU activity so that tenant-observed performance stays predictable. It tries to hold a tenant's CPU usage to around 20% of a KV node. To do so, it models CPU activity using a few variables: the number of operations (reads or writes) and the size of each operation. The linear model's constants were derived from experimental data (internal link) that aimed to predict the number of cores needed for stock workloads by recording those variables for each run at steady state.

Some of this approach was predicated on the lack of measurable on-CPU time from the Go scheduler (see: https://github.com/golang/go/issues/41554). We opted then to fall back on a coarse model of CPU usage. We've found, however, that this model can have a high error margin. In a recent escalation (internal link; discussion here) we found that it was possible for a single tenant to consume more than its fair share of KV CPU. Reproducing it with a secondary tenant that had tenant-side cost controls disabled and was running large table scans, we were able to get the single tenant to sustain a high CPU load in KV (burstiness was an artifact of manually running the table scans).

image

Describe the solution you'd like

When admitting a request we probably do want some model of predicted usage. After processing it, however, if we can measure CPU time precisely, we should be able to put our tenant-scoped rate limiter into debt against future requests. We already have control flows for this when recording the total bytes read post hoc. There are various ideas for precise measurement of on-CPU time:

I think we should move towards precisely attributed on-CPU time and flesh out libraries to make this pattern easier. https://github.com/cockroachdb/cockroach/issues/58164 is relevant. After looking at the runtime changes needed (<30 lines), I think the upside of doing it far outweighs giving up on precise measurements. We'll still try to upstream the change, but even if it doesn't land or takes a long time, given we're using Bazel to build CRDB, it's trivial to point to a mirrored runtime with our changes. In fact, we already do. This would mean that engineers wouldn't have to maintain their own Go distribution manually on their machines; with the right set of build tags we can make this "just work".

Describe alternatives you've considered

See above.

Other benefits of measured on-CPU time

I want this issue to focus on just using measured CPU time for rate limiting. It has implications for KV stability and tail performance in a multi-tenant system. That said, there are other benefits to moving towards measured on-cpu time instead of modelling or using proxies:

Jira issue: CRDB-13382

joshimhoff commented 2 years ago

It's pretty cool how many different benefits there are to precise measurement of on-CPU time.

irfansharif commented 2 years ago

The proposal to start using measured CPU time was accepted: https://github.com/cockroachdb/cockroach/pull/82356