cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
29.97k stars 3.79k forks source link

admission: subject internal traffic to admission control #79212

Open erikgrinaker opened 2 years ago

erikgrinaker commented 2 years ago

Admission control is currently opt-in, in the sense that it's controlled by BatchRequest.AdmissionHeader and the zero value for Source (OTHER) is currently excluded from admission control. I believe all SQL traffic sets this header, but lots of internal traffic does not, which can lead to stores getting overloaded (in particular with bulk operations).

For example, the widely used DB.Txn() and kv.NewTxn() functions do not set admission control headers (nor do any of the other DB methods). In 1d4a1a3563d148e0008fb76ba6047210c573172e we added WithAdmissionControl() variants of these that allow the caller to specify the admission control source and priority, so one option here is to simply migrate all callers. However, there are hundreds of call sites so this will be spotty and annoying.

It would probably be better if we could set the source and priority for an entire component, such that all derived requests get a proper admission control header. One way to do this would be to set it in the context. We should likely also subject all requests to admission control by default, and instead allow callers to explicitly opt out, e.g. by marking themselves as system traffic.

Test tooling to assert correct admission control headers would also be useful.

Jira issue: CRDB-14639

sumeerbhola commented 2 years ago

It would probably be better if we could set the source and priority for an entire component, such that all derived requests get a proper admission control header. One way to do this would be to set it in the context. We should likely also subject all requests to admission control by default, and instead allow callers to explicitly opt out, e.g. by marking themselves as system traffic.

+1 to this. We do need to be very careful about distributed deadlock, as described in https://github.com/cockroachdb/cockroach/blob/588c03d2f1f5c8763762a5ebbe86c853536718e1/pkg/roachpb/api.proto#L2494-L2500