cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

allocator: Enhance allocator to evenly distribute ranges for a given tenant #77869

Open andy-kimball opened 2 years ago

andy-kimball commented 2 years ago

Today, the allocator is not aware of multi-tenancy when distributing ranges across KV nodes. This means that ranges for a given tenant can "bunch up" on a single node, or a small number of nodes. That, in turn, can lead to performance bottlenecks, since we try to limit the maximum utilization of a single tenant to 20% of a KV node. An individual tenant may have hit the 20% limit on a node, yet the allocator takes no action, because the node as a whole is under-utilized.
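To make the mismatch concrete, here is a minimal sketch of the per-node view. It assumes utilization is tracked as a fraction of the node's capacity and broken down by tenant; the `nodeLoad` type and `tenantAtLimit` helper are hypothetical illustrations, not CockroachDB code. A node at 25% total utilization looks idle to a tenant-unaware allocator, while one tenant on it already sits at the 20% cap.

```go
package main

import "fmt"

// tenantLimitFraction mirrors the 20% per-tenant cap described above.
// How the real limiter measures utilization is not modeled here.
const tenantLimitFraction = 0.20

// nodeLoad is a hypothetical snapshot of one KV node's utilization,
// expressed as fractions of the node's total capacity, per tenant.
type nodeLoad struct {
	total    float64
	byTenant map[string]float64
}

// tenantAtLimit reports whether a tenant has reached its per-node cap.
func tenantAtLimit(n nodeLoad, tenant string) bool {
	return n.byTenant[tenant] >= tenantLimitFraction
}

func main() {
	// The node is mostly idle overall, but tenant t1 is at its cap here.
	n := nodeLoad{total: 0.25, byTenant: map[string]float64{"t1": 0.20, "t2": 0.05}}
	fmt.Printf("node utilization: %.0f%%\n", n.total*100)
	fmt.Println("t1 at limit:", tenantAtLimit(n, "t1"))
	fmt.Println("t2 at limit:", tenantAtLimit(n, "t2"))
}
```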

Ideally, the allocator would try to distribute each tenant's ranges evenly across KV nodes, just as it tries to evenly distribute ranges across available zones and regions.
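One way to picture "evenly distribute each tenant's ranges" is a per-tenant range-count balancer. The sketch below is a hypothetical illustration, not the allocator's actual algorithm: it repeatedly proposes moving one of the tenant's ranges from the node holding the most of them to the node holding the fewest, until the spread is within a `slack` parameter (an assumed knob).

```go
package main

import (
	"fmt"
	"sort"
)

// rangeCounts maps a hypothetical node ID to the number of ranges
// a single tenant has on that node.
type rangeCounts map[int]int

// proposeMove suggests moving one range from the node with the most ranges
// to the node with the fewest, but only if the gap exceeds slack.
// slack should be at least 1; with 0, a one-range gap would bounce forever.
func proposeMove(counts rangeCounts, slack int) (from, to int, ok bool) {
	if len(counts) < 2 {
		return 0, 0, false
	}
	ids := make([]int, 0, len(counts))
	for id := range counts {
		ids = append(ids, id)
	}
	sort.Ints(ids) // deterministic tie-breaking by node ID
	from, to = ids[0], ids[0]
	for _, id := range ids[1:] {
		if counts[id] > counts[from] {
			from = id
		}
		if counts[id] < counts[to] {
			to = id
		}
	}
	if counts[from]-counts[to] <= slack {
		return 0, 0, false
	}
	return from, to, true
}

func main() {
	// One tenant's ranges have bunched up on node 1.
	counts := rangeCounts{1: 9, 2: 1, 3: 2}
	for {
		from, to, ok := proposeMove(counts, 1)
		if !ok {
			break
		}
		counts[from]--
		counts[to]++
		fmt.Printf("move a range n%d -> n%d\n", from, to)
	}
	fmt.Println(counts) // each node ends up with 4 ranges
}
```

Balancing by count alone is the simplest possible policy; the next comment in the thread explains why it is not sufficient on its own.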

Jira issue: CRDB-13805

andy-kimball commented 2 years ago

CC @lunevalex

andy-kimball commented 2 years ago

There may be other solutions we should consider to this problem. This issue is intended to be a place where we can discuss further.

sumeerbhola commented 1 year ago

> Ideally, the allocator would try to distribute each tenant's ranges evenly across KV nodes, just as it tries to evenly distribute ranges across available zones and regions.

I think the above is insufficient. Even if we spread a tenant's ranges evenly across the nodes, the ranges that see a load spike can still be concentrated on a few nodes. If the tenant_rate_limiter starts throttling on those nodes and the allocator does nothing, that is a problem.
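A small illustration of why an even range count does not imply even load. The `rangeInfo` record and the load numbers are made up for the example (the units could be QPS or CPU; they are illustrative only): each node holds two of the tenant's ranges, yet both spiking ranges sit on node 1.

```go
package main

import "fmt"

// rangeInfo is a hypothetical record of one of a tenant's ranges:
// the node holding it and its recent load (illustrative units).
type rangeInfo struct {
	node int
	load float64
}

// perNode aggregates range count and total load per node for one tenant.
func perNode(ranges []rangeInfo) (counts map[int]int, load map[int]float64) {
	counts = make(map[int]int)
	load = make(map[int]float64)
	for _, r := range ranges {
		counts[r.node]++
		load[r.node] += r.load
	}
	return counts, load
}

func main() {
	// Two ranges per node, so perfectly even by count,
	// but both spiking ranges landed on node 1.
	ranges := []rangeInfo{
		{1, 500}, {1, 450},
		{2, 10}, {2, 20},
		{3, 15}, {3, 5},
	}
	counts, load := perNode(ranges)
	fmt.Println("ranges per node:", counts)
	fmt.Println("load per node:  ", load)
}
```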

I suppose this will usually work out, since the allocator tries to achieve even CPU usage (though other resources, like store write bandwidth, are not considered). But we could get unlucky: a node may have been recently commissioned, so it was at 10%, and the surging tenant has now pushed it to 30%. The tenant is being throttled, but since the mean across the nodes is 50%, the allocator will not shed load from this node. I think we have 2 options:

IMO, the second option is preferable.
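The unlucky scenario above can be reproduced with a toy mean-based rule. `shouldShed` and its `threshold` parameter are a hypothetical simplification, not CockroachDB's actual rebalancing logic; the example also assumes the surging tenant accounts for the whole 20-point increase on the new node, which is what puts it at the 20% cap.

```go
package main

import "fmt"

// shouldShed is a simplified, hypothetical mean-based rule: a node sheds
// load only if its utilization exceeds the cluster mean by more than
// threshold (a fraction of the mean). Not the real allocator logic.
func shouldShed(util []float64, i int, threshold float64) bool {
	var sum float64
	for _, u := range util {
		sum += u
	}
	mean := sum / float64(len(util))
	return util[i] > mean*(1+threshold)
}

func main() {
	// Node 0 was recently commissioned at 10%; a surging tenant pushed it
	// to 30%. Assumption: that tenant alone accounts for the 20 points.
	util := []float64{30, 55, 60, 55} // cluster mean = 50
	tenantShareOnNode0 := 30.0 - 10.0
	fmt.Println("tenant throttled on node 0:", tenantShareOnNode0 >= 20)
	fmt.Println("allocator sheds from node 0:", shouldShed(util, 0, 0.1))
}
```

Because node 0 is well below the mean, a rule of this shape never picks it as a source, even though the tenant on it is being throttled.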