This is something we'll try tackling (or at least prototyping) in November. A brief plug for why it's important:
After #67679, enabling the span configs infrastructure by default (something we're also looking to do in November) would mean that for every single schema object in a multi-tenant cluster, we would be inducing a range split. This is exactly what happens with the host tenant today, where every table/index/partition gets its own range. Before #67679, each tenant started off with just a mega-range containing the entire tenant ID prefix.
Range boundaries on the host tenant, with a single secondary tenant, pre-#67679. The last range is the one of interest:
```
> SELECT range_id, start_pretty, end_pretty from crdb_internal.ranges;

  range_id |         start_pretty          |           end_pretty
-----------+-------------------------------+--------------------------------
         1 | /Min                          | /System/NodeLiveness
         2 | /System/NodeLiveness          | /System/NodeLivenessMax
         3 | /System/NodeLivenessMax       | /System/tsd
         4 | /System/tsd                   | /System/"tse"
         5 | /System/"tse"                 | /Table/SystemConfigSpan/Start
         6 | /Table/SystemConfigSpan/Start | /Table/11
         7 | /Table/11                     | /Table/12
         8 | /Table/12                     | /Table/13
       ...
        42 | /Table/46                     | /Table/47
        43 | /Table/47                     | /Tenant/123
        44 | /Tenant/123                   | /Max
```
After #67679 + enabling the infra:
```
> SELECT range_id, start_pretty, end_pretty from crdb_internal.ranges;

  range_id |     start_pretty      |      end_pretty
-----------+-----------------------+----------------------
       ...
        42 | /Table/46             | /Table/47
        43 | /Table/47             | /Tenant/123
        44 | /Tenant/123           | /Tenant/123/Table/3
        45 | /Tenant/123/Table/3   | /Tenant/123/Table/4
        46 | /Tenant/123/Table/4   | /Tenant/123/Table/5
        47 | /Tenant/123/Table/5   | /Tenant/123/Table/6
       ...
        83 | /Tenant/123/Table/45  | /Tenant/123/Table/46
        84 | /Tenant/123/Table/46  | /Tenant/123/Table/47
        85 | /Tenant/123/Table/47  | /Max
```
Having O(descriptors across all tenants) ranges in the system is going to be difficult for us to work with, and can readily be used as an attack vector. We'd either need to impose a hard limit on the number of descriptor objects a tenant is allowed to have, or do what this issue suggests: introduce a limit based on the number of individually configurable spans. This is still a pretty rudimentary safeguard, but fear not, we can do much better on the KV side to move away from always splitting on descriptor boundaries: #72389.
(+cc @shralex, @nvanbenschoten, @ajwerner, @arulajmani)
In the short term (22.1), all we'll do is maintain a counter for secondary tenants and, for every schema change, use a translator-like component to determine how many splits are being added/removed. We don't need to look at the corresponding zone config for the descriptor, as that has no bearing here. We'll increment/decrement our counter accordingly, comparing it against a per-tenant limit exposed by the host -- perhaps using a "tenant read-only cluster setting", to use the verbiage from the multi-tenant cluster settings RFC. If we're over the limit, we'll abort the schema change txn. KV would of course still maintain its internal counter for the number of splits/etc. per tenant, and use that to reject span config updates if they go overboard. For unmodified tenants, we don't expect to run into these errors.
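A minimal sketch of what that check could look like, with hypothetical names throughout (`Splitter`, `Limiter`, the in-memory counter standing in for a transactionally maintained one -- the real interfaces are TBD):

```go
// Package spanconfiglimiter sketches the 22.1 scheme described above.
// All names here are hypothetical.
package spanconfiglimiter

import (
	"context"
	"fmt"
)

// Splitter is the translator-like component: given a descriptor, it reports
// how many splits that descriptor implies. Note that the descriptor's zone
// config has no bearing on this count.
type Splitter interface {
	Splits(ctx context.Context, descID int64) (int, error)
}

// Limiter maintains the per-tenant split counter. In the real scheme the
// counter would live in the tenant's own keyspace, read and written as part
// of the schema change txn; a plain int64 stands in for it here.
type Limiter struct {
	limit   int64 // per-tenant limit exposed by the host, e.g. a "tenant read-only cluster setting"
	counter int64
}

// ShouldLimit is invoked during the schema change txn with the net number of
// splits being added (positive) or removed (negative). If applying the delta
// would exceed the limit, the schema change txn is aborted.
func (l *Limiter) ShouldLimit(ctx context.Context, delta int64) error {
	if l.counter+delta > l.limit {
		return fmt.Errorf("split count %d + %d exceeds per-tenant limit %d",
			l.counter, delta, l.limit)
	}
	l.counter += delta
	return nil
}
```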
This scheme, in contrast to the one described above, has the downside that it serializes all schema changes. That feels reasonable for the short term, especially considering it only applies to secondary tenants. Incorporating leasing in any form would be purely a performance optimization -- a way to get real schema change concurrency.
How would this work for internal APIs like `AdminSplit`? Would they get caught by this mechanism and tracked accordingly? Or would they enter KV below the "schema change"-level tracking? I'm asking because we may be able to enable `AdminSplit` directly if it gets tracked by this scheme.
> we may be able to enable `AdminSplit` directly if it gets tracked by this scheme.
I don't think `AdminSplit` can directly leverage the mechanisms described above, at least not in this current form, and not without additional legwork. AdminSplits are issued directly to KV as non-transactional requests (follow code here). The SQL-side counter we're maintaining is updated transactionally, in the tenant's own keyspace. Crucially, it's maintained without ever having to ask KV about its own trackers. To track an AdminSplit request (these typically have expiration timestamps) we'd have to maintain this SQL-side counter in tandem with issuing the request to KV, where it may or may not be rejected (some form of 2PC, maybe?). In addition, once the AdminSplit request has expired, the SQL-side counter would probably want to reflect the change -- it would perhaps need to reach out to KV to reconcile the current state of those requests. This is all doable with some effort, and there might be simpler approaches, but I'm not sure it's quite as "drop-in" as everything else this issue is focused on.
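To make the mismatch concrete, here's a hedged sketch of what that tandem tracking might look like; `splitCounter`, `issueAdminSplit`, and `trackedAdminSplit` are all hypothetical stand-ins, and the sketch deliberately shows where drift creeps in:

```go
// Package adminsplittrack sketches why AdminSplit doesn't drop into the
// transactional counter scheme. Everything here is a hypothetical stand-in.
package adminsplittrack

import (
	"context"
	"fmt"
	"time"
)

// splitCounter stands in for the SQL-side counter; unlike the schema change
// path, it can't be updated in the same txn as the KV request below.
type splitCounter struct {
	limit, count int64
}

func (c *splitCounter) adjust(delta int64) error {
	if c.count+delta > c.limit {
		return fmt.Errorf("per-tenant split limit (%d) exceeded", c.limit)
	}
	c.count += delta
	return nil
}

// issueAdminSplit stands in for the non-transactional KV request; KV may
// independently reject it using its own per-tenant trackers.
type issueAdminSplit func(ctx context.Context, key []byte, expiration time.Time) error

// trackedAdminSplit is the "some form of 2PC, maybe?" dance: bump the
// counter, then issue the request, undoing the bump if KV rejects it. A
// crash between the two steps (or a split expiring later) leaves the counter
// stale until some reconciliation loop (not shown) compares it against KV's
// view of unexpired splits.
func trackedAdminSplit(ctx context.Context, c *splitCounter, split issueAdminSplit,
	key []byte, exp time.Time) error {
	if err := c.adjust(+1); err != nil {
		return err
	}
	if err := split(ctx, key, exp); err != nil {
		_ = c.adjust(-1) // best-effort undo; another spot where drift can creep in
		return err
	}
	return nil
}
```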
I know we recently filed #74389; should we brainstorm there? It's not yet clear to me that we even want to carry forward AdminSplit/Scatter as an API, let alone extend it to secondary tenants. There was some KV-internal discussion here on the subject.
#66348 outlined a scheme to support zone configs for secondary tenants. Since the unit of what we can configure in KV is a Range, tenants now being able to set zone configs grants them the ability to induce an arbitrary number of splits in KV -- something we want to put guardrails against (see the RFC for more details).
There's some discussion elsewhere about how we could achieve this; copying over one proposal in particular:
The RFC also captures a possible schema we could use on the host tenant to store these limits:
The actual RPCs/interfaces are TBD. We probably also want the ability to configure these limits on a per-tenant basis.
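Purely as an illustration of what such a per-tenant limits surface might look like (none of these names exist, and the RFC's actual schema isn't reproduced here), a sketch in Go:

```go
// Package tenantlimits sketches one hypothetical shape for per-tenant limit
// configuration on the host. The actual RPCs/interfaces are TBD, per the
// issue text.
package tenantlimits

import "context"

// Limits captures the guardrails the host could impose on a tenant. More
// fields (e.g. a limit on total descriptor count) could slot in alongside.
type Limits struct {
	MaxSpanConfigs int64 // max individually configurable spans, i.e. max induced splits
}

// Registry is a hypothetical host-side interface: a cluster-wide default
// with per-tenant overrides, in the spirit of "tenant read-only cluster
// settings".
type Registry interface {
	// GetLimits returns the limits in effect for the given tenant, falling
	// back to the cluster-wide default if no override is set.
	GetLimits(ctx context.Context, tenantID uint64) (Limits, error)
	// SetLimits installs a per-tenant override.
	SetLimits(ctx context.Context, tenantID uint64, l Limits) error
}
```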
Epic CRDB-10563
Jira issue: CRDB-10116