Open knz opened 3 years ago
@mwang1026 @awoods187 you'll want to follow up on this in the GDPR roadmap.
I think there are two ways we can achieve this:
This has little overlap with #66348, which is more about improving our existing infrastructure for zone configs (from how they're stored, disseminated, and applied) to be compatible with having secondary tenants. Certainly we'll want to think about how/where we store domiciled keys (using order-preserving hashes for meta2 might be another option).
I see we've filed issues for a few places where we're storing domicile-able keys (https://github.com/cockroachdb/cockroach/labels/A-gdpr-compliance). Absent an accompanying RFC (and/or a thorough audit), it might make more sense to fold everything into a single issue instead. Likely whatever we do for one (say, system.jobs) would apply to everything else (system.zones); the disparate issues are harder to read or contextualize.
cc @cockroachdb/cdc
Describe the problem
When using zone configs to home region-sensitive data in its particular region, the meta ranges do not obey the zone configs, and any region-sensitive data in table keys "escapes" its region.
This makes it impossible to do strict data sovereignty partitioning using multi-region CockroachDB when domiciled data is indexed. (The issue does not exist when domiciled data is not indexed.)
Note: we already document this limitation in https://www.cockroachlabs.com/docs/stable/data-domiciling.html#limitations
Epic: CRDB-10287
To Reproduce
cockroach debug keys on all nodes
(A simpler version of steps 1-2 is to create a non-partitioned table, introduce split points manually, and simply "imagine" that we have applied a separate zone config to each table range. The point below remains the same.)
At step 3, we can see that the indexed values from the table show up in Meta2 keys in nodes that are unrelated to the region specified by the zone config.
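To make the leak concrete, here is a minimal Go sketch of how a range's end key, which embeds indexed column values, becomes part of a meta2 addressing key. It is a simplified model, not CockroachDB code: `rangeDesc`, `meta2Key`, `leakedKeys`, the region names, and the pretty-printed table keys are all hypothetical, and the one-byte `meta2Prefix` value is an assumption made for illustration.

```go
package main

import "fmt"

// meta2Prefix stands in for the prefix of meta2 addressing keys.
// The exact byte value is an assumption of this sketch.
const meta2Prefix = "\x03"

// rangeDesc is a hypothetical, simplified range descriptor.
type rangeDesc struct {
	endKey string // end key of the table range; embeds indexed column values
	region string // region the table's zone config pins this range to
}

// meta2Key returns the addressing key for a range in this model:
// the meta2 prefix followed by the range's end key.
func meta2Key(d rangeDesc) string {
	return meta2Prefix + d.endKey
}

// leakedKeys returns the meta2 keys of all ranges whose zone-configured
// region differs from the region holding the meta range replica.
func leakedKeys(descs []rangeDesc, metaRegion string) []string {
	var out []string
	for _, d := range descs {
		if d.region != metaRegion {
			out = append(out, meta2Key(d))
		}
	}
	return out
}

func main() {
	descs := []rangeDesc{
		{endKey: `/Table/53/2/"alice@example.eu"`, region: "eu-west"},
		{endKey: `/Table/53/2/"bob@example.com"`, region: "us-east"},
	}
	// The meta range is assumed to have a replica in us-east, so the
	// EU-homed index value appears in a key stored outside the EU.
	for _, k := range leakedKeys(descs, "us-east") {
		fmt.Printf("%q\n", k)
	}
}
```

In this model the table data itself stays in its region; only the copy of the boundary key inside the meta2 entry leaves it, which matches the symptom observed with `cockroach debug keys`.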
Expected behavior
The meta ranges that include data from zoned tables (in the range key boundaries) should not be stored outside of the zone-specified regions.
Today, this is impossible because we do not split the meta ranges at the same boundaries as the tables.
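One way to picture the missing alignment: the sketch below lists the zone boundaries at which the meta ranges would need a split before a zone config could be applied to each resulting meta range. It is only an illustration under stated assumptions; `missingMetaSplits` and the example keys are hypothetical, and it does not describe how CockroachDB would actually implement such splits.

```go
package main

import (
	"fmt"
	"sort"
)

// missingMetaSplits returns the zone boundaries (given as meta2 keys) for
// which no meta range split point exists. Both inputs are hypothetical
// key lists used for illustration.
func missingMetaSplits(zoneBoundaries, metaSplits []string) []string {
	have := make(map[string]bool, len(metaSplits))
	for _, k := range metaSplits {
		have[k] = true
	}
	var missing []string
	for _, k := range zoneBoundaries {
		if !have[k] {
			missing = append(missing, k)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Boundaries between differently zoned table ranges, as meta2 keys.
	zoneBoundaries := []string{
		"\x03" + `/Table/53/1/"eu"`,
		"\x03" + `/Table/53/1/"us"`,
	}
	// Modeling the current situation: meta is not split along table boundaries.
	metaSplits := []string{}
	fmt.Printf("%q\n", missingMetaSplits(zoneBoundaries, metaSplits))
}
```

With an empty `metaSplits`, every zone boundary is reported as missing, which is the situation the issue describes.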
Environment:
crdb v21.2
Jira issue: CRDB-10283