cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

kv: change meta ranges to honor fine grained data domiciliation zone configs over indexed values #70912

Status: Open. Opened by knz 3 years ago.

knz commented 3 years ago

Describe the problem

When zone configs are used to home region-sensitive data in their particular regions, the meta ranges do not obey those zone configs, so any region-sensitive data embedded in table keys "escapes" its region.

This makes strict data sovereignty partitioning impossible in multi-region CockroachDB when domiciled data is indexed. (The issue does not exist when domiciled data is not indexed.)
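A minimal sketch of why indexed values escape, assuming a simplified, readable key format (real CockroachDB keys are binary-encoded, and META2_PREFIX, table_index_key and meta2_key are hypothetical names, not CockroachDB APIs). It models only the fact that a range's meta2 record is addressed by the range's end key, so any indexed value inside a split key is copied into the meta2 keyspace:

```python
# Hypothetical, simplified model: real CockroachDB keys are binary-encoded,
# but readable strings make the copied values easy to see.
META2_PREFIX = "/Meta2"  # stand-in for the binary meta2 key prefix


def table_index_key(table_id: int, index_id: int, *values: str) -> str:
    """Build a pretty-printed index key such as /Table/104/2/"eu"/"x"."""
    return f"/Table/{table_id}/{index_id}" + "".join(f'/"{v}"' for v in values)


def meta2_key(range_end_key: str) -> str:
    """A range's meta2 record is addressed by the range's end key, so the
    end key, including any indexed values it embeds, is copied verbatim."""
    return META2_PREFIX + range_end_key


# Hypothetical split point inside the EU partition of an index on (region, email).
eu_split = table_index_key(104, 2, "eu", "mallory@example.eu")
print(meta2_key(eu_split))  # prints /Meta2/Table/104/2/"eu"/"mallory@example.eu"
```

Wherever the meta2 range holding that key is replicated, the email address travels with it, independently of the zone config applied to the table's EU partition.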

Note: we already document this limitation in https://www.cockroachlabs.com/docs/stable/data-domiciling.html#limitations

Epic: CRDB-10287

To Reproduce

  1. create a geo-partitioned table with sensitive data in some indexed columns
  2. use a zone config to map the region-specific data to separate regions
  3. run cockroach debug keys on all nodes

(A simpler version of steps 1-2 is to create a non-partitioned table, introduce split points manually, and simply "imagine" that a separate zone config has been applied to each table range. The conclusion below is the same.)

At step 3, we can see that indexed values from the table show up in Meta2 keys on nodes in regions unrelated to the one specified by the zone config.
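The observation in step 3 can be modeled as a simple leak check. This is a hypothetical simulation, not CockroachDB code: DataRange, Meta2Range, meta2_key and find_leaks are illustrative names, keys are readable stand-ins for the binary encoding, the region names are examples, and it assumes a single meta2 range replicated across every region of the cluster:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRange:
    end_key: str       # readable end key; may embed indexed values
    home_region: str   # region required by the table's zone config


@dataclass(frozen=True)
class Meta2Range:
    start_key: str              # covers meta2 keys in [start_key, end_key)
    end_key: str
    replica_regions: frozenset  # regions holding a replica of this range


def meta2_key(end_key: str) -> str:
    # A range's meta2 record is addressed by the range's end key.
    return "/Meta2" + end_key


def find_leaks(data_ranges, meta2_ranges):
    """Return (meta2 key, region) pairs where a meta2 record embedding a
    range's end key is replicated outside that range's home region."""
    leaks = []
    for r in data_ranges:
        key = meta2_key(r.end_key)
        for m in meta2_ranges:
            if m.start_key <= key < m.end_key:
                for region in sorted(m.replica_regions - {r.home_region}):
                    leaks.append((key, region))
    return leaks


data_ranges = [
    DataRange('/Table/104/2/"eu"/"mallory@example.eu"', "europe-west1"),
    DataRange('/Table/104/2/"us"/"bob@example.com"', "us-east1"),
]
# One meta2 range covering the whole meta2 keyspace, replicated everywhere.
meta2_ranges = [
    Meta2Range("/Meta2", "/Meta3",
               frozenset({"europe-west1", "us-east1", "asia-northeast1"})),
]
for key, region in find_leaks(data_ranges, meta2_ranges):
    print(f"{key} replicated in {region}")
```

Each of the two split keys is reported twice, once for each region other than its home region.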

Expected behavior

Meta ranges whose keys include data from zoned tables (via the range boundary keys) should not be stored outside the regions specified by the zone config.

Today, this is impossible because we do not split the meta ranges at the same boundaries as the tables.
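A hypothetical sketch of what splitting meta2 at the table boundaries could look like, under the same simplified key model. aligned_meta2_ranges and is_domiciled are illustrative names, each meta2 range is reduced to a single placement region, and this is not a proposed CockroachDB implementation. It groups consecutive ranges by home region, splits meta2 just after each group's last record, and homes each meta2 range with its group:

```python
from dataclasses import dataclass
from itertools import groupby

META2_START, META2_END = "/Meta2", "/Meta3"  # hypothetical bounds of meta2


@dataclass(frozen=True)
class DataRange:
    end_key: str
    home_region: str


@dataclass(frozen=True)
class Meta2Range:
    start_key: str   # covers meta2 keys in [start_key, end_key)
    end_key: str
    region: str      # simplification: one placement region per meta2 range


def meta2_key(end_key: str) -> str:
    return "/Meta2" + end_key


def aligned_meta2_ranges(data_ranges):
    """Split meta2 so each meta2 range only holds records for data ranges
    homed in one region, and place that meta2 range in the same region."""
    ordered = sorted(data_ranges, key=lambda r: r.end_key)
    groups = [(region, list(rs))
              for region, rs in groupby(ordered, key=lambda r: r.home_region)]
    out, start = [], META2_START
    for i, (region, rs) in enumerate(groups):
        if i == len(groups) - 1:
            end = META2_END
        else:
            # Appending "\x00" gives the immediate successor key, so the
            # group's last meta2 record stays inside this meta2 range.
            end = meta2_key(rs[-1].end_key) + "\x00"
        out.append(Meta2Range(start, end, region))
        start = end
    return out


def is_domiciled(data_ranges, meta2_ranges):
    """True if every meta2 record lives in its data range's home region."""
    for r in data_ranges:
        key = meta2_key(r.end_key)
        owner = next((m for m in meta2_ranges
                      if m.start_key <= key < m.end_key), None)
        if owner is None or owner.region != r.home_region:
            return False
    return True


ranges = [
    DataRange('/Table/104/2/"eu"/"mallory@example.eu"', "europe-west1"),
    DataRange('/Table/104/2/"us"', "europe-west1"),
    DataRange('/Table/104/2/"us"/"bob@example.com"', "us-east1"),
    DataRange('/Table/104/3', "us-east1"),
]
meta2_ranges = aligned_meta2_ranges(ranges)
for m in meta2_ranges:
    print(repr(m.start_key), repr(m.end_key), m.region)
print(is_domiciled(ranges, meta2_ranges))
print(is_domiciled(ranges, [Meta2Range(META2_START, META2_END, "us-east1")]))
```

With the aligned splits, is_domiciled returns True; with a single unsplit meta2 range homed in one region, it returns False. The sketch does not model meta1, whose records are addressed by the meta2 ranges' own boundary keys, so those boundaries (which here embed indexed values) would likely need similar treatment.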

Environment:

crdb v21.2

Jira issue: CRDB-10283

knz commented 3 years ago

@mwang1026 @awoods187 you'll want to follow up on this in the GDPR roadmap.

knz commented 3 years ago

I think there are two ways we can achieve this:

irfansharif commented 3 years ago

This has little overlap with #66348, which is more about improving our existing infrastructure for zone configs (how they're stored, disseminated, and applied) to be compatible with having secondary tenants. Certainly we'll want to think about how/where we store domiciled keys (using order-preserving hashes for meta2 might be another option).

I see we've filed issues for a few places where we're storing domicile-able keys (https://github.com/cockroachdb/cockroach/labels/A-gdpr-compliance). Absent an accompanying RFC (and/or a thorough audit), it might make more sense to fold everything into a single issue instead. Likely whatever we do for one (say, system.jobs) would apply to everything else (e.g., system.zones); the disparate issues are harder to read or contextualize.

blathers-crl[bot] commented 1 year ago

cc @cockroachdb/cdc