cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
30.1k stars 3.81k forks source link

sql: remove legacy column family behavior for system tables #17901

Open petermattis opened 7 years ago

petermattis commented 7 years ago

Description

For very old legacy reasons, all columns in a system table are assigned independent column families. Tables in cockroach long before even the decision to adopt SQL used this column family allocation model. The decision to change it goes back to 2016 (https://github.com/cockroachdb/cockroach/pull/7623)! To this day, all system tables stealthily construct these extra column families despite their SQL serialization not showing that.

The code to do that is https://github.com/cockroachdb/cockroach/blob/43baeedebf15154f041f159c16f7a02e972e218e/pkg/sql/catalog/tabledesc/structured.go#L1336-L1340

This is made much worse by the fact that some of these system tables do not get written to using SQL. Instead they are written to using the kv interface and only a single column family is actually populated. Specifically this is at least system.descriptors, system.namespace (both versions). Maddeningly, the other place where we avoid using SQL to write to these tables is when bootstapping zone configs. Subsequent writes do use SQL. There's a chance that legacy code did not use SQL.

The fact that we have this column family encoding makes it hard to use system tables with rangefeeds as all of them utilize column families under the hood (https://github.com/cockroachdb/cockroach/issues/56160).

This has been the source of pain (https://github.com/cockroachdb/cockroach/issues/50948) and bugs (https://github.com/cockroachdb/cockroach/issues/49846, https://github.com/cockroachdb/cockroach/issues/58614).

Proposed solution

Part 1 - Stop lying in system.namespace and system.descriptor

We can just straight-up change the table descriptor for system.namespace and system.descriptor to not include the bogus, unused column families. There will be some work to then make the display string for these tables correct.

Part 2 - Make everything normal in system.zones

A touch write using SQL for every entry of system.zones as a long-running migration would make it true to its descriptor.

Part 3 - Fix the display formatting for system tables to show their true layout

We currently hide the fact that under the hood these tables have column families. We should stop doing that so we stop forgetting. After we fix this and the above things, then system tables won't be special, they'll just have stupid layouts.

Part 4 (optional) - Schema change away some of these silly column families

If these useless column families are creating sadness, we could get rid of them. My guess is that we never will.

Original issue from 2017

Noticed via drive-by. Whenever we perform a split we write 6 keys to system.rangelog. For example:

[/Table/13/1/2017-08-24T18:30:42.937472Z/273707967931285505/0)
[/Table/13/1/2017-08-24T18:30:42.937472Z/273707967931285505/2/1)
[/Table/13/1/2017-08-24T18:30:42.937472Z/273707967931285505/3/1)
[/Table/13/1/2017-08-24T18:30:42.937472Z/273707967931285505/4/1)
[/Table/13/1/2017-08-24T18:30:42.937472Z/273707967931285505/5/1)
[/Table/13/1/2017-08-24T18:30:42.937472Z/273707967931285505/6/1)

The schema for system.rangelog is:

CREATE TABLE system.rangelog (
  timestamp      TIMESTAMP  NOT NULL,
  "rangeID"      INT        NOT NULL,
  "storeID"      INT        NOT NULL,
  "eventType"    STRING     NOT NULL,
  "otherRangeID" INT,
  info           STRING,
  "uniqueID"     INT        DEFAULT unique_rowid(),
  PRIMARY KEY (timestamp, "uniqueID")
)

There should be a single column family, but for some reason, perhaps due to this being a system table, there isn't. Something similar is going on with system.eventlog. Not sure how easy this is to fix, but perhaps we can create the table descriptors correctly for clusters going forward and have existing clusters use the old descriptors.

This is low priority.

Jira issue: CRDB-6023

ajwerner commented 3 years ago

I reworked this recently in the context of #58614. It's some real tech debt that needed to be documented somewhere.