cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

spanconfigccl: full translation is O(Databases * Descriptors) #90655

Open ajwerner opened 2 years ago

ajwerner commented 2 years ago

Describe the problem

When we translate a database, we fetch all the descriptors. We have to do this because we have no more efficient way to find the descriptors that belong to the database: fundamentally, we need to discover dropped descriptors, and dropped descriptors do not have namespace entries. Since translating each database scans every descriptor, a full translation costs O(Databases * Descriptors).
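
A minimal sketch of why the cost is multiplicative, assuming a simplified, hypothetical `Descriptor` type (not CockroachDB's actual descriptor or catalog API). Translating each database scans every descriptor, so the number of descriptor visits is the number of databases times the number of descriptors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    """Hypothetical, simplified stand-in for a catalog descriptor."""
    id: int
    parent_id: int  # owning database ID; 0 for database descriptors
    dropped: bool = False  # dropped descriptors have no namespace entry


def scan_for_database(db_id, all_descriptors):
    """Find every descriptor in db_id, including dropped ones.

    There is no namespace entry to look up for a dropped descriptor, so the
    only option is to visit (and, in reality, decode) every descriptor.
    """
    members, visits = [], 0
    for desc in all_descriptors:
        visits += 1
        if desc.parent_id == db_id:
            members.append(desc.id)
    return members, visits


def full_translate_naive(db_ids, all_descriptors):
    """Translate each database in turn; visits = len(db_ids) * len(all_descriptors)."""
    members_by_db, total_visits = {}, 0
    for db_id in db_ids:
        members, visits = scan_for_database(db_id, all_descriptors)
        members_by_db[db_id] = members
        total_visits += visits
    return members_by_db, total_visits
```

With 3 databases and 13 descriptors in total, this sketch visits 39 descriptors; doubling both the databases and the descriptors quadruples the work.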

Additional context

Relates to https://github.com/cockroachdb/cockroach/issues/26476 and https://github.com/cockroachdb/cockroach/issues/73277.

Jira issue: CRDB-20872

Epic CRDB-24134

ajwerner commented 2 years ago

I think the first thing I'd do to make this cheaper is to add a way in catkv to find all of the descriptors for a database (including dropped ones) by peeking into each descriptor proto and skipping it if it is not part of the database we're interested in. This would only skip some expensive unmarshaling and validation.
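
A minimal sketch of that peek-and-skip filter, under loudly hypothetical assumptions: each encoded descriptor here starts with a fixed 8-byte parent-ID header, and the rest is JSON standing in for the protobuf body. Real descriptors are protobufs, so an actual peek would have to locate the parent ID field in the wire format instead, and none of these names are catkv APIs. The scan still touches every descriptor, so the per-database cost stays O(Descriptors); only the expensive decode and validation are skipped.

```python
import json
import struct

# Hypothetical layout: 8-byte little-endian parent ID, then the descriptor body.
HEADER = struct.Struct("<Q")


def encode(parent_id, body):
    """Encode a fake descriptor (illustration only)."""
    return HEADER.pack(parent_id) + json.dumps(body).encode()


def peek_parent_id(raw):
    """Cheap: read only the fixed-size header, not the body."""
    (parent_id,) = HEADER.unpack_from(raw, 0)
    return parent_id


def decode_and_validate(raw):
    """Expensive stand-in for full unmarshaling plus validation."""
    body = json.loads(raw[HEADER.size:].decode())
    if "name" not in body:
        raise ValueError("invalid descriptor")
    return body


def descriptors_in_database(db_id, raw_descriptors):
    """Return fully decoded members of db_id and how many full decodes ran."""
    members, full_decodes = [], 0
    for raw in raw_descriptors:
        if peek_parent_id(raw) != db_id:
            continue  # skip without unmarshaling or validating
        full_decodes += 1
        members.append(decode_and_validate(raw))
    return members, full_decodes
```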

ajwerner commented 2 years ago

Another approach is to cache the ID->parent ID mapping so that we can skip decoding the bytes altogether.
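
A sketch of that cache, assuming a hypothetical `ParentIDCache` kept up to date as descriptors are written (not an existing CockroachDB component). With the mapping in memory, finding a database's members, including dropped ones, is a lookup rather than a scan over every descriptor's bytes, so a full translation touches each descriptor roughly once in total.

```python
from collections import defaultdict


class ParentIDCache:
    """Hypothetical in-memory cache of descriptor ID -> parent (database) ID."""

    def __init__(self):
        self._parent_by_id = {}
        self._members_by_parent = defaultdict(set)

    def upsert(self, desc_id, parent_id):
        """Record (or move) a descriptor; dropped descriptors stay cached."""
        old = self._parent_by_id.get(desc_id)
        if old is not None:
            self._members_by_parent[old].discard(desc_id)
        self._parent_by_id[desc_id] = parent_id
        self._members_by_parent[parent_id].add(desc_id)

    def remove(self, desc_id):
        """Forget a descriptor only once it is deleted outright, not on DROP."""
        parent_id = self._parent_by_id.pop(desc_id, None)
        if parent_id is not None:
            self._members_by_parent[parent_id].discard(desc_id)

    def members(self, db_id):
        """All cached descriptor IDs in db_id, without decoding any bytes."""
        return sorted(self._members_by_parent.get(db_id, ()))
```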

ajwerner commented 2 years ago

Another important note: we wouldn't do this full translation very often if we consulted a checkpoint on resume. Right now, any time the job restarts, we do a full translation. See https://github.com/cockroachdb/cockroach/issues/73694.
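
A minimal sketch of how consulting a checkpoint would change the resume path, assuming a hypothetical change log of (commit timestamp, descriptor ID) pairs and a persisted checkpoint timestamp; the names are illustrative, not the reconciler's real interfaces. Without a checkpoint every descriptor is translated; with one, only descriptors changed after it are.

```python
def descriptors_to_translate(all_ids, change_log, checkpoint_ts):
    """Return descriptor IDs that need (re)translation when the job resumes.

    all_ids: every descriptor ID in the catalog.
    change_log: hypothetical iterable of (commit_ts, desc_id) pairs.
    checkpoint_ts: last timestamp whose translation was persisted, or None.
    """
    if checkpoint_ts is None:
        # No checkpoint: fall back to a full translation.
        return sorted(all_ids)
    # Checkpoint present: only descriptors changed after it need work.
    return sorted({desc_id for ts, desc_id in change_log if ts > checkpoint_ts})
```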

knz commented 1 year ago

How do you feel about a special, hidden "trash" database?

ajwerner commented 1 year ago

Conceptually I'm not opposed. We'd need a new naming scheme given the potential for name collisions. There are also details to sort out regarding how it pertains to schemas. Fundamentally such an approach is fine; it's just a non-trivial project.

ajwerner commented 1 year ago

Also, for better or for worse (probably somewhat for better?), the zone configs of a dropped table continue to mirror those of the parent database when only a table, index, or the like is dropped. We may not want to break that.
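
To illustrate the concern, here is a deliberately simplified, hypothetical sketch of zone config inheritance in which a descriptor's effective config is the default, overlaid with its parent database's config, overlaid with its own. Real zone config inheritance is per-field and more involved. The sketch shows that re-parenting a dropped table under a trash database would silently change what it inherits.

```python
DEFAULT_ZONE = {"num_replicas": 3}  # hypothetical cluster default


def effective_zone(desc_id, parent_by_id, explicit_zones):
    """Simplified inheritance: default, then parent database, then own config."""
    zone = dict(DEFAULT_ZONE)
    parent_id = parent_by_id.get(desc_id)
    if parent_id is not None:
        zone.update(explicit_zones.get(parent_id, {}))
    zone.update(explicit_zones.get(desc_id, {}))
    return zone
```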