Open ajwerner opened 2 years ago
I think the first thing I'd do to make this cheaper is to have a way to find all of the descriptors for a database (including dropped ones) in catkv by peeking into the descriptor proto and skipping it if it is not part of the database we're interested in. This is just going to skip some expensive unmarshaling and validation.
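To make the "peek and skip" idea concrete, here is a minimal sketch that scans the top-level fields of a protobuf-encoded message for one varint field without unmarshaling anything else. It is not the catkv implementation: the field number `parentIDField`, the flat message layout, and all helper names are assumptions made for illustration, and the real descriptor protos are nested and use their own numbering.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// parentIDField is a HYPOTHETICAL field number for the parent database ID.
const parentIDField = 2

// peekVarintField walks the top-level fields of a protobuf-encoded message
// and returns the first varint stored under the given field number. Other
// fields are skipped by length; nothing is unmarshaled or validated.
func peekVarintField(buf []byte, field uint64) (uint64, bool) {
	for len(buf) > 0 {
		tag, n := binary.Uvarint(buf)
		if n <= 0 {
			return 0, false
		}
		buf = buf[n:]
		num, wireType := tag>>3, tag&7
		switch wireType {
		case 0: // varint
			v, n := binary.Uvarint(buf)
			if n <= 0 {
				return 0, false
			}
			if num == field {
				return v, true
			}
			buf = buf[n:]
		case 1: // fixed 64-bit
			if len(buf) < 8 {
				return 0, false
			}
			buf = buf[8:]
		case 2: // length-delimited (strings, bytes, nested messages)
			l, n := binary.Uvarint(buf)
			if n <= 0 || uint64(len(buf)-n) < l {
				return 0, false
			}
			buf = buf[n+int(l):]
		case 5: // fixed 32-bit
			if len(buf) < 4 {
				return 0, false
			}
			buf = buf[4:]
		default: // unsupported or malformed wire type
			return 0, false
		}
	}
	return 0, false
}

// appendVarintField and appendBytesField only exist to build sample input.
func appendVarintField(buf []byte, field, v uint64) []byte {
	buf = binary.AppendUvarint(buf, field<<3) // wire type 0
	return binary.AppendUvarint(buf, v)
}

func appendBytesField(buf []byte, field uint64, b []byte) []byte {
	buf = binary.AppendUvarint(buf, field<<3|2) // wire type 2
	buf = binary.AppendUvarint(buf, uint64(len(b)))
	return append(buf, b...)
}

func main() {
	var raw []byte
	raw = appendBytesField(raw, 1, []byte("some_table"))
	raw = appendVarintField(raw, parentIDField, 52)

	const wantDB = 52 // hypothetical database ID being translated
	if parent, ok := peekVarintField(raw, parentIDField); ok && parent == wantDB {
		fmt.Println("descriptor belongs to the database; decode it fully")
	} else {
		fmt.Println("skip without unmarshaling")
	}
}
```

The saving comes from only paying for full unmarshaling and validation on descriptors that pass the cheap check; the raw bytes still have to be read.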
Another approach is to cache the ID->parent ID mapping so that we can skip decoding the bytes altogether.
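A rough sketch of what such a cache could look like follows. The type and method names (`parentCache`, `record`, `partition`) are hypothetical, and the sketch ignores invalidation and memory bounds, which a real version would have to address.

```go
package main

import (
	"fmt"
	"sync"
)

// descID is a stand-in for a descriptor ID type.
type descID uint32

// parentCache is a HYPOTHETICAL in-memory map from descriptor ID to the ID
// of its parent database. Invalidation is deliberately left out.
type parentCache struct {
	mu      sync.RWMutex
	parents map[descID]descID
}

func newParentCache() *parentCache {
	return &parentCache{parents: make(map[descID]descID)}
}

// record remembers the parent of a descriptor once it has been decoded.
func (c *parentCache) record(id, parent descID) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.parents[id] = parent
}

// partition splits ids into those known to belong to db and those the cache
// has never seen. IDs known to belong to another database are dropped, so
// their bytes never need to be decoded.
func (c *parentCache) partition(ids []descID, db descID) (members, unknown []descID) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	for _, id := range ids {
		parent, ok := c.parents[id]
		switch {
		case !ok:
			unknown = append(unknown, id)
		case parent == db:
			members = append(members, id)
		}
	}
	return members, unknown
}

func main() {
	c := newParentCache()
	c.record(104, 52)
	c.record(105, 53)

	members, unknown := c.partition([]descID{104, 105, 106}, 52)
	fmt.Println("decode as members:", members)
	fmt.Println("decode to learn parent:", unknown)
}
```

Only the `unknown` IDs would still need a decode to learn their parent, after which they can be recorded for the next translation.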
Another important note is that we wouldn't do this full translation very often if we consulted a checkpoint on resume. Right now any time the job restarts, we do a full translate. https://github.com/cockroachdb/cockroach/issues/73694
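The resume decision could be sketched roughly as below. The `checkpoint` and `jobProgress` types and the `startFrom` helper are hypothetical stand-ins for persisted job state, not the job's actual structures.

```go
package main

import "fmt"

// checkpoint is a HYPOTHETICAL record of how far the job had reconciled.
type checkpoint struct {
	timestamp int64 // stand-in for the timestamp of the last completed pass
}

// jobProgress stands in for persisted job state; a nil checkpoint means the
// job has never completed a pass.
type jobProgress struct {
	last *checkpoint
}

// startFrom decides how a resumed job should begin: with a full translation
// when there is no checkpoint, or incrementally from the checkpoint.
func startFrom(p jobProgress) (since int64, full bool) {
	if p.last == nil {
		return 0, true
	}
	return p.last.timestamp, false
}

func main() {
	for _, p := range []jobProgress{
		{},                                // first run: nothing persisted
		{last: &checkpoint{timestamp: 42}}, // restart after a checkpoint
	} {
		since, full := startFrom(p)
		if full {
			fmt.Println("no checkpoint: full translate")
		} else {
			fmt.Println("resume incrementally from timestamp", since)
		}
	}
}
```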
How do you feel about a special/hidden "trash" database?
Conceptually I'm not opposed. We'd need a new naming scheme given the space for name collisions. There are details to sort out also regarding how it pertains to schemas. Fundamentally such an approach is fine, it's just a non-trivial project.
Also, for better or for worse (probably somewhat for better?) the zone configs of a dropped table continue to mirror that of the parent database when dropping just a table or index or what not. We may not want to break that.
Describe the problem
When we go to translate a database, we fetch all the descriptors. We have to do this because we have no more efficient way to find the descriptors in the database. Fundamentally, we need to discover dropped descriptors. Dropped descriptors do not have namespace entries.
Additional context
Relates to https://github.com/cockroachdb/cockroach/issues/26476 and https://github.com/cockroachdb/cockroach/issues/73277
Jira issue: CRDB-20872
Epic CRDB-24134