cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
29.76k stars 3.76k forks source link

tabledesc: fix the `referencing_descriptor_ids` field in type descriptors #63161

Open postamar opened 3 years ago

postamar commented 3 years ago

There are a few issues related to ReferencingDescriptorIDs in type descriptors, all related to how these descriptor references behave differently than others, like foreign key references, etc.

  1. Although the name would suggest otherwise, ReferencingDescriptorIDs only contains table descriptor IDs. Either rename the field to reflect this, or also include database or type descriptor IDs as well.

  2. Unlike all other descriptor references, ReferencingDescriptorIDs contains not just direct references but also transitive references. Indeed, when a table descriptor goes through the schema changer machinery, we add its ID to all type descriptors whose IDs are in the return value of GetAllReferencedTypeIDs(), which includes the type descriptors' transitive closure. This is useful to quickly check whether a type can be dropped safely or not, but it is unlikely to scale well and has all kinds of ugly implications (what happens when we have record types as well as enum types? etc.). I strongly feel we should stick to direct references, along with providing some useful machinery in the descriptor collection or wherever in catalog to walk the relational graph to collect all transitive dependencies.

  3. Unlike all other descriptor references, there are no back-reference checks done on ReferencingDescriptorIDs when validating a table descriptor, for instance. Adding these is not so straightforward, since in some cases these back-references are not added in the same transaction but asynchronously in a schema changer job. Specifically, I tried adding back-reference checks and the CCL logic test suite alter_table_locality failed with the multiregion-9node-3region-3azs config due to a validation failure following a ROLLBACK.

To address these points I suggest we apply a similar treatment to this field as we did to the old foreign key representation:

Jira issue: CRDB-6452

Epic CRDB-31469

arulajmani commented 2 years ago

Indeed, when a table descriptor goes through the schema changer machinery, we add its ID to all type descriptors whose IDs are in the return value of GetAllReferencedTypeIDs(), which includes the type descriptors' transitive closure.

I was in the backup/restore area debugging something entirely different when I chanced upon an explanation for why this may have been designed the way it was. Consider the case when we're backing up a table that contains a user defined type. If this table is restored into a database which doesn't contain this type we also restore the type descriptor. Note that a type descriptor can't exist without a corresponding array type alias. As such, it looks like we elected to capture the array type descriptor as a "dependency". This if block https://github.com/cockroachdb/cockroach/blob/1684499d8979caea5aeac537f9c8da6d5f225307/pkg/ccl/backupccl/restore_planning.go#L732. is where all this happens.

FWIW, I don't think we necessarily need to rely on backing up the array type descriptor for this though, considering that the array type descriptor can be entirely derived from the actual enum type descriptor.

postamar commented 2 years ago

Consider the case when we're backing up a table that contains a user defined type. If this table is restored into a database which doesn't contain this type we also restore the type descriptor. Note that a type descriptor can't exist without a corresponding array type alias. As such, it looks like we elected to capture the array type descriptor as a "dependency".

Thanks for the explanation! This design choice makes a lot more sense to me now.

postamar commented 2 years ago

This just got pushed onto the critical path, if we're to implement DROP TYPE ... CASCADE correctly. Also bad things like https://github.com/cockroachdb/cockroach/pull/84305 keep happening because of this lack of validation, so we might as well do it sooner rather than later.

postamar commented 1 year ago

I've unassigned myself considering that I'm very unlikely to close this before going on leave.