When adding UNIQUE WITHOUT INDEX constraints for the non-shard columns of a hash-sharded index in #74235, we noticed two limitations of functional dependencies that required a hacky work-around to prevent the optimizer from created unnecessary uniqueness checks. The following improvements to FDs should allow us to remove the hack, and could lead to better query plans in other cases unrelated to hash-sharded indexes.
1. Keep track of the "laxness" of each column of a key.
If 13 is nullable, then 12 is not a lax key. For example, the FD allows for (c12, c13, c11) to have values (1, null, 1) and (1, null, 2).
But if 13 is NOT NULL, then 12 is a lax key. Two rows with the same non-null value for 12 means that they have the same non-null value for 13, which means that that they have the same value for all columns.
However, our FD implementation cannot determine this because "laxness" is a property of a determinant -> dependency relationship. There is no way to fully represent the (12,13)~~>(11) dependency where 13 is NOT NULL, so information is lost when encoding these relationships as an FD. If "laxness" was a property of the columns within a determinant, then we could determine that sets of columns are lax keys in more cases. For example, from (12,~13)-->(11), (12)-->(13) we could determine that 12 is a lax key.
When adding
UNIQUE WITHOUT INDEX
constraints for the non-shard columns of a hash-sharded index in #74235, we noticed two limitations of functional dependencies that required a hacky work-around to prevent the optimizer from created unnecessary uniqueness checks. The following improvements to FDs should allow us to remove the hack, and could lead to better query plans in other cases unrelated to hash-sharded indexes.1. Keep track of the "laxness" of each column of a key.
Consider the FD:
If
13
is nullable, then12
is not a lax key. For example, the FD allows for(c12, c13, c11)
to have values(1, null, 1)
and(1, null, 2)
.But if
13
isNOT NULL
, then12
is a lax key. Two rows with the same non-null value for12
means that they have the same non-null value for13
, which means that that they have the same value for all columns.However, our FD implementation cannot determine this because "laxness" is a property of a determinant -> dependency relationship. There is no way to fully represent the
(12,13)~~>(11)
dependency where13
isNOT NULL
, so information is lost when encoding these relationships as an FD. If "laxness" was a property of the columns within a determinant, then we could determine that sets of columns are lax keys in more cases. For example, from(12,~13)-->(11), (12)-->(13)
we could determine that12
is a lax key.2. Further simplify FDs.
Consider the FD:
By transitivity,
(12,13)-->(11,14,15)
and(12)-->(13)
imply(12)-->(11,13-15)
, so the FD should be simplified to:See https://github.com/cockroachdb/cockroach/pull/74235#issuecomment-1013238525 for more details.
Jira issue: CRDB-12452