cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
29.91k stars 3.78k forks source link

sql: computed columns are not validated on existing data #81698

Open ajwerner opened 2 years ago

ajwerner commented 2 years ago

Describe the problem

We don't currently evaluate computed expressions for new virtual computed columns to ensure that they are indeed valid for all of the data in the table. This is a major oversight.

To Reproduce

CREATE TABLE t (i INT PRIMARY KEY);
INSERT INTO t VALUES (1);
ALTER TABLE t ADD COLUMN s VARCHAR(2) AS ('stored') STORED;
ALTER TABLE t ADD COLUMN v VARCHAR(2) AS ('virtual') STORED;
SELECT * FROM t;
  i |   s    |    v
----+--------+----------
  1 | stored | virtual
INSERT INTO t VALUES (2);
ERROR: value too long for type VARCHAR(2)
SQLSTATE: 22001

Expected behavior

We should fail to add these columns.

Jira issue: CRDB-16484

Epic CRDB-31282

ajwerner commented 2 years ago

Same problem for decimals and other things of that ilk. This goes way way back. I think the same problem applies for default values.

mgartner commented 1 year ago

Postgres (on 14.4, at least) has a weird inconsistency: a table can be created with a column like v VARCHAR(2) GENERATED ALWAYS AS ('foo') STORED but will error when adding the same column to an existing table:

marcus=# SELECT version();
                                                      version
--------------------------------------------------------------------------------------------------------------------
 PostgreSQL 14.4 on x86_64-apple-darwin21.5.0, compiled by Apple clang version 13.1.6 (clang-1316.0.21.2.5), 64-bit
(1 row)

marcus=# CREATE TABLE t1 (i INT, v VARCHAR(2) GENERATED ALWAYS AS ('foo') STORED);
CREATE TABLE

marcus=# INSERT INTO t1(i) VALUES (1);
ERROR:  22001: value too long for type character varying(2)

marcus=# CREATE TABLE t2 (i INT);
CREATE TABLE

marcus=# ALTER TABLE t2 ADD COLUMN v VARCHAR(2) GENERATED ALWAYS AS ('foo') STORED;
ERROR:  22001: value too long for type character varying(2)

So it looks like we handle the CREATE TABLE case correctly, but need special logic for the ALTER TABLE case. I originally thought the error was the result of an assignment cast performed while backfilling the column, but the error occurs for an empty table when nothing is backfilled... unless they always evaluate the expression even if the table has no rows. Maybe we should solve this similarly?