Closed jasonmp85 closed 9 years ago
Expected behavior: the output DDL should look identical to the input DDL, save for the shard identifier.
Hey @jasonmp85,
While I was working on this issue, I realized that PostgreSQL allows different syntax for CONSTRAINT
.
All of the following DDL commands works fine:
CREATE TABLE products_1 (
product_no integer,
name text,
price numeric
CONSTRAINT positive_price CHECK (price > 0)
);
CREATE TABLE products_2 (
product_no integer,
name text,
price numeric,
CONSTRAINT positive_price CHECK (price > 0)
);
CREATE TABLE products_3 (
product_no integer,
name text,
price numeric,
CONSTRAINT positive_price CHECK (price > 0),
CONSTRAINT postive_product CHECK (product_no > 0)
);
CREATE TABLE products_4 (
product_no integer,
name text,
price numeric
CONSTRAINT positive_price CHECK (price > 0),
CONSTRAINT postive_product CHECK (product_no > 0)
);
CREATE TABLE products_5 (
product_no integer,
name text,
price numeric
CONSTRAINT positive_price CHECK (price > 0)
CONSTRAINT postive_product CHECK (product_no > 0)
);
So, my point is that it is not possible (or easy) for us to get the expected behavior you suggested:
Expected behavior: the output DDL should look identical to the input DDL, save for the shard identifier.
Consider products_5
, we'll always put commas among CONSTRAINT
s. Because, we work on the parsed query and don't know whether there are commas among the constraints or not.
And, currently what we do works fine. We put commas among CONSTRAINT
s, but, we skip the the comma among the last column and first constraint. Our DLL command that is pushed down looks like products_4
.
I think we can leave this as it is. What do you think?
Do all those constraints show up in the Query
as the same kind of node? If you look at the documentation, the syntax has parts labeled column_constraint
and table_constraint
. A column constraint appears directly against a column (no comma) as is supposedly syntactically attached to that column declaration. A table constraint appears by itself in the list (set off by commas), and stands alone.
I thought the difference between them is a column constraint might just be related to the column (i.e. val > 0
for a column val
) whereas a table constraint says something about more than one column, or about the overall table (i.e. min < max
).
With this in mind, it might be clearer to write your syntax as:
CREATE TABLE products_1 (
product_no integer,
name text,
price numeric CONSTRAINT positive_price CHECK (price > 0) -- column constraint
);
CREATE TABLE products_2 (
product_no integer,
name text,
price numeric,
CONSTRAINT positive_price CHECK (price > 0) -- table constraint
);
If all constraints appear normalized within the Query
structure, and there is really no way to tell the difference between them, then I think we should just emit all constraints as table constraints. But column/table constraints appears as distinct entities in the query tree, we should emit column constraints next to their columns and table constraints as separate in the list.
To quote the documentation:
There are two ways to define constraints: table constraints and column constraints. A column constraint is defined as part of a column definition. A table constraint definition is not tied to a particular column, and it can encompass more than one column. Every column constraint can also be written as a table constraint; a column constraint is only a notational convenience for use when the constraint only affects one column.
Based on this, it sounds like we should find all constraints, and emit them all as table constraints. But the bug I found transforms a table constraint into a column constraint, which is potentially problematic, especially considering that the documentation shows a slightly different grammar for column and table constraints in their BNFish syntax definition.
@jasonmp85 After our chat, I understand the bug. Yes, as you suggest, we already find all check constraints(column and table) in the table and convert them to table constraints. The following quote from the PostgreSQL documentation also reinforces our approach:
We say that the first two constraints are column constraints, whereas the third one is a table constraint because it is written separately from any one column definition. Column constraints can also be written as table constraints, while the reverse is not necessarily possible, since a column constraint is supposed to refer to only the column it is attached to. (PostgreSQL doesn't enforce that rule, but you should follow it if you want your table definitions to work with other database systems.)
This commit fixes DeparseCreateStmt function, in which we fail to add a comma for the first table constraint that is deparsed. Since we forget the first comma after column definitions, the first table constraint was considered as column constraint.
Basically, I tried to use the same approach in pg_shard_get_tableschemadef_string, which works as expected.
While reviewing error message style in
pg_shard
, I noticed that an error message in the DDL extension code referred to a set of constraints as a column constraint. "That's odd", I thought… "I thought those were table constraints?"The unit test for this section of code extends the following DDL command:
There are several column constraints here (specifically,
NOT NULL
ones), but the finalCONSTRAINT
is a table constraint because it is not syntactically attached to any column. But here's what the extended DDL looks like:The constraint is generated right next to the final column, when in fact there should be a comma to separate it. The unit test itself is wrong (it should have a comma).