citusdata / pg_shard

ATTENTION: pg_shard is superseded by Citus, its more powerful replacement
https://github.com/citusdata/citus
GNU Lesser General Public License v3.0
1.06k stars 63 forks source link

Table constraints deparsed incorrectly #94

Closed jasonmp85 closed 9 years ago

jasonmp85 commented 9 years ago

While reviewing error message style in pg_shard, I noticed that an error message in the DDL extension code referred to a set of constraints as a column constraint. "That's odd", I thought… "I thought those were table constraints?"

The unit test for this section of code extends the following DDL command:

CREATE TABLE employees ( 
    first_name text NOT NULL, 
    last_name  text NOT NULL, 
    id         bigint NOT NULL, 
    salary     numeric DEFAULT 0.00, 
    start_date timestamp without time zone, 
    resume     text, 
    CONSTRAINT sal_check CHECK (salary >= 0.00) 
);

There are several column constraints here (specifically, NOT NULL ones), but the final CONSTRAINT is a table constraint because it is not syntactically attached to any column. But here's what the extended DDL looks like:

CREATE TABLE employees_12345 ( 
    first_name  text NOT NULL, 
    last_name   text NOT NULL, 
    id          bigint NOT NULL, 
    salary      numeric DEFAULT 0.00, 
    start_date  timestamp without time zone, 
    resume text CONSTRAINT sal_check CHECK (salary >= 0.00) 
);

The constraint is generated right next to the final column, when in fact there should be a comma to separate it. The unit test itself is wrong (it should have a comma).

jasonmp85 commented 9 years ago

Expected behavior: the output DDL should look identical to the input DDL, save for the shard identifier.

onderkalaci commented 9 years ago

Hey @jasonmp85,

While I was working on this issue, I realized that PostgreSQL allows different syntax for CONSTRAINT.

All of the following DDL commands works fine:

CREATE TABLE products_1 (
    product_no integer,
    name text,
    price numeric
    CONSTRAINT positive_price CHECK (price > 0)
);

CREATE TABLE products_2 (
    product_no integer,
    name text,
    price numeric,
    CONSTRAINT positive_price CHECK (price > 0)
);

CREATE TABLE products_3 (
    product_no integer,
    name text,
    price numeric,
    CONSTRAINT positive_price CHECK (price > 0),
    CONSTRAINT postive_product CHECK (product_no > 0)
);

CREATE TABLE products_4 (
    product_no integer,
    name text,
    price numeric
    CONSTRAINT positive_price CHECK (price > 0),
    CONSTRAINT postive_product CHECK (product_no > 0)
);

CREATE TABLE products_5 (
    product_no integer,
    name text,
    price numeric
    CONSTRAINT positive_price CHECK (price > 0)
    CONSTRAINT postive_product CHECK (product_no > 0)
);

So, my point is that it is not possible (or easy) for us to get the expected behavior you suggested: Expected behavior: the output DDL should look identical to the input DDL, save for the shard identifier.

Consider products_5, we'll always put commas among CONSTRAINTs. Because, we work on the parsed query and don't know whether there are commas among the constraints or not.

And, currently what we do works fine. We put commas among CONSTRAINTs, but, we skip the the comma among the last column and first constraint. Our DLL command that is pushed down looks like products_4.

I think we can leave this as it is. What do you think?

jasonmp85 commented 9 years ago

Do all those constraints show up in the Query as the same kind of node? If you look at the documentation, the syntax has parts labeled column_constraint and table_constraint. A column constraint appears directly against a column (no comma) as is supposedly syntactically attached to that column declaration. A table constraint appears by itself in the list (set off by commas), and stands alone.

I thought the difference between them is a column constraint might just be related to the column (i.e. val > 0 for a column val) whereas a table constraint says something about more than one column, or about the overall table (i.e. min < max).

With this in mind, it might be clearer to write your syntax as:

CREATE TABLE products_1 (
    product_no integer,
    name text,
    price numeric CONSTRAINT positive_price CHECK (price > 0) -- column constraint
);

CREATE TABLE products_2 (
    product_no integer,
    name text,
    price numeric,
    CONSTRAINT positive_price CHECK (price > 0) -- table constraint
);

If all constraints appear normalized within the Query structure, and there is really no way to tell the difference between them, then I think we should just emit all constraints as table constraints. But column/table constraints appears as distinct entities in the query tree, we should emit column constraints next to their columns and table constraints as separate in the list.

jasonmp85 commented 9 years ago

To quote the documentation:

There are two ways to define constraints: table constraints and column constraints. A column constraint is defined as part of a column definition. A table constraint definition is not tied to a particular column, and it can encompass more than one column. Every column constraint can also be written as a table constraint; a column constraint is only a notational convenience for use when the constraint only affects one column.

Based on this, it sounds like we should find all constraints, and emit them all as table constraints. But the bug I found transforms a table constraint into a column constraint, which is potentially problematic, especially considering that the documentation shows a slightly different grammar for column and table constraints in their BNFish syntax definition.

onderkalaci commented 9 years ago

@jasonmp85 After our chat, I understand the bug. Yes, as you suggest, we already find all check constraints(column and table) in the table and convert them to table constraints. The following quote from the PostgreSQL documentation also reinforces our approach:

We say that the first two constraints are column constraints, whereas the third one is a table constraint because it is written separately from any one column definition. Column constraints can also be written as table constraints, while the reverse is not necessarily possible, since a column constraint is supposed to refer to only the column it is attached to. (PostgreSQL doesn't enforce that rule, but you should follow it if you want your table definitions to work with other database systems.)

This commit fixes DeparseCreateStmt function, in which we fail to add a comma for the first table constraint that is deparsed. Since we forget the first comma after column definitions, the first table constraint was considered as column constraint.

Basically, I tried to use the same approach in pg_shard_get_tableschemadef_string, which works as expected.