cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
30.1k stars 3.81k forks source link

sql: UPDATE with ORDER BY ... LIMIT clause results in confusing error #98661

Open rytaft opened 1 year ago

rytaft commented 1 year ago

Describe the problem

When I execute an UPDATE statement with an ORDER BY ... LIMIT clause, it results in a confusing error: "ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions".

To Reproduce

On a single-node local cluster, run:

CREATE TABLE employees (
    emp_no     INT8 NOT NULL,
    birth_date DATE NOT NULL,
    first_name VARCHAR(14) NOT NULL,
    last_name  VARCHAR(16) NOT NULL,
    gender     CHAR NOT NULL,
    hire_date  DATE NOT NULL,
    PRIMARY KEY (emp_no)
);

CREATE TABLE titles (
    emp_no    INT8 NOT NULL,
    title     VARCHAR(50) NOT NULL,
    from_date DATE NOT NULL,
    to_date   DATE,
    FOREIGN KEY (emp_no) REFERENCES employees (emp_no) ON DELETE CASCADE,
    PRIMARY KEY (emp_no, title, from_date)
);

CREATE TABLE salaries (
    emp_no    INT8 NOT NULL,
    salary    INT8 NOT NULL,
    from_date DATE NOT NULL,
    to_date   DATE NOT NULL,
    FOREIGN KEY (emp_no) REFERENCES employees (emp_no) ON DELETE CASCADE,
    PRIMARY KEY (emp_no, from_date)
);

CREATE TABLE salary_audit (
    audit_date DATE NOT NULL,
    emp_no     INT8 NOT NULL,
    sal_date   DATE NOT NULL,
    title      VARCHAR(50) NOT NULL,
    title_date DATE NOT NULL,
    valid      BOOL NOT NULL,
    FOREIGN KEY (emp_no, sal_date) REFERENCES salaries (emp_no, from_date),
    FOREIGN KEY (emp_no, title, title_date) REFERENCES titles (emp_no, title, from_date),
    FOREIGN KEY (emp_no) REFERENCES employees (emp_no),
    PRIMARY KEY (audit_date, emp_no, sal_date)
);

INSERT INTO employees
  SELECT 
    generate_series(0, 100), 
    (now() - trunc(random() * 365)::int * '1 day'::interval - 20 * '1 year'::interval - trunc(random() * 60)::int * '1 year'::interval)::date, 
    f_random_text(10), 
    f_random_text(10), 
    f_random_gender(), 
    (now() - trunc(random() * 365)::int * '1 day'::interval - trunc(random() * 10)::int * '1 year'::interval)::date;

INSERT INTO titles
  SELECT 
    generate_series(0, 100),
    f_random_text(20),
    (now() - trunc(random() * 365)::int * '1 day'::interval - trunc(random() * 5)::int * '1 year'::interval)::date,
    (now() - trunc(random() * 365)::int * '1 day'::interval - trunc(random() * 5)::int * '1 year'::interval)::date;

INSERT INTO salaries 
  SELECT 
    generate_series(0, 100),
    random() * 200000 + 50000,
    (now() - trunc(random() * 365)::int * '1 day'::interval - trunc(random() * 5)::int * '1 year'::interval)::date,
    (now() - trunc(random() * 365)::int * '1 day'::interval - trunc(random() * 5)::int * '1 year'::interval)::date;

INSERT INTO salary_audit
  SELECT * FROM (SELECT
    (now() - trunc(random() * 365)::int * '1 day'::interval)::date,
    s.emp_no,
    s.from_date,
    t.title,
    t.from_date,
    't'::bool
  FROM salaries s, titles t
  WHERE s.emp_no = t.emp_no
  ORDER BY random()
  LIMIT 100);

UPDATE salary_audit AS a SET 
    sal_date = s.from_date,
    title = t.title,
    title_date = t.from_date
  FROM salaries s, titles t
  WHERE s.emp_no = t.emp_no
  AND a.emp_no = s.emp_no
  ORDER BY random() 
  LIMIT 100;

The last statement fails with error:

ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
SQLSTATE: 42P10

This is confusing, since there is no DISTINCT ON expression in the UPDATE statement (the optimizer adds one, but the user doesn't know that...).

Expected behavior

The returned error should be less confusing. Postgres seems to be more restrictive, and doesn't even allow the ORDER BY or LIMIT clauses in this context. Maybe we should do the same.

Here is Postgres' error:

ERROR:  syntax error at or near "ORDER"
LINE 8:   ORDER BY random()
          ^

Environment:

Jira issue: CRDB-25427

rytaft commented 1 year ago

Here is the supported syntax in Postgres:

postgres=# \h UPDATE
Command:     UPDATE
Description: update rows of a table
Syntax:
[ WITH [ RECURSIVE ] with_query [, ...] ]
UPDATE [ ONLY ] table_name [ * ] [ [ AS ] alias ]
    SET { column_name = { expression | DEFAULT } |
          ( column_name [, ...] ) = [ ROW ] ( { expression | DEFAULT } [, ...] ) |
          ( column_name [, ...] ) = ( sub-SELECT )
        } [, ...]
    [ FROM from_item [, ...] ]
    [ WHERE condition | WHERE CURRENT OF cursor_name ]
    [ RETURNING * | output_expression [ [ AS ] output_name ] [, ...] ]

URL: https://www.postgresql.org/docs/14/sql-update.html