cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
30.09k stars 3.8k forks source link

udf: support late-binding user-defined functions #89109

Open michae2 opened 2 years ago

michae2 commented 2 years ago

(This was discussed at length in the UDF RFC pull request. Making an issue so that we don't forget.)

PostgreSQL has two different forms of LANGUAGE SQL UDFs:

These two forms are not only syntactically different, but also semantically different. The older string-based form is late-binding, resolving references at execution time. The newer sql_body form is early-binding, resolving references at function creation time, and using reference tracking to disallow alters and drops that would invalidate references.

Initially we are only supporting the older string-based syntax, but using the newer sql_body early-binding semantics. We believe this will be compatible with most applications while also protecting against accidental function breaks (or in the worst cases, corruption). The reference tracking is similar to what we use for views.

But this does mean some DDL that is possible on PostgreSQL with late-binding UDFs is not possible with our early-binding UDFs. For example, replacing a column used by a UDF:

CREATE TABLE abc (a int PRIMARY KEY, b int, c int);
INSERT INTO abc VALUES (1, 2, 3);
CREATE FUNCTION foo (i int) RETURNS int STABLE LANGUAGE SQL AS $$ SELECT b FROM abc WHERE a = i $$;
SELECT foo(1);
ALTER TABLE abc DROP COLUMN b, ADD COLUMN b int DEFAULT 5;
SELECT foo(1);

On PostgreSQL 14.5 this works, but on CockroachDB 22.2 the ALTER TABLE returns an error:

demo@127.0.0.1:26257/defaultdb> ALTER TABLE abc DROP COLUMN b, ADD COLUMN b int DEFAULT 5;
ERROR: cannot drop column "b" because function "foo" depends on it
SQLSTATE: 2BP01
HINT: you can drop foo instead.

This is similar to what PostgreSQL does when using a sql_body syntax UDF:

michae2=# ALTER TABLE abc DROP COLUMN b, ADD COLUMN b int DEFAULT 5;
2022-09-30 11:24:24.072 PDT [17051] ERROR:  cannot drop column b of table abc because other objects depend on it
2022-09-30 11:24:24.072 PDT [17051] DETAIL:  function foo2(integer) depends on column b of table abc
2022-09-30 11:24:24.072 PDT [17051] HINT:  Use DROP ... CASCADE to drop the dependent objects too.
2022-09-30 11:24:24.072 PDT [17051] STATEMENT:  ALTER TABLE abc DROP COLUMN b, ADD COLUMN b int DEFAULT 5;
ERROR:  cannot drop column b of table abc because other objects depend on it
DETAIL:  function foo2(integer) depends on column b of table abc
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

Continuing the example, replacing the table also works in PostgreSQL:

CREATE TABLE abc_new (a int PRIMARY KEY, b int, c int);
INSERT INTO abc_new VALUES (1, 21, 31);
BEGIN;
ALTER TABLE abc RENAME TO abc_old;
ALTER TABLE abc_new RENAME TO abc;
COMMIT;
SELECT foo(1);

Some applications might expect these DDL statements to be possible when using string-based UDFs. We should provide some kind of escape-hatch to allow these DDL statements, or maybe fully support late-binding UDFs.

Jira issue: CRDB-20121

postamar commented 2 years ago

I would like product's input here. cc @devadvocado

rafiss commented 1 year ago

Is this issue the same as https://github.com/cockroachdb/cockroach/issues/85144? If so, I will close that one, since this has more information.