citusdata / citus

Distributed PostgreSQL as an extension
https://www.citusdata.com
GNU Affero General Public License v3.0
10.57k stars 670 forks source link

Running DDL EXECUTE in function is incorrectly replicated to workers #6608

Open fvannee opened 1 year ago

fvannee commented 1 year ago

Small reproducible example for what we encountered in a more complex case. Reproduces on PG15 + Citus 11.1 with at least one worker (besides the coordinator).

create table t1 (a int, b int);
select create_distributed_table('t1', 'a');

CREATE OR REPLACE FUNCTION exec_func()
 RETURNS void
 LANGUAGE plpgsql
AS $function$
BEGIN
   EXECUTE $$
create index on t1 (a);
create index on t1 (b);
   $$;
END
$function$
;

select exec_func();

Error:

SQL Error [42601]: ERROR: cannot insert multiple commands into a prepared statement
  Where: while executing command on worker:5432
SQL statement "

create index on t1 (a);

create index on t1 (b);

   "
PL/pgSQL function exec_func() line 3 at EXECUTE

Creating one index inside the EXECUTE works fine. It fails when creating multiple ones. Seems like it's trying to replicate things inside EXECUTE in a prepared statement to the workers, but since it contains two statements this fails?

marcocitus commented 1 year ago

This seems to be a general problem with the way some DDL commands reuse the original query string. We should probably stop doing that.

JelteF commented 1 year ago

Ran into this same error when trying to have this in a single setup phase of an isolation test:

  SET citus.shard_count to 1;
  SET citus.shard_replication_factor to 1;
  ALTER SEQUENCE pg_catalog.pg_dist_shardid_seq RESTART 1238760;

  CREATE TABLE t1 (id int PRIMARY KEY, value int);
  SELECT create_distributed_table('t1', 'id');

  CREATE TABLE t2 (id int PRIMARY KEY, value int);
  SELECT create_distributed_table('t2', 'id');
  ALTER TABLE t1 ADD CONSTRAINT t1_t2_fkey FOREIGN KEY (id) REFERENCES t2(id);
TsinghuaLucky912 commented 1 year ago

I think we didn't consider the situation of multiple SQL in execute before, and the default sql command in many deparse is one. This is actually problematic:

39c805a4f19949f00c02715835f82f34

1675072796168_C46ECAFF-8822-4db8-9ABD-721B1049FB6A

fvannee commented 1 year ago

Any news on this issue?

onderkalaci commented 1 year ago

We should probably always deparse the command instead of using the original command. In the end, the same command is always (or almost always) can be deparsed on the shards. So, should be mostly fine, and potentially adding few more missing deparsing logic.

Plus, we can get rid of the CurrentSearchPath trick: https://github.com/citusdata/citus/blob/343d1c5072e652902a91c58ef9538df9a5aa9048/src/backend/distributed/commands/utility_hook.c#L1199

onurctirtir commented 1 year ago

Another example to this:

CREATE TABLE referenced_tbl (
    id int PRIMARY KEY
);
SELECT create_distributed_table('referenced_tbl', 'id');

CREATE TABLE referencing_tbl (
    id int,
    CONSTRAINT fkey FOREIGN KEY (id) REFERENCES referenced_tbl(id)
);
SELECT create_distributed_table('referencing_tbl', 'id');

CREATE FUNCTION drop_fkey()
  RETURNS void
  LANGUAGE plpgsql
AS $function$
BEGIN
  EXECUTE $$
    SET CONSTRAINTS fkey IMMEDIATE;
    ALTER TABLE referencing_tbl DROP CONSTRAINT fkey;
  $$;
END
$function$;

SELECT drop_fkey();
ERROR:  cannot insert multiple commands into a prepared statement
CONTEXT:  while executing command on localhost:9701
SQL statement "
    SET CONSTRAINTS fkey IMMEDIATE;
    ALTER TABLE referencing_tbl DROP CONSTRAINT fkey;
  "
PL/pgSQL function drop_fkey() line 3 at EXECUTE