citusdata / citus

Distributed PostgreSQL as an extension
https://www.citusdata.com
GNU Affero General Public License v3.0
10.43k stars 662 forks source link

Nested multi-shard commands might fail #4076

Open onderkalaci opened 4 years ago

onderkalaci commented 4 years ago

While working on #4034, we considered cases where a COPY command might nest another call other COPY commands (or any other command basically). I traced back to 9.2, and I can reproduce until there.

It turns out we have some other issues around it:

CREATE TABLE colocated_test_table (a int);
SELECT create_distributed_table('colocated_test_table', 'a');
INSERT INTO colocated_test_table SELECT * FROM generate_series(0,100)i;

CREATE OR REPLACE FUNCTION distributed_insert_select(out result bigint)
    AS $$ WITH cte_1  AS (INSERT INTO colocated_test_table SELECT  *  FROM colocated_test_table LIMIT 100 OFFSET 0 RETURNING *) SELECT count(*) FROM  cte_1;$$
    LANGUAGE SQL;

-- MX needed to allow create_distributed_function succeed
SELECT start_metadata_sync_to_node(nodename, nodeport) FROM pg_dist_node;
SELECT create_distributed_function('distributed_insert_select()');

CREATE TABLE nested_copy_test (key int, value int DEFAULT distributed_insert_select());
SELECT create_distributed_table('nested_copy_test', 'key');

COPY nested_copy_test(key) FROM PROGRAM 'seq 32';
ERROR:  connection to the remote node localhost:9701 failed with the following error: another command is already in progress
CONTEXT:  SQL function "distributed_insert_select" statement 1
COPY nested_copy_test, line 2: "2"

INSERT INTO nested_copy_test (key) SELECT * FROM generate_series(0,10);
ERROR:  connection to the remote node localhost:9701 failed with the following error: another command is already in progress
CONTEXT:  SQL function "distributed_insert_select" statement 1

-- single command operations seems to work
INSERT INTO nested_copy_test (key) VALUES (1);
INSERT 0 1
onderkalaci commented 4 years ago

Once we fix it, we should test in the context of #4034 as well

onderkalaci commented 4 years ago

Another example Marco noted which involves reference tables


CREATE OR REPLACE FUNCTION public.mean()
RETURNS int
LANGUAGE sql
AS $$
        INSERT INTO test SELECT * FROM ref ORDER BY random() LIMIT 1 RETURNING x;
$$;

CREATE TABLE evil (m int default mean(), d int);
SELECT create_distributed_table('evil', 'm');

BEGIN;
INSERT INTO evil (d) SELECT s FROM generate_series(1,100) s;
INSERT INTO evil (d) SELECT s FROM generate_series(1,100) s;
END;