citusdata / citus

Distributed PostgreSQL as an extension
https://www.citusdata.com
GNU Affero General Public License v3.0
10.43k stars 662 forks source link

While updating a partition, a concurrent DROP TABLE (e.g., partition) or CREATE PARTITION gets into distributed deadlock #5135

Open onderkalaci opened 3 years ago

onderkalaci commented 3 years ago

To easily repro:

cat /tmp/f.sql pgbench -f /tmp/f.sql -c 64 -j 8 -T 1200 -P 1 postgres

update traffic_flows_day_9_20210712 set src_id  = 1;

-- other session

drop table traffic_flows_day_6_20210712;                                                ERROR:  canceling the transaction since it was involved in a distributed deadlock
CONTEXT:  SQL statement "SELECT citus_drop_all_shards(v_obj.objid, v_obj.schema_name, v_obj.object_name)"
PL/pgSQL function citus_drop_trigger() line 16 at PERFORM
Time: 952.391 ms

-- or

CREATE TABLE traffic_flows_day_6_20210712 PARTITION OF traffic_flows_day_6 FOR VALUES FROM ('6', '2021-07-12') TO ('6', '2021-07-13');
ERROR:  canceling the transaction since it was involved in a distributed deadlock
Time: 1356.825 ms (00:01.357)

It might not happen 100% accuracy, but like ~50% on my local machine

onderkalaci commented 3 years ago

Note that setting the pool size to 1 doesn't help

hlakshmi commented 3 years ago

Thanks @onderkalaci. This is happening pretty frequently in our environment as our cluster is very write heavy. Can we please prioritize this?

SaitTalhaNisanci commented 3 years ago

It might not happen 100% accuracy, but like ~50% on my local machine

CREATE TABLE users_table_part(user_id bigint, value_1 int, value_2 int) PARTITION BY RANGE (value_1);
CREATE TABLE users_table_part_0 PARTITION OF users_table_part FOR VALUES FROM (0) TO (1);
CREATE TABLE users_table_part_1 PARTITION OF users_table_part FOR VALUES FROM (1) TO (2);
CREATE TABLE users_table_part_2 PARTITION OF users_table_part FOR VALUES FROM (2) TO (3);
CREATE TABLE users_table_part_3 PARTITION OF users_table_part FOR VALUES FROM (3) TO (4);
CREATE TABLE users_table_part_4 PARTITION OF users_table_part FOR VALUES FROM (4) TO (5);
CREATE TABLE users_table_part_5 PARTITION OF users_table_part FOR VALUES FROM (5) TO (6);
CREATE TABLE users_table_part_6 PARTITION OF users_table_part FOR VALUES FROM (6) TO (7);
CREATE TABLE users_table_part_7 PARTITION OF users_table_part FOR VALUES FROM (7) TO (8);
CREATE TABLE users_table_part_8 PARTITION OF users_table_part FOR VALUES FROM (8) TO (9);
SELECT create_distributed_table('users_table_part', 'user_id');
INSERT INTO users_table_part SELECT i, i %9, i %50 FROM generate_series(0, 100) i;

/tmp/f.sql:

update users_table_part_0 SET value_2 =5;
pgbench -f /tmp/f.sql -c 64 -j 8 -T 1200 -P 1 postgres
 drop table users_table_part_1;
CREATE TABLE users_table_part_1 PARTITION OF users_table_part FOR VALUES
 FROM (1) TO (2);

Can you reproduce with these steps?

SaitTalhaNisanci commented 3 years ago

@hlakshmi Out of curiosity, why don't you have the updates on the parent table?

hlakshmi commented 3 years ago

@SaitTalhaNisanci - One specific reason (which was due to a old legacy code), initially we used to take a shared row lock on the child partition we were inserting to to avoid deadlock errors (PG::TRDeadlockDetected) as we do have multiple processes doing writes.

Now we worked around the deadlock problem by retrying the transaction in case a process runs into a deadlock and got rid of the locks. So we could just do the write on the parent table.

To be specific our query looks something like this: INSERT INTO <child_partition> SELECT * FROM temp_table ON CONFLICT DO .... Hope this make sense.

And thanks for looking into this issue.

SaitTalhaNisanci commented 3 years ago

@hlakshmi if you can do the write on the parent, that'd be more correct I think. Do you get distributed deadlocks when you do that?

hlakshmi commented 3 years ago

@SaitTalhaNisanci Sorry I missed your comment. Yeah we will try the insert on to the parent table and see if we run into any locking issues and get back. Thanks!