citusdata / citus

Distributed PostgreSQL as an extension
https://www.citusdata.com
GNU Affero General Public License v3.0
10.65k stars 670 forks source link

Citus switches to sequential execution unnecessarily #4138

Open onurctirtir opened 4 years ago

onurctirtir commented 4 years ago
CREATE TABLE reference_table1(a int unique, b int);
CREATE TABLE reference_table2(a int unique, b int);
SELECT create_reference_table('reference_table1');
SELECT create_reference_table('reference_table2');
ALTER TABLE reference_table1 ADD CONSTRAINT fkey_ref_to_ref FOREIGN KEY(a) REFERENCES reference_table2(a) ON DELETE RESTRICT;

set client_min_messages to DEBUG1;
BEGIN; 
INSERT INTO reference_table2 VALUES (1);
DEBUG:  switching to sequential query execution mode
DETAIL:  Reference table "reference_table2" is modified, which might lead to data inconsistencies or distributed deadlocks via parallel accesses to hash distributed tables due to foreign keys. Any parallel modification to those hash distributed tables in the same transaction can only be executed in sequential query execution mode

This is the case even if no distributed tables are referencing to reference_table2 (directly or transitively). It's probably because this check doesn't consider if at least one of the referencing relations are distributed.

Even more, we might error out if we first access to an unrelated (not referenced/referencing) distributed table:

CREATE TABLE distributed_table(a int, b int);
SELECT create_distributed_table('distributed_table', 'a');

BEGIN;
SELECT * FROM distributed_table;
INSERT INTO reference_table2 values (1);
ERROR:  cannot modify reference table "reference_table2" because there was a parallel operation on a distributed table
DETAIL:  When there is a foreign key to a reference table, Citus needs to perform all operations over a single connection per node to ensure consistency.
HINT:  Try re-running the transaction with "SET LOCAL citus.multi_shard_modify_mode TO 'sequential';"
onderkalaci commented 4 years ago

I guess this doesn't come up as switching to sequential mode is seamless to the user, but could likely to lead poor performance in some cases that could have been avoided.