citusdata / citus

Distributed PostgreSQL as an extension
https://www.citusdata.com
GNU Affero General Public License v3.0
10.43k stars 662 forks source link

Citus enforcing sequential mode even if there is a foreign key between Citus local tables #6729

Open crabhi opened 1 year ago

crabhi commented 1 year ago

Hi, I've stumbled on a potential bug in interaction between citus-local (probably reference as well) and distributed tables.

Setup

Try this setup (I used v11.2.0):

CREATE TABLE distributed (id SERIAL PRIMARY KEY);
CREATE TABLE local (id SERIAL PRIMARY KEY);
SELECT citus_add_local_table_to_metadata('local');
SELECT create_distributed_table('distributed', 'id');
CREATE TABLE local_dependent (id SERIAL PRIMARY KEY, fk INT REFERENCES local (id));

INSERT INTO distributed SELECT * FROM generate_series(1,10);

There is no relation between the distributed and any of the local tables, only between the local tables themselves.

Problem

Now, if I modify the distributed table and the local table in the same transaction, I'll get an error.

BEGIN;
DELETE FROM distributed;
INSERT INTO local VALUES(DEFAULT);

-- ERROR:  cannot modify table "local" because there was a parallel operation on a distributed table
-- DETAIL:  When there is a foreign key to a reference table or to a local table, Citus needs to perform all operations over a single connection per node to ensure consistency.
-- HINT:  Try re-running the transaction with "SET LOCAL citus.multi_shard_modify_mode TO 'sequential';"

I'd expect this to succeed because the database doesn't ensure any data integrity between the two tables I tried to modify.

Other observations

It's interesting that if the table local_dependent doesn't exist at all, the transaction above succeeds.

naisila commented 1 year ago

Hi,

The error you run into doesn't imply that there is a relation between the distributed and the local tables. So this is not about data integrity between the two tables.

The error is about the parallel connections opened. Because there was a parallel operation on a distributed table, connections were opened, and there is currently the limitation that when there are foreign keys to local tables, Citus needs to perform all operations over a single connection.

It's interesting that if the table local_dependent doesn't exist at all, the transaction above succeeds.

In that case, the foreign key disappears, and we don't have the single connection limitation anymore.

We already cover this limitation in our test suite, with a reference table in this case, but as noted from the error, local and reference tables behave the same way in this limitation: https://github.com/citusdata/citus/blob/main/src/test/regress/sql/foreign_key_restriction_enforcement.sql#L657-L666

-- an unrelated update followed by update on the reference table and update
-- on the cascading distributed table
-- note that the UPDATE on the reference table will try to set the execution
-- mode to sequential, which will fail since there is an already opened
-- parallel connections
BEGIN;
    UPDATE unrelated_dist_table SET value_1 = 15;
    UPDATE reference_table SET id = 101 WHERE id = 99;
    UPDATE on_update_fkey_table SET value_1 = 5 WHERE id != 11;
ROLLBACK;
crabhi commented 1 year ago

Hi, thanks for your response. Yes, that's what I'm writing about.

From my point of view, this is a limitation - in my example above, there is no threat to data consistency so the operations in fact could be performed over parallel connections. That is also evidenced by the fact that the transaction succeeds when you remove a table that doesn't take part in the transaction at all.

Citing from https://www.citusdata.com/updates/v11-0/#managed-local-table:

In general, our design policy for “citus managed local tables” is that they become indistinguishable from Postgres tables.

The operations above succeed if the local tables are not managed by Citus.

naisila commented 1 year ago

Hi, just updated my previous reply with the known limitation in our test suite. Thanks for pointing it out as well. We are working towards this goal, and we have tried to note down the limitations in our test suite as much as possible.

crabhi commented 1 year ago

I think the local and reference tables actually may need a bit different treatment. With reference tables you implicitly expect some limitations due to the need to synchronize them across the nodes.

Most importantly, the act of adding tables to metadata alone brings limitations - some transactions previously succeeding start to fail. ☹️ In the meantime, the data still sit at the same place with the same consistency guarantees as before. I understand you may have other development priorities. I just wanted to point out that this is a real issue that adds burden to a migration from plain Postgres to Citus.

Regarding the test you mentioned, is it something that could be improved in a future release? Or is there some fundamental problem why the unrelated_dist_table couldn't be updated in parallel followed by the update to reference_table?

onderkalaci commented 1 year ago

Probably same as https://github.com/citusdata/citus/issues/4138

crabhi commented 1 year ago

Yes, I think so. I just tried that it behaves the same - if you remove the FK between reference tables, the transaction succeeds.

I think this one can be closed as a duplicate. The additional info from this report could be probably summarized that the problem also affects Citus-local tables so they can't be used in place of true local tables in some cases.

naisila commented 1 year ago

I think the local and reference tables actually may need a bit different treatment. With reference tables you implicitly expect some limitations due to the need to synchronize them across the nodes.

The additional info from this report could be probably summarized that the problem also affects Citus-local tables so they can't be used in place of true local tables in some cases.

Good points, I agree with these. I think we can leave this issue open as it's a more specific concern for citus local tables.

onderkalaci commented 1 year ago

I think we can leave this issue open as it's a more specific concern for citus local tables.

Yes, l also think that let's keep this one as well.