citusdata / citus

Distributed PostgreSQL as an extension
https://www.citusdata.com
GNU Affero General Public License v3.0
10.57k stars 670 forks source link

Failing 2PC COPY with 1PC master_modify_multiple_shards() leads to inconsistent data #925

Closed metdos closed 2 years ago

metdos commented 8 years ago

If a 2PC Copy fails during COMMIT PREPARED phase, we leave some uncommitted PREPARE TRANSACTIONs as expected, but still allow modifications on the table. This causes data inconsistencies after PREPARED TRANSACTIONs are manually committed.

How to replicate;

Make 2PC Copy fails during COMMIT PREPARED phase by using a breakpoint at commit stage and killing the worker node which didn't get commits yet.

SET citus.multi_shard_commit_protocol TO '2pc';

postgres=# COPY lineitem from '/Users/metindoslu/Development/tpch_2_13_0/lineitem.tbl' WITH DELIMITER '|';

WARNING:  failed to commit prepared transaction 'citus_4324_2_102012'
HINT:  Run "COMMIT PREPARED 'citus_4324_2_102012'" on localhost:9700
WARNING:  failed to commit prepared transaction 'citus_4324_2_102013'
HINT:  Run "COMMIT PREPARED 'citus_4324_2_102013'" on localhost:9700
COPY 6001215

Restart the failing worker node. Switch to 1PC and run delete command inside master_modify_multiple_shards()

SET citus.multi_shard_commit_protocol TO '1pc';

postgres=# select master_modify_multiple_shards('delete from lineitem');

WARNING:  modified 0 tuples of shard 102012, but expected to modify 2997003
DETAIL:  modified placement on localhost:9700
WARNING:  modified 3004212 tuples of shard 102013, but expected to modify 0
DETAIL:  modified placement on localhost:9701
 master_modify_multiple_shards
-------------------------------
                       2997003
(1 row)

Commit waiting 2PC transactions on the worker node.

postgres=# COMMIT PREPARED 'citus_4324_2_102012';
COMMIT PREPARED
postgres=# COMMIT PREPARED 'citus_4324_2_102013';
COMMIT PREPARED

Now we have same shards with different row counts. Running master_modify_multiple_shards() shows it.

postgres=# select master_modify_multiple_shards('delete from lineitem');

WARNING:  modified 2997003 tuples of shard 102012, but expected to modify 0
DETAIL:  modified placement on localhost:9700
WARNING:  modified 0 tuples of shard 102013, but expected to modify 3004212
DETAIL:  modified placement on localhost:9701
 master_modify_multiple_shards
-------------------------------
                       3004212
(1 row)
sumedhpathak commented 8 years ago

@marcocitus my impression was that pending prepare transactions will hold on to those write locks, causing other write commands to block. Is this something we need to handle on the Citus side instead of Postgres?

anarazel commented 8 years ago

@sumedhpathak Locks are indeed retained. But in this case there's no lock conflict because the deleting transaction didn't see the to-be-deleted rows, hence didn't try to delete them. Normally the resource locks on the master prevent that from being an issue, but those are released after the failed commit prepared ...;.

anarazel commented 8 years ago

(just as an aside: with serializable mode this would have resulted in a serialization failure)