citusdata / citus

Distributed PostgreSQL as an extension
https://www.citusdata.com
GNU Affero General Public License v3.0
10.46k stars 663 forks source link

citus_copy_shard_placement Locks COPY processes to other tables #5328

Closed palaevren closed 3 years ago

palaevren commented 3 years ago

Hi,

I have a distributed table (which is also partitioned by daily range), let's say table_a. Replication factor is 2 for the table.

I recently realized that shardstate of a placement is 3, as well as its all daily partitions (table_a_p2021_09_26_XXX, table_a_p2021_09_27_XXX, etc.).

In order to repair the corrupted shard from its healty replica, I run citus_copy_shard_placement function:

citus_copy_shard_placement(XXX, 'node1', 5432, 'node2', 5432);

It takes a while as the shard contains many rows. However, as long as the function runs, COPY commands loading data to other tables get locked.

When I monitor blocked and blocking processes as described here, I see results like below:

blocking statement: select_copy_shard_placement($1, $2, $3, $4, $5); blocked statement: COPY table_b (...) FROM .... .....

blocking statement: select_copy_shard_placement($1, $2, $3, $4, $5); blocked statement: COPY table_c (...) FROM .... ...

onderkalaci commented 3 years ago

Hi,

This is the expected behavior. See http://docs.citusdata.com/en/v10.2/admin_guide/cluster_management.html?highlight=rebalance#rebalance-shards-without-downtime for the non-blocking rebalancer you need to use Azure Hyperscale(Citus) or Citus enterprise.