citusdata / citus

Distributed PostgreSQL as an extension
https://www.citusdata.com
GNU Affero General Public License v3.0
10.63k stars 670 forks source link

Add FREEZE Option to COPY Command and Explicit Transaction Management for Shard Moves #7687

Open m3hm3t opened 2 months ago

m3hm3t commented 2 months ago

Here's a suggested description for your PR based on the issue:


Description

This PR enhances shard moves and splits by adding the FREEZE option to the COPY command, reducing the need for post-operation vacuuming. To achieve this:

  1. Added the FREEZE option to ConstructShardCopyStatement and LocalCopyToShard.
  2. Implemented transaction management around the COPY operation.
  3. Added a truncation step before COPY in ConnectToRemoteAndStartCopy to ensure the operation is within the same transaction.
  4. Updated the transaction to close in ShardCopyDestReceiverShutdown.
codecov[bot] commented 2 months ago

Codecov Report

Attention: Patch coverage is 48.71795% with 20 lines in your changes missing coverage. Please review.

Project coverage is 88.98%. Comparing base (4775715) to head (c958e8e). Report is 2 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #7687 +/- ## ========================================== - Coverage 89.70% 88.98% -0.72% ========================================== Files 283 283 Lines 60510 60539 +29 Branches 7541 7547 +6 ========================================== - Hits 54278 53871 -407 - Misses 4078 4449 +371 - Partials 2154 2219 +65 ```
m3hm3t commented 1 month ago

Test commands:

\c - - - :worker_1_port
SET search_path TO worker_copy_table_to_node;

\echo :worker_1_port
57637
\echo :worker_1_node
17
SELECT nodeid 
FROM pg_dist_node 
WHERE nodeport=57637;  -- Assuming 57637 is the desired port
 nodeid 
--------
     17
(1 row)

SELECT shardid, shardstate, nodename, nodeport
FROM pg_dist_shard_placement
JOIN pg_dist_shard USING (shardid)
WHERE shardid = 62629600;
 shardid  | shardstate | nodename  | nodeport 
----------+------------+-----------+----------
 62629600 |          1 | localhost |    57637
(1 row)

SELECT worker_copy_table_to_node('t_62629600', :worker_1_node);

Postgresql logs:

2024-09-24 10:07:35.780 UTC [23740] LOG:  00000: CanUseLocalCopy: localNodeId = 28, destinationNodeId = 17, canUseLocalCopy = false
2024-09-24 10:07:35.780 UTC [23740] LOCATION:  CanUseLocalCopy, worker_shard_copy.c:96
2024-09-24 10:07:35.780 UTC [23740] STATEMENT:  SELECT worker_copy_table_to_node('t_62629600', 17);
2024-09-24 10:07:35.780 UTC [23740] LOG:  00000: ShardCopyDestReceiverReceive: Initiating remote copy since tuplesSent = 0 and useLocalCopy = false.
2024-09-24 10:07:35.780 UTC [23740] LOCATION:  ShardCopyDestReceiverReceive, worker_shard_copy.c:227
2024-09-24 10:07:35.780 UTC [23740] STATEMENT:  SELECT worker_copy_table_to_node('t_62629600', 17);
select * from pg_dist_node;
 nodeid | groupid | nodename  | nodeport | noderack | hasmetadata | isactive |  noderole   |                           nodecluster                           | metadatasynced | shouldhaveshards 
--------+---------+-----------+----------+----------+-------------+----------+-------------+-----------------------------------------------------------------+----------------+------------------
     27 |      14 | localhost |     8887 | default  | f           | t        | secondary   | thisisasixtyfourcharacterstringrepeatedfourtimestomake256chars. | f              | t
     26 |      14 | localhost |     8888 | default  | f           | t        | secondary   | olap                                                            | f              | t
     30 |      14 | localhost |     9992 | default  | f           | t        | secondary   | second-cluster                                                  | f              | t
     29 |      16 | localhost |     9994 | default  | f           | t        | secondary   | default                                                         | f              | t
     28 |      14 | localhost |     9995 | default  | f           | t        | secondary   | default                                                         | f              | t
     22 |      16 | localhost |     9996 | default  | f           | f        | secondary   | default                                                         | f              | t
     21 |      14 | localhost |     9997 | default  | f           | t        | unavailable | default                                                         | f              | t
     20 |      14 | localhost |     9998 | default  | f           | t        | secondary   | default                                                         | f              | t
     16 |       0 | localhost |    57636 | default  | t           | t        | primary     | default                                                         | t              | f
     19 |      16 | localhost |    57638 | default  | t           | t        | primary     | default                                                         | t              | t
     17 |      14 | localhost |    57637 | default  | t           | t        | primary     | default                                                         | t              | t
(11 rows)