2ndQuadrant / pglogical

Logical Replication extension for PostgreSQL 15, 14, 13, 12, 11, 10, 9.6, 9.5, 9.4 (Postgres), providing much faster replication than Slony, Bucardo or Londiste, as well as cross-version upgrades.
http://2ndquadrant.com/en/resources/pglogical/
Other
987 stars 153 forks source link

Bug: Race condition causes data loss in pglogical_create_subscriber #468

Open ruoshan opened 5 months ago

ruoshan commented 5 months ago

When using pglogical_create_subscriber to create a new logical subsrciber, there is a race condition between the two processes updating/reading the sync_status field of pglogical.local_sync_status. The two processes are:

There is a tiny time window for the "pglogical apply worker" to start its work using the wrong state. The time windows is at these lines in the pglogical_subscribe function of pglogcial_create_subscriber.c .

When the first query in the referred lines executed, and kernel switch out the pglogical_create_subscriber process for too long. The "pglogical apply worker" will run the pglogical_sync_subscription function with sync_status == SYNC_STATUS_INIT . Started pglogical_sync_subscription with INIT status is not OK, as it will re-create the replication slot and use the new snapshot's LSN as the logical replication origin.

I create a small patch to make the bug very easy to reproduce on a system that has data coming in during the logical replication creation. Here is the patch:

diff --git a/pglogical_create_subscriber.c b/pglogical_create_subscriber.c
index 509e165..e5b6552 100644
--- a/pglogical_create_subscriber.c
+++ b/pglogical_create_subscriber.c
@@ -1108,6 +1108,9 @@ pglogical_subscribe(PGconn *conn, char *subscriber_name, char *subscriber_dsn,
    }
    PQclear(res);

+    printf("========> Insert some new data on primary now (in the next 30 seconds), these data will be lost in subscriber\n");
+    sleep(30);
+
    resetPQExpBuffer(&query);
    initPQExpBuffer(&repsets);

@@ -1133,6 +1136,12 @@ pglogical_subscribe(PGconn *conn, char *subscriber_name, char *subscriber_dsn,
    }
    PQclear(res);

+    // This sleep makes the race condition easy to reproduce
+    //
+    // This sleep will allow the subscription worker process to start first with sync_status == 'i',
+    // before the local_sync_status.sync_status field is set to 'r'.
+    sleep(30);
+
    /* TODO */
    res = PQexec(conn, "UPDATE pglogical.local_sync_status SET sync_status = 'r'");
    if (PQresultStatus(res) != PGRES_COMMAND_OK)

(the above patch is under MIT license)

mochja commented 2 weeks ago

I was able to get around this issue by creating the node, node_interface, subscription, and local_sync_status resources manually and not using the pglogical_create_subscriber function. Then creating replication origin; replication slot on the remote node. I can then start the replication using alter_subscription_enable function and it re-uses the existing replication slot.

Is there any other step that I miss or downside to use the above approach, for pglogical to properly work? I do not notice anything unusual ATM.

ruoshan commented 1 week ago

I think putting the pglogical.create_subscription() and set sync_status = 'r' in the same transaction should be enough to fix this.