2ndQuadrant / pglogical

Logical Replication extension for PostgreSQL 15, 14, 13, 12, 11, 10, 9.6, 9.5, 9.4 (Postgres), providing much faster replication than Slony, Bucardo or Londiste, as well as cross-version upgrades.
http://2ndquadrant.com/en/resources/pglogical/
Other
987 stars 153 forks source link

pglogical.wait_slot_confirm_lsn docs are incorrect #481

Open jcoleman opened 1 month ago

jcoleman commented 1 month ago

The docs for pglogical.wait_slot_confirm_lsn say:

Wait until all replication slots on the current node have replayed up to the xlog insert position at time of call on all providers. Returns when all slots' confirmed_flush_lsn passes the pg_current_wal_insert_lsn() at time of call.

but looking at the code that's not what actually happens:

    if (PG_ARGISNULL(1))
    {
        if (XLogRecPtrIsInvalid(XactLastCommitEnd))
            target_lsn = GetXLogInsertRecPtr();
        else
            target_lsn = XactLastCommitEnd;
    }

Contra the docs we use the LSN for the last commit performed on the current backend (if one is available). That means that it's possible for the following scenario to occur:

  1. Worker A: COMMIT (lsn 123); confirmed_flush_lsn = 0
  2. Worker B: COMMIT (lsn 456); confirmed_flush_lsn = 123
  3. Worker A: wait_slot_confirm_lsn(slot, NULL => turned into last commit at 123); confirmed_flush_lsn = 123
  4. Worker A: wait_slot_confirm_lsn returns
  5. COMMIT (lsn 456) replicates
jcoleman commented 1 month ago

BTW the behavior here diverged from the documented behavior in https://github.com/2ndQuadrant/pglogical/commit/b72a61838b9a752668f70b89d1c56b1bc02de17b which sadly has no context in the commit message other than the fact that this was backported from pglogical3, so presumably there was an intentional reason to change this in v3, but there's no information as to why that's desirable, and the docs certainly weren't changed to match.