dimitri / pgcopydb

Copy a Postgres database to a target Postgres server (pg_dump | pg_restore on steroids)
Other
1.18k stars 78 forks source link

Test Decoding: Migration failed with could not find free replication slot when trying to migrate 8 databases #739

Closed tanyaranjan1995 closed 6 months ago

tanyaranjan1995 commented 6 months ago

Initiate migration for 8 databses, Migration succeded for 7 databases but failed for the 8th database with the error " could not find free replication state slot for replication origin with ID 11" Version: Running pgcopydb version 0.15 from "/usr/bin/pgcopydb"

Repro Steps: Start migration for 8 dbs: Dbname: db1, db2, db3, db4, db5, db6, db7, db8 Schema of all dbs:

SELECT format('CREATE TABLE ternary_%s (table_key serial primary key,a int NOT NULL, b text NOT NULL, c float);', gs) FROM generate_series(1,10) gs; \gexec
SELECT format('INSERT INTO ternary_%s(a,b,c) SELECT i, md5(i::text), log(i) FROM generate_series(1, 50, 1) AS i;', gs,gs) FROM generate_series(1,10) gs; \gexec

For each dbX (Attached entire Logs in the end):

export PGCOPYDB_SOURCE_PGURI="postgres://user@host:port/dbX"
export PGCOPYDB_TARGET_PGURI="postgres://user@host:port/dbX"
export PGCOPYDB_DROP_IF_EXISTS=on
pgcopydb clone --follow --drop-if-exists --slot-name pgcopydbrslotX --origin pgcopydbroriginX --plugin test_decoding --table-jobs 4 --index-jobs 4 --dir /tmp/pgcopydbX

For db8: Command: pgcopydb clone --follow --drop-if-exists --slot-name pgcopydbrslot8 --origin pgcopydbrorigin8 --plugin test_decoding --table-jobs 4 --index-jobs 4 --dir /tmp/pgcopydb8

Logs:

15:41:17.529 21696 INFO   Running pgcopydb version 0.15 from "/usr/bin/pgcopydb"
15:41:17.534 21696 INFO   [SOURCE] Copying database from "postgres://user@host:port/db8?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60"
15:41:17.534 21696 INFO   [TARGET] Copying database into "postgres://user@host:port/db8?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60"
15:41:17.723 21696 INFO   Using work dir "/tmp/pgcopydb8"
15:41:18.894 21696 INFO   Created logical replication slot "pgcopydbrslot8" with plugin "test_decoding" at 9/BC025600 and exported snapshot 00000011-0000001D-1
15:41:21.516 21696 ERROR  [TARGET 8723] ERROR:  could not find free replication state slot for replication origin with ID 11
15:41:21.517 21696 ERROR  [TARGET 8723] HINT:  Increase max_replication_slots and try again.
15:41:21.517 21696 ERROR  [TARGET 8723] ERROR:  could not find free replication state slot for replication origin with ID 11
15:41:21.517 21696 ERROR  [TARGET 8723] SQL query: select pg_replication_origin_advance($1, $2)
15:41:21.517 21696 ERROR  [TARGET 8723] SQL params: 'pgcopydbrorigin8', '9/BC025600'
15:41:21.517 21696 ERROR  Failed to advance replication origin for "pgcopydbrorigin8" at LSN 9/BC025600

Source replication slot: image image

Complete Migration Logs: db1: pgcopydblogs_db1.txt

db2: pgcopydblogs_db2.txt

db3: pgcopydblogs_db3.txt

db4: pgcopydblogs_db4.txt

db5: pgcopydblogs_db5.txt

db6: pgcopydblogs_db6.txt

db7: pgcopydblogs_db7.txt

db8: pgcopydblogs_db8.txt

dimitri commented 6 months ago

Hi @tanyaranjan1995 ;

Thanks for a detailed bug report. What's interesting is that the problem happens on the target database where it's using 11 replication origin entries. Is it possible that you did several test sessions and some clean-up has failed to happen before hand, or maybe that the same target database server is being used in the context of other source databases at the same time?

15:41:21.516 21696 ERROR [TARGET 8723] ERROR: could not find free replication state slot for replication origin with ID 11 15:41:21.517 21696 ERROR [TARGET 8723] HINT: Increase max_replication_slots and try again. 15:41:21.517 21696 ERROR [TARGET 8723] ERROR: could not find free replication state slot for replication origin with ID 11 15:41:21.517 21696 ERROR [TARGET 8723] SQL query: select pg_replication_origin_advance($1, $2) 15:41:21.517 21696 ERROR [TARGET 8723] SQL params: 'pgcopydbrorigin8', '9/BC025600' 15:41:21.517 21696 ERROR Failed to advance replication origin for "pgcopydbrorigin8" at LSN 9/BC025600

tanyaranjan1995 commented 6 months ago

Thankyou Dimitri. Tried migration by increasing the replication slot on target. It worked.