dimitri / pgcopydb

Copy a Postgres database to a target Postgres server (pg_dump | pg_restore on steroids)
Other
1.15k stars 75 forks source link

follow option not ending #643

Open andres-chavez-bi opened 8 months ago

andres-chavez-bi commented 8 months ago

Hello, I have been testing out the tool and have found that the follow option is not ending, even when the ednpos sentinel is set. Maybe is a configuration I'm not contemplating or function I do not understand fully. Here's my full command. It includes options like restart and drop-if-exists since I've already tried different things and want the process to be "as new" as possible. pgcopydb clone --follow --source ${PGCOPYDB_SOURCE_PGURI} --target ${PGCOPYDB_TARGET_PGURI} --restart --verbose --skip-vacuum --split-tables-larger-than 100MB --restore-jobs 2 --not-consistent --drop-if-exists --index-jobs 2

Once I see the copy summary, I set the sentinel to end the change following.

                                               Step   Connection    Duration    Transfer   Concurrency
 --------------------------------------------------   ----------  ----------  ----------  ------------
   Catalog Queries (table ordering, filtering, etc)       source         15s                         1
                                        Dump Schema       source       6s893                         1
                                     Prepare Schema       target         29s                         1
      COPY, INDEX, CONSTRAINTS, VACUUM (wall clock)         both      10m49s                        10
                                  COPY (cumulative)         both      42m18s       58 GB             4
                          CREATE INDEX (cumulative)       target       2m24s                         2
                           CONSTRAINTS (cumulative)       target         11s                         2
                                VACUUM (cumulative)       target         0ms                         4
                                    Reset Sequences         both       309ms                         1
                         Large Objects (cumulative)       (null)         0ms                         0
                                    Finalize Schema         both       2s554                         2
 --------------------------------------------------   ----------  ----------  ----------  ------------
                          Total Wall Clock Duration         both      11m43s                        16

Here's the command to set the sentinel: pgcopydb stream sentinel set endpos --current

While this seems like it works and the process confirms setting the endpos, it just keeps posting messages like this like if changes are still being followed.

13:49:42.346 20971 INFO pgcopydb sentinel endpos has been set to 5/DE024228

Reported write_lsn 5/DE024228, flush_lsn 5/DE024228, replay_lsn 5/DE024228

I might be missing something, I've not found anyhthing in the documentation that indicates that the follow process has to end by its own or it has to be killed somehow.

My system is an Ubuntu WSL2 system - Linux WKS0001900845 5.15.133.1-microsoft-standard-WSL2 x86_64 GNU/Linux

Postgres version is 15 on both source and target databases. The only way I've managed to stop the follow process is by stopping the source database, and that throws correctly the following error:

14:17:35.071 20212 INFO   Reported write_lsn 5/DE024228, flush_lsn 5/DE024228, replay_lsn 5/DE024228
14:17:45.081 20212 INFO   Reported write_lsn 5/DE024228, flush_lsn 5/DE024228, replay_lsn 5/DE024228
14:17:55.091 20212 INFO   Reported write_lsn 5/DE024228, flush_lsn 5/DE024228, replay_lsn 5/DE024228
14:18:05.100 20212 INFO   Reported write_lsn 5/DE024228, flush_lsn 5/DE024228, replay_lsn 5/DE024228
14:18:15.109 20212 INFO   Reported write_lsn 5/DE024228, flush_lsn 5/DE024228, replay_lsn 5/DE024228
14:18:24.089 20212 ERROR  Failed to process Postgres notifications: SSL connection has been closed unexpectedly
14:18:24.089 20212 ERROR  [Postgres] SSL connection has been closed unexpectedly
14:18:24.089 20212 ERROR  [Postgres] invalid socket
14:18:24.089 20212 INFO   Streamed up to write_lsn 5/DE024228, flush_lsn 5/DE024228, stopping: endpos is 5/DE024228
14:18:24.089 20212 INFO   Prefetch process has terminated
14:18:24.089 20213 NOTICE Transformed 0 messages and 1 transactions
14:18:24.090 20213 INFO   Transform process has terminated
14:18:24.177 19170 NOTICE Subprocess prefetch with pid 20212 has exited successfully
14:18:24.177 19170 INFO   Current sentinel replay_lsn is 5/DE024228, endpos has now been set to 5/DE024228
14:18:24.186 19170 NOTICE Subprocess transform with pid 20213 has exited successfully
14:18:24.518 20214 NOTICE Replay process stopping
14:18:24.519 20214 INFO   Apply process has terminated
14:18:24.650 19170 NOTICE Subprocess catchup with pid 20214 has exited successfully
14:18:24.800 19170 INFO   Subprocesses for prefetch, transform, and catchup have now all exited
14:18:24.801 19170 INFO   Current sentinel replay_lsn is 5/DE024228, endpos is 5/DE024228
14:18:24.801 19170 INFO   Current endpos 5/DE024228 has been reached at 5/DE024228
14:18:24.801 19170 INFO   Follow mode is now done, reached replay_lsn 5/DE024228 with endpos 5/DE024228
14:18:24.986 19164 INFO   A previous run has run through completion
14:18:24.986 19164 NOTICE Schema dump for pre-data and post-data have been done
14:18:24.986 19164 NOTICE Pre-data schema has been restored on the target instance
14:18:24.986 19164 NOTICE Table Data has been copied to the target instance
14:18:24.986 19164 NOTICE Indexes have been copied to the target instance
14:18:24.986 19164 NOTICE Sequences have been copied to the target instance
14:18:24.986 19164 NOTICE Large Objects have been copied to the target instance
14:18:24.986 19164 NOTICE Post-data schema has been restored on the target instance
14:18:24.987 19164 INFO   Re-using catalog caches
14:18:24.987 19164 NOTICE Now starting setval process 19164 [15280]
14:18:24.987 19164 INFO   Skipping sequences, already done on a previous run

I'm attaching the full log output from the process. pgcopydb_log.txt

dimitri commented 7 months ago

Hi @andres-chavez-bi ; I think this happens when there is no change happening on the source database anymore. Can you confirm it is also the issue you have?

andres-chavez-bi commented 7 months ago

Hi! yes, that's what happens.