2ndQuadrant / pglogical

Logical Replication extension for PostgreSQL 17, 16, 15, 14, 13, 12, 11, 10, 9.6, 9.5, 9.4 (Postgres), providing much faster replication than Slony, Bucardo or Londiste, as well as cross-version upgrades.
http://2ndquadrant.com/en/resources/pglogical/

PGLOGICAL subscriber failed during nonrecoverable step(s) #84

Closed mxjones121 closed 7 years ago

mxjones121 commented 7 years ago

Hi, I am using pglogical and hitting pg_xlog growth as the replication keeps failing.

Environment details:

Postgres 9.4.10 on publisher and subscriber
pglogical 1.2.2
O/S: publisher CentOS 7.1, subscriber CentOS 6.0
PostGIS 2.1.8

Publisher set-up

   set_id   | set_nodeid |      set_name       | replicate_insert | replicate_update | replicate_delete | replicate_truncate
------------+------------+---------------------+------------------+------------------+------------------+--------------------
 2439852409 |  405259706 | default             | t                | t                | t                | t
 4166519128 |  405259706 | default_insert_only | t                | f                | f                | t
 3721228616 |  405259706 | ddl_sql             | t                | f                | f                | f
 2144801490 |  405259706 | Mapsreplicate       | t                | t                | t                | t

   set_id   |        set_reloid
------------+--------------------------
 2144801490 | pinhead.changed_incident
 2144801490 | pinhead.hydrant
 2144801490 | solr."default"
 2144801490 | import.hydrants
 2144801490 | import.changed_incidents
 2144801490 | pinhead.ssri

Setup went well and table data was synced across.

Subscriber set-up

 sync_kind | sync_subid | sync_nspname |   sync_relname    | sync_status
-----------+------------+--------------+-------------------+-------------
 f         | 2823137310 | import       | hydrants          | r
 f         | 2823137310 | import       | changed_incidents | r
 f         | 2823137310 | pinhead      | hydrant           | r
 f         | 2823137310 | pinhead      | changed_incident  | r
 f         | 2823137310 | pinhead      | ssri              | r
 f         | 2823137310 | solr         | default           | r
 d         | 2823137310 |              |                   | r

   sub_id   |     sub_name     | sub_origin | sub_target | sub_origin_if | sub_target_if | sub_enabled |             sub_slot_name             | sub_replication_sets | sub_forward_origins
------------+------------------+------------+------------+---------------+---------------+-------------+---------------------------------------+----------------------+---------------------
 2823137310 | subscribe_icu999 |  405259706 | 4293708667 |    3303793227 |    3650110810 | t           | pgl_changed_maps_ub1_subscribe_icu999 | {Mapsreplicate}      | {all}

  node_id   |   node_name
------------+----------------
 4293708667 | subscriber_icu
  405259706 | mapsPub1

   if_id    |    if_name     | if_nodeid  |                        if_dsn
------------+----------------+------------+-------------------------------------------------------------
 3650110810 | subscriber_icu | 4293708667 | host= changed port=5432 dbname=changed
 3303793227 | mapsPub1       |  405259706 | host=changed port=5432 dbname=changed password=blah

Questions

1: pg_xlog keeps growing, I assume because the replication is unable to finish. Errors in the subscriber log (debug3):

< 2017-03-31 09:24:06.346 BST >FATAL: could not send replication command "START_REPLICATION SLOT "pgl_changed_maps_ub1_subscribe_icu999" LOGICAL 0/0 (expected_encoding 'UTF8', min_proto_version '1', max_proto_version '1', startup_params_format '1', "binary.want_internal_basetypes" '1', "binary.want_binary_basetypes" '1', "binary.basetypes_major_version" '904', "binary.sizeof_datum" '8', "binary.sizeof_int" '4', "binary.sizeof_long" '8', "binary.bigendian" '0', "binary.float4_byval" '1', "binary.float8_byval" '1', "binary.integer_datetimes" '1', "hooks.setup_function" 'pglogical.pglogical_hooks_setup', "pglogical.forward_origins" '"all"', "pglogical.replication_set_names" '"Mapsreplicate"', "relmeta_cache_size" '-1', pg_version '90410', pglogical_version '1.2.2', pglogical_version_num '10202', pglogical_apply_pid '11043')": ERROR: replication slot "pgl_blah_maps_ub1_subscribe_icu999" is already active
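The pg_xlog growth in question 1 is consistent with a replication slot holding back WAL: the publisher keeps every segment from the slot's restart_lsn onward until the subscriber has consumed it. On 9.4 both positions can be read on the publisher, from the restart_lsn column of pg_replication_slots and from pg_current_xlog_location(). The following is a minimal Python sketch of how the gap between two such positions could be turned into a byte count. The function names and the sample LSNs are hypothetical and are not taken from this report.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to an absolute byte position.

    An LSN is printed as two hexadecimal numbers separated by a slash:
    the high 32 bits and the low 32 bits of a 64-bit WAL position.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def retained_wal_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Bytes of WAL between a slot's restart_lsn and the current write position."""
    return lsn_to_int(current_lsn) - lsn_to_int(restart_lsn)


if __name__ == "__main__":
    # Hypothetical values for illustration only
    print(retained_wal_bytes("2/10000000", "1/F0000000"))  # 536870912 bytes, i.e. 512 MiB
```

A value that keeps rising between checks would suggest the slot is not advancing, which matches a subscriber that never gets past its initial connection.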

2: More failures, but listed as ERROR.

< 2017-03-31 09:24:22.326 BST >ERROR: subscriber subscription_icu_replication9999 initialization failed during nonrecoverable step (s), please try the setup again
< 2017-03-31 09:25:02.348 BST >ERROR: subscriber subscription_icu_replication9999 initialization failed during nonrecoverable step (s), please try the setup again
2017-03-31 09:26:22.372 BST >ERROR: subscriber subscription_icu_replication9999 initialization failed during nonrecoverable step (s), please try the setup again

and on ....

Can you advise how to troubleshoot this, please? I cannot find anything specific to this error and pglogical.


mxjones121 commented 7 years ago

More details on the error:

< 2017-03-31 10:29:03.688 BST >ERROR: subscriber subscription_icu_replication9999 initialization failed during nonrecoverable step (s), please try the setup again
< 2017-03-31 10:29:03.688 BST >DEBUG: shmem_exit(1): 2 before_shmem_exit callbacks to make
< 2017-03-31 10:29:03.688 BST >LOG: apply worker [13159] at slot 3 generation 20 crashed
< 2017-03-31 10:29:03.688 BST >DEBUG: shmem_exit(1): 6 on_shmem_exit callbacks to make
< 2017-03-31 10:29:03.688 BST >DEBUG: proc_exit(1): 2 callbacks to make
< 2017-03-31 10:29:03.688 BST >DEBUG: exit(1)
< 2017-03-31 10:29:03.688 BST >DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
< 2017-03-31 10:29:03.688 BST >DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
< 2017-03-31 10:29:03.688 BST >DEBUG: proc_exit(-1): 0 callbacks to make
< 2017-03-31 10:29:03.688 BST >LOG: worker process: pglogical apply 720238014:2428795767 (PID 13159) exited with exit code 1
< 2017-03-31 10:29:03.688 BST >LOG: unregistering background worker "pglogical apply 720238014:2428795767"
< 2017-03-31 10:29:03.688 BST >DEBUG: apply worker at slot 3 exited before we noticed it started
< 2017-03-31 10:29:03.688 BST >DEBUG: CommitTransaction
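At debug3 the few ERROR and FATAL entries are easy to lose among the DEBUG lines. As a rough aid, the Python sketch below counts distinct high-severity messages in a log excerpt. It assumes the line prefix seen in this report (an optional "<", a timestamp with milliseconds, a time zone, then ">"); the helper names are hypothetical, and a different log_line_prefix would need a different pattern.

```python
import re
from collections import Counter

# Matches lines such as:
#   < 2017-03-31 10:29:03.688 BST >ERROR: some message
# The leading "<" is optional because one line in the report lacks it.
LINE_RE = re.compile(
    r"^<?\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \w+\s*>\s*([A-Z0-9]+):\s*(.*)$"
)


def parse_line(line):
    """Return (timestamp, level, message), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    return match.groups() if match else None


def summarize(lines, levels=("ERROR", "FATAL", "PANIC")):
    """Count how often each (level, message) pair occurs for the given levels."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed and parsed[1] in levels:
            counts[(parsed[1], parsed[2])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "< 2017-03-31 10:29:03.688 BST >ERROR: initialization failed",
        "< 2017-03-31 10:29:03.688 BST >DEBUG: exit(1)",
        "2017-03-31 09:26:22.372 BST >ERROR: initialization failed",
    ]
    for (level, message), count in summarize(sample).items():
        print(count, level, message)
```

Sorting the matching lines by timestamp instead of counting them would show which failure came first, which is usually the one worth chasing.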

mxjones121 commented 7 years ago

Will continue to investigate locally. Many errors that seem unexplained.