2ndQuadrant / pglogical

Logical Replication extension for PostgreSQL 17, 16, 15, 14, 13, 12, 11, 10, 9.6, 9.5, 9.4 (Postgres), providing much faster replication than Slony, Bucardo or Londiste, as well as cross-version upgrades.
http://2ndquadrant.com/en/resources/pglogical/
Other
1.04k stars 154 forks source link

Pglogical on Postgres 14 error: [LOG: could not send data to client: Connection reset by peer] #416

Open lpossamai opened 1 year ago

lpossamai commented 1 year ago

Hi there!

I'm setting up a brand new PGlogical connection between two PostgreSQL 14 versions.

The Source database is (AWS EC2 Instance): pglogical version on source: 2.4.2-2.pgdg20.04+1 postgres version on source: PostgreSQL 14.7 (Ubuntu 14.7-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit

The Target database is (AWS RDS Aurora V2 with RDS Proxy): pglogical version on target: 2.4.1 postgres version on target: PostgreSQL 14.6 on aarch64-unknown-linux-gnu, compiled by aarch64-unknown-linux-gnu-gcc (GCC) 7.4.0, 64-bit


Once initial sync has been completed (+900GB), the subscriber is shown as down (select * from pglogical.show_subscription_status();):

2023-02-28 01:27:40 UTC [1360826]: [1-1] user=pglogicaluser,db=db_live,app=subscription1,client=10.0.135.249 LOG:  starting logical decoding for slot "pgl_db_live_test_provider1_subscription1"
2023-02-28 01:27:40 UTC [1360826]: [2-1] user=pglogicaluser,db=db_live,app=subscription1,client=10.0.135.249 DETAIL:  Streaming transactions committing after F9D/2501B678, reading WAL from F9D/2501B5C8.
2023-02-28 01:27:40 UTC [1360826]: [3-1] user=pglogicaluser,db=db_live,app=subscription1,client=10.0.135.249 STATEMENT:  START_REPLICATION SLOT "pgl_db_live_test_provider1_subscription1" LOGICAL F9D/2501B5C8 (expected_encoding 'UTF8', min_proto_version '1', max_proto_version '1', startup_params_format '1', "binary.want_internal_basetypes" '1', "binary.want_binary_basetypes" '1', "binary.basetypes_major_version" '1400', "binary.sizeof_datum" '8', "binary.sizeof_int" '4', "binary.sizeof_long" '8', "binary.bigendian" '0', "binary.float4_byval" '0', "binary.float8_byval" '1', "binary.integer_datetimes" '0', "hooks.setup_function" 'pglogical.pglogical_hooks_setup', "pglogical.forward_origins" '"all"', "pglogical.replication_set_names" '"default",dm,integrations,doctrine,util', "relmeta_cache_size" '-1', pg_version '140006', pglogical_version '2.4.1', pglogical_version_num '20401', pglogical_apply_pid '30011')
2023-02-28 01:27:40 UTC [1360826]: [4-1] user=pglogicaluser,db=db_live,app=subscription1,client=10.0.135.249 LOG:  logical decoding found consistent point at F9D/2501B5C8
2023-02-28 01:27:40 UTC [1360826]: [5-1] user=pglogicaluser,db=db_live,app=subscription1,client=10.0.135.249 DETAIL:  There are no running transactions.
2023-02-28 01:27:40 UTC [1360826]: [6-1] user=pglogicaluser,db=db_live,app=subscription1,client=10.0.135.249 STATEMENT:  START_REPLICATION SLOT "pgl_db_live_test_provider1_subscription1" LOGICAL F9D/2501B5C8 (expected_encoding 'UTF8', min_proto_version '1', max_proto_version '1', startup_params_format '1', "binary.want_internal_basetypes" '1', "binary.want_binary_basetypes" '1', "binary.basetypes_major_version" '1400', "binary.sizeof_datum" '8', "binary.sizeof_int" '4', "binary.sizeof_long" '8', "binary.bigendian" '0', "binary.float4_byval" '0', "binary.float8_byval" '1', "binary.integer_datetimes" '0', "hooks.setup_function" 'pglogical.pglogical_hooks_setup', "pglogical.forward_origins" '"all"', "pglogical.replication_set_names" '"default",dm,integrations,doctrine,util', "relmeta_cache_size" '-1', pg_version '140006', pglogical_version '2.4.1', pglogical_version_num '20401', pglogical_apply_pid '30011')
2023-02-28 01:27:40 UTC [1360826]: [7-1] user=pglogicaluser,db=db_live,app=subscription1,client=10.0.135.249 LOG:  could not send data to client: Connection reset by peer
2023-02-28 01:27:40 UTC [1360826]: [8-1] user=pglogicaluser,db=db_live,app=subscription1,client=10.0.135.249 CONTEXT:  slot "pgl_db_live_test_provider1_subscription1", output plugin "pglogical_output", in the change callback, associated LSN F9D/25056320
2023-02-28 01:27:40 UTC [1360826]: [9-1] user=pglogicaluser,db=db_live,app=subscription1,client=10.0.135.249 STATEMENT:  START_REPLICATION SLOT "pgl_db_live_test_provider1_subscription1" LOGICAL F9D/2501B5C8 (expected_encoding 'UTF8', min_proto_version '1', max_proto_version '1', startup_params_format '1', "binary.want_internal_basetypes" '1', "binary.want_binary_basetypes" '1', "binary.basetypes_major_version" '1400', "binary.sizeof_datum" '8', "binary.sizeof_int" '4', "binary.sizeof_long" '8', "binary.bigendian" '0', "binary.float4_byval" '0', "binary.float8_byval" '1', "binary.integer_datetimes" '0', "hooks.setup_function" 'pglogical.pglogical_hooks_setup', "pglogical.forward_origins" '"all"', "pglogical.replication_set_names" '"default",dm,integrations,doctrine,util', "relmeta_cache_size" '-1', pg_version '140006', pglogical_version '2.4.1', pglogical_version_num '20401', pglogical_apply_pid '30011')

Log from the target (RDS):

2023-02-28 01:21:40 UTC::@:[498]:LOG:  background worker "pglogical apply 16400:1763399739" (PID 29659) exited with exit code 1

The query SELECT * FROM pglogical.local_sync_status WHERE sync_status <> 'r'; returns no results.

What else could I check, please?

Thanks in advance! Lucas

yannh commented 1 year ago

For people finding this issue - I would recommend triple-checking the "limitations and restrictions" part of the Readme and ensure that you do absolutely respect them. We have discovered after significant effort that the pglogical worker on the replica would just crash when it encounters some of these corner cases, leading to the aforementioned issue, with not much logging for troubleshooting.

In our case - we were using deferred index updates and did not immediately realize the incompatibility. It took quite a lot of effort to figure out this was the root cause of this error. Hope this is useful to people finding this issue :bow: