2ndQuadrant / pglogical

Logical Replication extension for PostgreSQL 15, 14, 13, 12, 11, 10, 9.6, 9.5, 9.4 (Postgres), providing much faster replication than Slony, Bucardo or Londiste, as well as cross-version upgrades.
http://2ndquadrant.com/en/resources/pglogical/

pglogical crashes AWS Aurora during replication with Segmentation fault #458

Open JulienAndonov opened 7 months ago

JulienAndonov commented 7 months ago

Hey guys. I have the following issue. During normal operation, pglogical crashes on the destination side, which is RDS Aurora PostgreSQL 14.8 with pglogical 2.4.2:

Destination side

2024-01-25 13:17:09 UTC::@:[537]:LOG: background worker "pglogical apply 131082:4047160452" (PID 6709) was terminated by signal 11: Segmentation fault
2024-01-25 13:17:09 UTC::@:[537]:LOG: terminating any other active server processes
2024-01-25 13:17:09 UTC::@:[537]:FATAL: Can't handle storage runtime process crash
2024-01-25 13:17:09 UTC::@:[537]:LOG: database system is shut down

After this initial error, the cluster enters a continuous cycle of rebooting and crashing, causing significant CPU and resource usage.

On the source side, some queries ran a couple of seconds before the crash, but they don't seem to cause the problem: after re-creating the environment and re-executing the queries, the problem doesn't occur.

On the source cluster, we see these errors after the initial error on the destination:

2024-01-25 13:17:09 UTC:(63772):user@database_name:[26536]:LOG: could not receive data from client: Connection reset by peer
2024-01-25 13:17:09 UTC:(63772):user@database_name:[26536]:STATEMENT: START_REPLICATION SLOT "replication_slot_name" LOGICAL 12/28C9A430 (expected_encoding 'UTF8', min_proto_version '1', max_proto_version '1', startup_params_format '1', "binary.want_internal_basetypes" '1', "binary.want_binary_basetypes" '1', "binary.basetypes_major_version" '1400', "binary.sizeof_datum" '8', "binary.sizeof_int" '4', "binary.sizeof_long" '8', "binary.bigendian" '0', "binary.float4_byval" '0', "binary.float8_byval" '1', "binary.integer_datetimes" '0', "hooks.setup_function" 'pglogical.pglogical_hooks_setup', "pglogical.forward_origins" '"all"', "pglogical.replication_set_names" 'tenant_service', "relmeta_cache_size" '-1', pg_version '140008', pglogical_version '2.4.2', pglogical_version_num '20402', pglogical_apply_pid '6709')
2024-01-25 13:17:09 UTC:(63772):user@database_name:[26536]:LOG: unexpected EOF on standby connection

Source and Destination: RDS Aurora PostgreSQL 14.8, pglogical 2.4.2

Source: 1 Writer 1 Reader

Destination: 1 Writer

Karthik-Colligence commented 11 hours ago

Hey, any update on this issue? I hit the same issue when I tried pglogical in a similar scenario. Please let us know if there are any updates on this "Segmentation fault" issue.

andonovj commented 11 hours ago

Yes, the problem was related to a virtual column. Check whether any of the tables you are migrating has a virtual column. If so, remove that column from the replication and add it on the destination. That worked for me :-)
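Below is a minimal sketch of that workaround. It assumes the "virtual column" is a PostgreSQL generated column. PostgreSQL 14 only supports stored generated columns, and `information_schema.columns` marks these with `is_generated = 'ALWAYS'`. The Python does not connect to a database. It only builds SQL text: a catalog query that finds generated columns, and the pglogical calls that re-add one table to its replication set with those columns left out, using the `columns` argument of `pglogical.replication_set_add_table`. The table `public.orders` and its columns are hypothetical. `tenant_service` is the replication set name from the log above. Adding the column on the destination, as described above, remains a separate manual step.

```python
from dataclasses import dataclass

# Assumption: "virtual column" = PostgreSQL generated column.
# Run this on the source to list candidate columns.
FIND_GENERATED_COLUMNS_SQL = """
SELECT table_schema, table_name, column_name
FROM information_schema.columns
WHERE is_generated = 'ALWAYS'
  AND table_schema NOT IN ('pg_catalog', 'information_schema')
ORDER BY table_schema, table_name, ordinal_position;
"""


@dataclass
class Column:
    name: str
    generated: bool


def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote an SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def readd_without_generated(set_name, schema, table, columns):
    """Return SQL that re-adds a table to a pglogical replication set,
    replicating only its non-generated columns. Returns [] if the table
    has no generated columns (nothing to change)."""
    kept = [c.name for c in columns if not c.generated]
    if len(kept) == len(columns):
        return []
    relation = quote_literal(f"{quote_ident(schema)}.{quote_ident(table)}") + "::regclass"
    cols_array = "ARRAY[" + ", ".join(quote_literal(c) for c in kept) + "]"
    return [
        f"SELECT pglogical.replication_set_remove_table({quote_literal(set_name)}, {relation});",
        # synchronize_data = false: do not re-copy the existing rows.
        f"SELECT pglogical.replication_set_add_table({quote_literal(set_name)}, {relation}, false, {cols_array});",
    ]


if __name__ == "__main__":
    print(FIND_GENERATED_COLUMNS_SQL)
    # Hypothetical table: "total" is a generated column.
    cols = [Column("id", False), Column("price", False), Column("total", True)]
    for stmt in readd_without_generated("tenant_service", "public", "orders", cols):
        print(stmt)
```

Passing `false` for `synchronize_data` avoids re-copying rows that are already on the destination. Review the generated statements before running them on the provider.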