cybertec-postgresql / pg_squeeze

A PostgreSQL extension for automatic bloat cleanup

Running squeeze causes continuous lag build up in logical replication architecture #52

Closed. patrickmglynn closed this issue 2 months ago.

patrickmglynn commented 2 years ago

Hello,

I have observed that running pg_squeeze on large tables, either manually or on a schedule, leads to replication lag building up constantly and requires intervention to prevent continuous WAL growth.

Our environment is PostgreSQL 12.1 running on Red Hat 7.9, with pg_squeeze version 1.3:

test_db=# \! rpm -qa | grep squeeze
pg_squeeze_12-1.3.1-1.rhel7.x86_64

test_db=# \dx pg_squeeze
                          List of installed extensions
    Name    | Version | Schema  |                  Description
------------+---------+---------+------------------------------------------------
 pg_squeeze | 1.3     | squeeze | A tool to remove unused space from a relation.
(1 row)

One publisher replicates data to one subscriber. max_replication_slots and max_wal_senders are both set to 10 on the publisher, max_wal_senders is 20 on the subscriber, and we currently have 6 subscriptions in total (one per replicated schema).
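
The relevant settings can be cross-checked on each node with a pg_settings query along these lines (shown only as a sketch, not part of the original report):

test_db=# SELECT name, setting
test_db-#   FROM pg_settings
test_db-#  WHERE name IN ('wal_level', 'max_replication_slots', 'max_wal_senders');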

Prior to running a squeeze, the replication slots on the publisher are all active and the subscriptions are in the streaming state:

test_db=# SELECT * FROM pg_replication_slots;
      slot_name      |  plugin  | slot_type | datoid |  database   | temporary | active | active_pid | xmin | catalog_xmin |  restart_lsn  | confirmed_flush_lsn
---------------------+----------+-----------+--------+-------------+-----------+--------+------------+------+--------------+---------------+---------------------
 sub1                | pgoutput | logical   |  16402 | test_db     | f         | t      |      21570 |      |    987626303 | 36E0/23DB3188 | 36E0/23F900F8
 sub2                | pgoutput | logical   |  16402 | test_db     | f         | t      |      23310 |      |    987626303 | 36E0/23DB3188 | 36E0/23F900F8
 sub3                | pgoutput | logical   |  16402 | test_db     | f         | t      |      23311 |      |    987626303 | 36E0/23DB3188 | 36E0/23F900F8
 sub4                | pgoutput | logical   |  16402 | test_db     | f         | t      |      23313 |      |    987626303 | 36E0/23DB3188 | 36E0/23F900F8
 sub5                | pgoutput | logical   |  16402 | test_db     | f         | t      |      23390 |      |    987626303 | 36E0/23DB3188 | 36E0/23F900F8
 sub6                | pgoutput | logical   |  16402 | test_db     | f         | t      |      23389 |      |    987626303 | 36E0/23DB3188 | 36E0/23F900F8
(6 rows)
test_db=# SELECT * FROM pg_stat_replication;
  pid  | usesysid |       usename       |  application_name   | client_addr  |         client_hostname         | client_port |         backend_start         | backend_xmin |   state   |   sent_lsn    |   write_lsn   |   flush_lsn   |  replay_lsn   |    write_lag    |    flush_lag    |   replay_lag    | sync_priority | sync_state |       
   reply_time
-------+----------+---------------------+---------------------+--------------+---------------------------------+-------------+-------------------------------+--------------+-----------+---------------+---------------+---------------+---------------+-----------------+-----------------+-----------------+---------------+------------+-------------------------------
 21570 |   991114 | logical_replication | sub1                | **.***.**.** | my.fqdn.org                     |       40072 | 2022-02-11 15:51:01.531025+08 |              | streaming | 36E0/285D19C0 | 36E0/285D19C0 | 36E0/285D19C0 | 36E0/285D19C0 | 00:00:00.00045  | 00:00:00.00045  | 00:00:00.00045  |             0 | async      | 2022-02-21 15:32:17.793954+08
 23310 |   991114 | logical_replication | sub2                | **.***.**.** | my.fqdn.org                     |       40094 | 2022-02-11 15:51:06.703731+08 |              | streaming | 36E0/285D19C0 | 36E0/285D19C0 | 36E0/285D19C0 | 36E0/285D19C0 | 00:00:00.000435 | 00:00:00.000435 | 00:00:00.000435 |             0 | async      | 2022-02-21 15:32:17.793945+08
 23311 |   991114 | logical_replication | sub3                | **.***.**.** | my.fqdn.org                     |       40096 | 2022-02-11 15:51:06.71485+08  |              | streaming | 36E0/285D19C0 | 36E0/285D19C0 | 36E0/285D19C0 | 36E0/285D19C0 | 00:00:00.000411 | 00:00:00.000411 | 00:00:00.000411 |             0 | async      | 2022-02-21 15:32:17.793884+08
 23313 |   991114 | logical_replication | sub4                | **.***.**.** | my.fqdn.org                     |       40100 | 2022-02-11 15:51:06.726784+08 |              | streaming | 36E0/285D19C0 | 36E0/285D19C0 | 36E0/285D19C0 | 36E0/285D19C0 | 00:00:00.000395 | 00:00:00.000395 | 00:00:00.000395 |             0 | async      | 2022-02-21 15:32:17.793891+08
 23390 |   991114 | logical_replication | sub5                | **.***.**.** | my.fqdn.org                     |       40204 | 2022-02-11 15:51:12.325621+08 |              | streaming | 36E0/285D19C0 | 36E0/285D19C0 | 36E0/285D19C0 | 36E0/285D19C0 | 00:00:00.000588 | 00:00:00.000588 | 00:00:00.000588 |             0 | async      | 2022-02-21 15:32:17.793901+08
 23389 |   991114 | logical_replication | sub6                | **.***.**.** | my.fqdn.org                     |       40202 | 2022-02-11 15:51:12.322993+08 |              | streaming | 36E0/285D19C0 | 36E0/285D19C0 | 36E0/285D19C0 | 36E0/285D19C0 | 00:00:00.000343 | 00:00:00.000343 | 00:00:00.000343 |             0 | async      | 2022-02-21 15:32:17.794075+08
(6 rows)

If I run the following against a large table within the sub1 publication that occupies 83GB on disk (68GB of which is bloat):

test_db=# \dt+ myschema.bloated_table;
                    List of relations
  Schema  |     Name      | Type  | Owner  | Size  | Description
----------+---------------+-------+--------+-------+-------------
 myschema | bloated_table | table | myuser | 83 GB |
(1 row)
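
(The bloat figure was estimated separately; for illustration, a pgstattuple check roughly like the following gives comparable numbers, assuming the pgstattuple extension is available. This is only a sketch, not part of the original report. Note that pgstattuple scans the whole table.)

test_db=# CREATE EXTENSION IF NOT EXISTS pgstattuple;
test_db=# SELECT pg_size_pretty(free_space + dead_tuple_len) AS approx_bloat
test_db-#   FROM pgstattuple('myschema.bloated_table');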

test_db=# SELECT squeeze.squeeze_table('myschema', 'bloated_table', null, null, null);
 squeeze_table 
---------------

(1 row) 

The operation finishes successfully (reducing the size of the table dramatically), but replication falls behind during the squeeze and never recovers afterwards:

test_db=# \dt+ myschema.bloated_table;
                     List of relations
  Schema  |     Name      | Type  | Owner  |  Size   | Description
----------+---------------+-------+--------+---------+-------------
 myschema | bloated_table | table | myuser | 9817 MB |
(1 row)

test_db=# SELECT * FROM pg_replication_slots;
      slot_name      |  plugin  | slot_type | datoid |  database   | temporary | active | active_pid | xmin | catalog_xmin |  restart_lsn  | confirmed_flush_lsn
---------------------+----------+-----------+--------+-------------+-----------+--------+------------+------+--------------+---------------+---------------------
 sub1                | pgoutput | logical   |  16402 | test_db     | f         | t      |      12535 |      |    988419346 | 36E3/2C8877A0 | 36E5/C43037F0
 sub2                | pgoutput | logical   |  16402 | test_db     | f         | t      |      12536 |      |    988419346 | 36E3/2C8877A0 | 36E5/C43037F0
 sub3                | pgoutput | logical   |  16402 | test_db     | f         | t      |      12537 |      |    988419346 | 36E3/2C8877A0 | 36E5/C43037F0
 sub4                | pgoutput | logical   |  16402 | test_db     | f         | t      |      12538 |      |    988419346 | 36E3/2C8877A0 | 36E5/C43037F0
 sub5                | pgoutput | logical   |  16402 | test_db     | f         | t      |      12539 |      |    988419346 | 36E3/2C8877A0 | 36E5/C43037F0
 sub6                | pgoutput | logical   |  16402 | test_db     | f         | t      |      12540 |      |    988419346 | 36E3/2C8877A0 | 36E5/C43037F0
(6 rows)

test_db=# SELECT * FROM pg_stat_replication;
  pid  | usesysid |       usename       |  application_name   | client_addr  |         client_hostname          | client_port |         backend_start         | backend_xmin | state   |   sent_lsn    |   write_lsn   |   flush_lsn   |  replay_lsn   | write_lag | flush_lag | replay_lag | sync_priority | sync_state | reply_time
-------+----------+---------------------+---------------------+--------------+----------------------------------+-------------+-------------------------------+--------------+---------+---------------+---------------+---------------+---------------+-----------+-----------+------------+---------------+------------+-------------------------------
 12535 |   991114 | logical_replication | sub1                | **.***.**.** | my.fqdn.org                      |       56826 | 2022-02-22 10:56:55.3298+08   |              | catchup | 36E5/C430CA30 | 36E5/C43037F0 | 36E5/C43037F0 | 36E5/C43037F0 |           |           |            |             0 | async      | 2022-02-22 11:05:28.787884+08
 12536 |   991114 | logical_replication | sub2                | **.***.**.** | my.fqdn.org                      |       56828 | 2022-02-22 10:56:55.339875+08 |              | catchup | 36E5/C430CA30 | 36E5/C43037F0 | 36E5/C43037F0 | 36E5/C43037F0 |           |           |            |             0 | async      | 2022-02-22 11:05:28.776368+08
 12537 |   991114 | logical_replication | sub3                | **.***.**.** | my.fqdn.org                      |       56830 | 2022-02-22 10:56:55.350883+08 |              | catchup | 36E5/C430CA30 | 36E5/C43037F0 | 36E5/C43037F0 | 36E5/C43037F0 |           |           |            |             0 | async      | 2022-02-22 11:05:28.73297+08
 12538 |   991114 | logical_replication | sub4                | **.***.**.** | my.fqdn.org                      |       56832 | 2022-02-22 10:56:55.361507+08 |              | catchup | 36E5/C430CA30 | 36E5/C43037F0 | 36E5/C43037F0 | 36E5/C43037F0 |           |           |            |             0 | async      | 2022-02-22 11:05:29.009785+08
 12539 |   991114 | logical_replication | sub5                | **.***.**.** | my.fqdn.org                      |       56834 | 2022-02-22 10:56:55.371779+08 |              | catchup | 36E5/C430CA30 | 36E5/C43037F0 | 36E5/C43037F0 | 36E5/C43037F0 |           |           |            |             0 | async      | 2022-02-22 11:05:28.800016+08
 12540 |   991114 | logical_replication | sub6                | **.***.**.** | my.fqdn.org                      |       56836 | 2022-02-22 10:56:55.382464+08 |              | catchup | 36E5/C430CA30 | 36E5/C43037F0 | 36E5/C43037F0 | 36E5/C43037F0 |           |           |            |             0 | async      | 2022-02-22 11:05:29.212577+08
(6 rows)

test_db=# \! du -sh /var/lib/pgsql/12/data/pg_wal
11G     /var/lib/pgsql/12/data/pg_wal

From the above we can see that the slots are still active on the publisher, but the subscriptions are now stuck in the catchup state and the restart_lsn and confirmed_flush_lsn no longer advance. The pg_wal directory has grown to 11GB and keeps growing over time.
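
The amount of WAL retained by each slot can be quantified with something like the following (a sketch using pg_wal_lsn_diff, not from the original report):

test_db=# SELECT slot_name,
test_db-#        pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
test_db-#   FROM pg_replication_slots;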

Checking the subscriber logs shows the replication workers timing out as the squeeze process runs:

<2022-02-22 12:38:02 AEDT  >ERROR:  terminating logical replication worker due to timeout
<2022-02-22 12:38:02 AEDT  >LOG:  background worker "logical replication worker" (PID 5085) exited with exit code 1
<2022-02-22 12:38:02 AEDT  >LOG:  logical replication apply worker for subscription "sub3" has started
<2022-02-22 12:38:02 AEDT  >LOG:  background worker "logical replication worker" (PID 5148) exited with exit code 1
<2022-02-22 12:38:02 AEDT  >ERROR:  could not start WAL streaming: ERROR:  replication slot "sub3" is active for PID 23390
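
The last error indicates that the slot on the publisher is still held by the old walsender process. Which backend holds it can be checked on the publisher with, for example (a sketch, not from the original report):

test_db=# SELECT s.slot_name, s.active_pid, a.state, a.backend_start
test_db-#   FROM pg_replication_slots s
test_db-#   LEFT JOIN pg_stat_activity a ON a.pid = s.active_pid
test_db-#  WHERE s.slot_name = 'sub3';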

wal_sender_timeout is 5min on the publisher and 1min on the subscriber side.

At this point the only way to get the subscriptions working again is to DROP and CREATE them again, thus discarding the accumulated WAL. ALTER SUBSCRIPTION ... DISABLE/ENABLE has no effect, and neither does restarting the postgresql-12 service on the subscriber.
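
(For completeness, the recreate step on the subscriber looks roughly like the sketch below; the connection string and publication name are placeholders, and note that CREATE SUBSCRIPTION re-copies the table data by default.)

test_db=# DROP SUBSCRIPTION sub1;
test_db=# CREATE SUBSCRIPTION sub1
test_db-#   CONNECTION 'host=publisher.example.org dbname=test_db user=logical_replication'
test_db-#   PUBLICATION pub1;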

Is there a misconfiguration or setting I have overlooked here?

Any suggestions as to how I can incorporate pg_squeeze into our environment alongside our logical replication setup would be appreciated.

ahouska commented 2 years ago

Can you please test the same scenario with pg_squeeze 1.4 (actually 1.4.1 is the latest release)? It should consume less disk space (see https://github.com/cybertec-postgresql/pg_squeeze/issues/51#issue-1127110005 for details), so the processing might also be a bit faster. If the replication still gets stuck due to the logical worker's exit, I suggest you increase the wal_receiver_timeout configuration variable (on the subscriber side).
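
(On the subscriber, that would be something along the lines of the following sketch; the value shown is only an example.)

test_db=# ALTER SYSTEM SET wal_receiver_timeout = '10min';
test_db=# SELECT pg_reload_conf();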

patrickmglynn commented 2 years ago

Thanks for the quick reply.

I will work on building the 1.4.1 release; in the meantime, is there an .rpm available for this? I couldn't find anything in the Postgres yum repo.

ahouska commented 2 years ago

I thought the point of the issue https://github.com/cybertec-postgresql/pg_squeeze/issues/47#issue-910280352 was to add pg_squeeze 1.4 to the official repository, but it really does appear not to be there. I've submitted a request to set up the build.

ahouska commented 2 years ago

pg_squeeze 1.4 is now in the PGDG (yum) repository. It can only be installed on PG 14, though; you need to install it from source if you use a lower PG version.

patrickmglynn commented 2 years ago

I was able to install 1.4.1 from the official PG12 repository. The subscriber now has wal_receiver_timeout set to 5min; unfortunately, replication lag remains fixed at the size of the squeezed table once the squeeze completes.

I will re-test in PG14 once we are able to move to this version.

damianzawadzki commented 3 months ago

@patrickmglynn did you solve the problem? I've just hit exactly the same issue.

kovmir commented 2 months ago

Stale, re-open if needed.