EnterpriseDB / repmgr

A lightweight replication manager for PostgreSQL (Postgres)
https://repmgr.org/
Other
1.57k stars 251 forks source link

standby falsely promoted after cloning from PITR recovered primary #793

Open Bschitter opened 1 year ago

Bschitter commented 1 year ago

I have a database setup with 3 nodes:

  1. primary
  2. standby
  3. backup/repository/archive node.

PostgreSQL: 15.1 repmgr: 5.3.3 pgbackrest: 2.43

When I do a PITR with pgbackrest to the primary node, then pgbackrest will write the following to the postgresql.auto.conf file:

recovery_target_time = '2023-01-27 13:30:00.000000+01'
recovery_target_action = 'promote'

When I now want to clone my standby with repmgr standby clone from the primary and join it to the cluster, it will also copy the postgresql.auto.conf file. repmgr will add the primary_conninfo entry to postgresql.auto.conf, but does not delete old entries from it. And after starting the standby database, it will correctly start as standby since the standby.signal is present, but will then replay WALs from the archive and promote itself, since the recovery_target_action is still promote.

We didn't have this issue with PostgreSQL 11, repmgr 5.0.0 and pgbackrest 2.18, when there was still the recovery.conf file.

Am I doing something wrong? Or is cloning the standby after a PITR not supported by repmgr?

Since it says not to edit the postgresql.auto.conf, I prefer not to edit or delete it.

I also found this somewhat related issue https://github.com/EnterpriseDB/repmgr/issues/543, but cannot tell what exactly was done about it.

Bschitter commented 1 year ago

I found out that I can use the ALTER SYSTEM command to reset configuration properties inside postgresql.auto.conf, see https://www.postgresql.org/docs/15/sql-altersystem.html

So after a restore and before executing repmgr standby clone I will execute ALTER SYSTEM RESET ALL; to remove configuration properties inside postgresql.auto.conf inserted by pgbackrest. This solves my issue.

But still I'm thinking that it would be nice if repmgr would show a warning, a hint or even refuse to clone, if something will be preventing the database from starting in standby mode properly. I understand however that the documentation says:

Note that by default, all configuration files in the source node's data directory will be copied to the cloned node. Typically these will be postgresql.conf, postgresql.auto.conf, pg_hba.conf and pg_ident.conf. These may require modification before the standby is started.