ClusterLabs / PAF

PostgreSQL Automatic Failover: High-Availibility for Postgres, based on Pacemaker and Corosync.
http://clusterlabs.github.io/PAF/
Other
335 stars 55 forks source link

pg_rewind automatically #212

Open wargebitebane opened 1 year ago

wargebitebane commented 1 year ago

I was looking on patroni and repmgr and they both have some special flags to automate pgrewind after failover. Is there anything similar in PAF? Its looks uncomfortable to get new promoted node when all nodes was rebooted accidentally (by some power supply issue or so) and then you need to do pg rewind manually.

ioguix commented 1 year ago

Hi,

I was looking on patroni and repmgr and they both have some special flags to automate pg_rewind after failover. Is there anything similar in PAF?

There's nothing similar in PAF, by design. It is quite scary to automatically get back online a failed node and removing some of its activity without manual checkup first, at least to pinpoint what was the crash origin and the real status of the node.

Its looks uncomfortable to get new promoted node when all nodes was rebooted accidentally (by some power supply issue or so) and then you need to do pg_ rewind manually.

You should not have to pg_rewind in such situation.

I expect you should just start manually the primary, then the secondaries, wait for them to settle, then restart the pacemaker stack.

  1. the PostgreSQL primary should recover
  2. then secondaries should resynch with it
  3. when starting Pacemaker, it should detect the clean status of all of them.