cloudfoundry-community / postgres-boshrelease

A BOSH release for deploying PostgreSQL
MIT License

Add HA Postgres features, with auto-failover/recovery #22

Closed: Proplex closed this pull request 6 years ago.

Proplex commented 6 years ago

This PR implements the functionality needed for a highly available Postgres deployment reachable at a single IP address. It does this with HAProxy, a VRRP virtual IP (VIP), and a set of health-check scripts. In short, it works as follows (excerpt from @jhunt):

On bootstrap, if there is no data directory, the postgres job will
revert to a normal, index-based setup.  The first node will assume
the role of the master, and the second will become a replica.

Once the data directory has been populated, future restarts of the
postgres job will attempt to contact the other node to see if it
is a master.  If the other node responds, and reports itself as a
master, the local node will attempt a `pg_basebackup` from the
master and assume the role of a replica.

If the other node doesn't respond, or reports itself as a replica,
the local node will keep trying, for up to
`postgres.replication.grace` seconds, at which point it will
assume the mantle of leadership and become the master node,
using its current data directory as the canonical truth.
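The startup decision described above can be sketched in Python as follows. This is a minimal illustration, not the release's actual job script: `choose_role` and `peer_is_master` are hypothetical names, "first node" is assumed to mean BOSH index 0, and the peer probe is passed in as a callable so the logic runs without a real Postgres.

```python
import time
from typing import Callable, Optional


def choose_role(
    has_data_dir: bool,
    index: int,
    peer_is_master: Callable[[], Optional[bool]],
    grace: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Return "master" or "replica" for the local node at startup.

    Hypothetical probe contract: peer_is_master() returns True if the other
    node answers as a master, False if it answers as a replica, and None if
    it does not respond at all.
    """
    if not has_data_dir:
        # Fresh bootstrap: fall back to the index-based setup
        # (assumption: the "first node" is BOSH instance index 0).
        return "master" if index == 0 else "replica"

    deadline = clock() + grace  # grace mirrors postgres.replication.grace
    while True:
        if peer_is_master() is True:
            # The caller would now pg_basebackup from the peer.
            return "replica"
        if clock() >= deadline:
            # Grace period expired: take over using the local data directory.
            return "master"
        sleep(1)  # retry roughly once per second
```

Injecting `sleep` and `clock` keeps the grace-period logic testable with a fake clock. In a real job, `peer_is_master` might run `SELECT pg_is_in_recovery()` against the other node, but the PR does not say how that check is actually performed.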

Each node then starts up a `monitor` process; this process is
responsible for ultimately promoting a local replica to be a
master, in the event that the real master goes offline.  It works
like this:

  1. Busy-loop (via 1-second sleeps) until the local postgres
     instance is available on its configured port.  This prevents
     the monitor from trying to restart postgres while it is
     running a replica `pg_basebackup`.

  2. Busy-loop (again via 1-second sleeps) for as long as the
     local postgres is a master.

  3. Busy-loop (again via 1-second sleeps), checking the master
     status of the other postgres node, until it detects that
     either the master node has gone away (via a connection
     timeout), or the master node has somehow become a replica.

  4. Promote the local postgres node to a master.
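The four monitor steps can be sketched in Python under the same assumptions. The probes and `promote` are injected callables with invented names, and `peer_is_master()` is assumed to return None on a connection timeout. The sketch runs the steps once; whether the real monitor loops back afterwards is not stated here.

```python
import time
from typing import Callable, Optional


def monitor(
    local_up: Callable[[], bool],
    local_is_master: Callable[[], bool],
    peer_is_master: Callable[[], Optional[bool]],
    promote: Callable[[], None],
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    # 1. Wait until local postgres accepts connections on its port,
    #    e.g. while a replica pg_basebackup is still running.
    while not local_up():
        sleep(1)

    # 2. Nothing to do for as long as this node is the master.
    while local_is_master():
        sleep(1)

    # 3. This node is a replica: watch the peer until it either stops
    #    responding (None, i.e. a timeout) or reports itself as a replica (False).
    while peer_is_master() is True:
        sleep(1)

    # 4. The master is gone or demoted: promote the local replica.
    promote()
```

Passing `sleep` in lets a test record the polling without waiting in real time.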

This has been tested by hand on vSphere with CF's ccdb, uaa, and diegodb using Postgres as their storage backend. We observed an outage of less than 5 seconds during master failover. Release notes are already written.

jhunt commented 6 years ago

Split-brain is almost impossible to prevent without a quorum solution or some form of the Binary Star pattern. We are mostly trying to protect against outages during a deployment.
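The standard majority-quorum rule shows why two nodes cannot settle split-brain on their own. This is a sketch for illustration only and is not part of the PR; `has_quorum` is a hypothetical helper.

```python
def has_quorum(reachable_votes: int, cluster_size: int) -> bool:
    """Return True if a node that can reach `reachable_votes` voters
    (counting itself) holds a strict majority of the cluster."""
    return reachable_votes > cluster_size // 2


# Two-node cluster split by a network partition: each side sees only itself.
print(has_quorum(1, 2))  # False: neither side may safely promote itself
# Three voters (e.g. two Postgres nodes plus a witness): the side with two wins.
print(has_quorum(2, 3))  # True
print(has_quorum(1, 3))  # False
```

With only two voters, a lone survivor can never hold a majority. The PR's design instead relies on timeouts (`postgres.replication.grace` at startup, a connection timeout in the monitor). That trade-off favors availability during deployments over split-brain safety.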

Proplex commented 6 years ago

Feedback addressed!