compose / governor

Runners to orchestrate a high-availability PostgreSQL

Etcd attempts #47

Closed · crigertg closed this pull request 7 years ago

crigertg commented 7 years ago

This makes the number of attempts for get_client_path configurable in postgresql.yml.
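For context, a minimal sketch of the kind of retry loop this change implies; the function name, the requests-based call, and the parameters are illustrative assumptions, not Governor's actual get_client_path implementation.

import time
import requests

def etcd_get_with_retries(url, attempts=3, timeout=5, wait=1):
    # Try an etcd read several times before treating it as a failure;
    # "attempts" would come from the new setting in postgresql.yml.
    last_error = None
    for _ in range(attempts):
        try:
            response = requests.get(url, timeout=timeout)
            response.raise_for_status()  # a 500 from etcd counts as a failed attempt
            return response.json()
        except requests.RequestException as exc:
            last_error = exc
            time.sleep(wait)  # short pause before retrying
    raise last_error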

Winslett commented 7 years ago

So…this is a much deeper issue than it appears on the surface. Extending timeouts and re-attempts deserves a strong warning. Governor relies on the expectation that ETCD has all the information about the environment. Extending timeouts and retrying connections reduces the trust that Governor places in ETCD, which introduces more potential for split-brain clusters. This is "okay" for people who can trade consistency for availability. Scenario:

etcd:
  timeout: 60
  attempts: 5

With the configuration above, a Governor acting as leader that loses its connection to ETCD would hold on to the primary role for 5 minutes (5 attempts × 60-second timeout) before going read-only. In the meantime, during those 5 minutes, other members of the cluster would take over the primary role. This behavior requires manual cleanup because of the split brain, and may be a serious issue for people who require higher consistency.

crigertg commented 7 years ago

Maybe it helps if I explain the problems we are encountering in some of our environments. We had issues with etcd not responding to a request. We were not able to debug this, but installed some monitoring scripts to watch etcd's availability. The check runs every 5 seconds and checks for an etcd response. After serving one response with return code 500, it went back to responding with 200. But a single failed request to etcd leads to a PostgreSQL shutdown, which terminates all existing connections. In the next loop it recovers and the old master promotes itself to master again. This results in unexpected PostgreSQL downtime because one request to etcd failed. A second check attempt would solve this problem.
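For reference, a sketch of the kind of external availability probe described above; the /health endpoint, port, and 2-second request timeout are assumptions, and the script is independent of Governor itself.

import time
import requests

ETCD_HEALTH_URL = "http://127.0.0.1:2379/health"  # assumed etcd endpoint

while True:
    try:
        r = requests.get(ETCD_HEALTH_URL, timeout=2)
        print(time.strftime("%H:%M:%S"), r.status_code, r.text.strip())
    except requests.RequestException as exc:
        print(time.strftime("%H:%M:%S"), "etcd unreachable:", exc)
    time.sleep(5)  # the check described above runs every 5 seconds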

Your example scenario can lead to a split brain, that's right. Maybe we could ensure that the absolute retry time for etcd stays lower than the etcd TTL. But we have to take loop_wait into account too:

if config["etcd"]["timeout"] * (config["etcd"]["attempts"] - 1) >= (config["etcd"]["ttl"] - config["loop_wait"]):
    logger.warning("Failed to start governor: absolute retry time for etcd requests is higher than the lock TTL")
    sys.exit(1)

(code snippet not tested, just for an example)
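Spelled out a little more fully, still only a sketch of the same idea; the config keys mirror the snippet above and are not necessarily Governor's exact schema.

import logging
import sys

import yaml

logger = logging.getLogger("governor")

def check_etcd_retry_budget(config):
    # Refuse to start if the worst-case etcd retry time could outlast the lock TTL.
    etcd = config["etcd"]
    retry_budget = etcd["timeout"] * (etcd["attempts"] - 1)
    if retry_budget >= etcd["ttl"] - config["loop_wait"]:
        logger.warning("Failed to start governor: absolute retry time for etcd requests is higher than the lock TTL")
        sys.exit(1)

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    with open(sys.argv[1]) as f:  # e.g. postgresql.yml
        check_etcd_retry_budget(yaml.safe_load(f))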

crigertg commented 7 years ago

After thinking about this, comparing against the TTL doesn't fix the split-brain issue either. I'll need to look at when read requests are sent and when the current master is updated and its TTL gets refreshed. Maybe I can find a solution that way.

crigertg commented 7 years ago

I fixed the problem with a more robust etcd setup. I will close this PR. Thank you for your time!