CrunchyData / postgres-operator

Production PostgreSQL for Kubernetes, from high availability Postgres clusters to full-scale database-as-a-service.
https://access.crunchydata.com/documentation/postgres-operator/v5/
Apache License 2.0

Leader elections with non-HA kubernetes. #3171

Closed: jaredkipe closed this issue 8 months ago

jaredkipe commented 2 years ago

Overview

Currently, the 10s retry_timeout gets broken down into one ~5s request and two ~2s requests, so even moderate etcd/DCS unavailability causes leader elections and demotions. (I believe the relevant setting is retry_timeout; see https://patroni.readthedocs.io/en/latest/SETTINGS.html.)
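For reference, Patroni's stock defaults for the relevant settings (per the settings page linked above) are shown below; the 10s retry_timeout is what gets split across the retried API requests visible in the logs:

  ttl: 30
  loop_wait: 10
  retry_timeout: 10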

I have observed this with both standalone and HA Postgres clusters running on Linode LKE (none of them are HA LKE).

Mon, Apr 25 2022 9:46:48 am 2022-04-25 16:46:48,059 INFO: no action. I am (odoo-00-ccd2-0) the leader with the lock
Mon, Apr 25 2022 9:47:02 am 2022-04-25 16:46:57,902 INFO: Lock owner: odoo-00-ccd2-0; I am odoo-00-ccd2-0
Mon, Apr 25 2022 9:47:02 am 2022-04-25 16:47:02,908 ERROR: Request to server https://10.128.0.1:443 failed: ReadTimeoutError("HTTPSConnectionPool(host='10.128.0.1', port=443): Read timed out. (read timeout=4.999539108015597)",)
Mon, Apr 25 2022 9:47:05 am 2022-04-25 16:47:05,760 ERROR: Request to server https://10.128.0.1:443 failed: ReadTimeoutError("HTTPSConnectionPool(host='10.128.0.1', port=443): Read timed out. (read timeout=2.137138576246798)",)
Mon, Apr 25 2022 9:47:07 am 2022-04-25 16:47:07,908 ERROR: Request to server https://10.128.0.1:443 failed: ReadTimeoutError("HTTPSConnectionPool(host='10.128.0.1', port=443): Read timed out. (read timeout=1.9255160954780877)",)
Mon, Apr 25 2022 9:47:07 am 2022-04-25 16:47:07,909 ERROR: failed to update leader lock
Mon, Apr 25 2022 9:47:10 am 2022-04-25 16:47:10,629 INFO: demoted self because failed to update leader lock in DCS
Mon, Apr 25 2022 9:47:10 am 2022-04-25 16:47:10,630 WARNING: Loop time exceeded, rescheduling immediately.
Mon, Apr 25 2022 9:47:10 am 2022-04-25 16:47:10,640 INFO: closed patroni connection to the postgresql cluster
Mon, Apr 25 2022 9:47:11 am 2022-04-25 16:47:11.123 UTC [921854] LOG: pgaudit extension initialized
Mon, Apr 25 2022 9:47:11 am 2022-04-25 16:47:11.160 UTC [921854] LOG: redirecting log output to logging collector process
Mon, Apr 25 2022 9:47:11 am 2022-04-25 16:47:11.160 UTC [921854] HINT: Future log output will appear in directory "log".
Mon, Apr 25 2022 9:47:11 am 2022-04-25 16:47:11,164 INFO: postmaster pid=921854
Mon, Apr 25 2022 9:47:11 am /tmp/postgres:5432 - rejecting connections
Mon, Apr 25 2022 9:47:11 am /tmp/postgres:5432 - rejecting connections
Mon, Apr 25 2022 9:47:12 am /tmp/postgres:5432 - rejecting connections
Mon, Apr 25 2022 9:47:13 am 2022-04-25 16:47:10,632 INFO: Lock owner: odoo-00-ccd2-0; I am odoo-00-ccd2-0
Mon, Apr 25 2022 9:47:13 am 2022-04-25 16:47:13,141 ERROR: Request to server https://10.128.0.1:443 failed: ReadTimeoutError("HTTPSConnectionPool(host='10.128.0.1', port=443): Read timed out. (read timeout=2.5)",)
Mon, Apr 25 2022 9:47:13 am /tmp/postgres:5432 - accepting connections
Mon, Apr 25 2022 9:47:15 am 2022-04-25 16:47:15,218 ERROR: Request to server https://10.128.0.1:443 failed: ReadTimeoutError("HTTPSConnectionPool(host='10.128.0.1', port=443): Read timed out. (read timeout=1.8151276111602783)",)
Mon, Apr 25 2022 9:47:17 am 2022-04-25 16:47:17,124 INFO: updated leader lock during starting after demotion
Mon, Apr 25 2022 9:47:17 am 2022-04-25 16:47:17,131 INFO: Lock owner: odoo-00-ccd2-0; I am odoo-00-ccd2-0
Mon, Apr 25 2022 9:47:17 am 2022-04-25 16:47:17,132 INFO: establishing a new patroni connection to the postgres cluster
Mon, Apr 25 2022 9:47:17 am 2022-04-25 16:47:17,921 INFO: promoted self to leader because I had the session lock
Mon, Apr 25 2022 9:47:17 am server promoting
Mon, Apr 25 2022 9:47:17 am 2022-04-25 16:47:17,936 INFO: cleared rewind state after becoming the leader
Mon, Apr 25 2022 9:47:19 am 2022-04-25 16:47:19,614 INFO: no action. I am (odoo-00-ccd2-0) the leader with the lock
Mon, Apr 25 2022 9:47:20 am 2022-04-25 16:47:20,141 INFO: no action. I am (odoo-00-ccd2-0) the leader with the lock

Use Case

Standalone clusters could stay up even with the DCS down.

Is this the correct way to influence this configuration?

  patroni:
    dynamicConfiguration:
      loop_wait: 120
      postgresql:
        parameters:
          max_connections: 120
          max_parallel_workers: 4
          max_worker_processes: 4
          shared_buffers: 2GB
          work_mem: 3MB
      retry_timeout: 600
      ttl: 200
    leaderLeaseDurationSeconds: 300
    port: 8008
    syncPeriodSeconds: 60

Desired Behavior

I personally believe the defaults should be relaxed, but barring that, I'd love documentation on how to tune this and what values would be reasonable for standalone clusters (standalone clusters really shouldn't need leader elections at all, but I understand why it works this way...).


Additional Information

An HA Kubernetes control plane would probably help or resolve this, but better configuration guidance would go a long way.

tolleiv commented 2 years ago

Hi - the chosen defaults are "just" what Patroni brings along -> see https://patroni.readthedocs.io/en/latest/SETTINGS.html

Regarding your settings, they are somewhat mixed up.

patroni.leaderLeaseDurationSeconds is mapped to ttl
patroni.syncPeriodSeconds is mapped to loop_wait

Also, leaderLeaseDurationSeconds / ttl should be larger than syncPeriodSeconds / loop_wait - the settings above seem to violate this.

See: https://github.com/CrunchyData/postgres-operator/blob/master/internal/patroni/config.md#postgresql-and-failover-configuration

They should even be kept consistent with each other: ttl > loop_wait + 2 * retry_timeout

My assumption is that, for more stability, raising ttl should be enough to deal with non-HA DCS setups: Patroni would then simply get more chances to retry before the primary is demoted. It would be great to hear what the Crunchy Data experience is in these situations.
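For illustration only, a spec that keeps these values consistent could look roughly like the one below. The numbers are made up for the example (not a Crunchy Data recommendation) and assume retry_timeout can be passed through dynamicConfiguration, as in the config above:

  patroni:
    leaderLeaseDurationSeconds: 120  # maps to ttl
    syncPeriodSeconds: 10            # maps to loop_wait
    dynamicConfiguration:
      retry_timeout: 50              # keeps ttl > loop_wait + 2 * retry_timeout (120 > 10 + 100)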

daadu commented 1 year ago

@jaredkipe I am facing the same issue, can you share the config that fixed this for you?

jaredkipe commented 1 year ago

@daadu yeah, it's the block of Patroni config above. I wouldn't say it's solved so much as delayed, but it has more or less solved it for us.

tolleiv commented 1 year ago

I think this will be improved a lot with Patroni 3.x and the DCS failsafe_mode configuration: see https://github.com/zalando/patroni/pull/2379 and https://github.com/zalando/patroni/blob/master/docs/releases.rst#version-300
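For anyone trying this out: Patroni 3.x enables failsafe via its dynamic configuration, so it should be possible to pass it through the operator spec roughly as below (an illustrative sketch, assuming the bundled Patroni version supports failsafe_mode):

  patroni:
    dynamicConfiguration:
      failsafe_mode: true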

andrewlecuyer commented 8 months ago

As noted above, this has been addressed with the "failsafe" functionality that is now available in Patroni:

https://patroni.readthedocs.io/en/master/dcs_failsafe_mode.html

And since the latest versions of Crunchy Postgres for Kubernetes also include this change, I am proceeding with closing this issue.