BCDevOps / platform-services

Collection of platform related tools and configurations
Apache License 2.0

OCP4 Patroni Reliability and Footprint Enhancements #732

Open jujaga opened 3 years ago

jujaga commented 3 years ago

Description

This PR updates the example Patroni OpenShift template with reliability and footprint enhancements, described under Further comments.

Types of changes

Bug fix (non-breaking change which fixes an issue)

Further comments

On OCP4, Patroni no longer automatically fails over to a replica when the master fails. We now trigger this ourselves by asking Patroni to force a switchover in the preStop lifecycle hook. Also, to give the Postgres DB a chance to shut down gracefully, we remove terminationGracePeriodSeconds: 0 so that the k8s default of 30 seconds applies.
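
As a rough sketch of the preStop change described above (not necessarily the exact template in this PR): the fragment below assumes the Patroni REST API listens on its default port 8008, that curl and patronictl are available and configured inside the container, and that the container is named patroni. All three are assumptions for illustration.

```yaml
# Sketch only: fragment of the StatefulSet pod template, not a full template.
spec:
  template:
    spec:
      # terminationGracePeriodSeconds is intentionally omitted so that the
      # Kubernetes default of 30 seconds applies.
      containers:
        - name: patroni  # hypothetical container name
          lifecycle:
            preStop:
              exec:
                command:
                  - /usr/bin/env
                  - bash
                  - -c
                  - |
                    # Assumption: Patroni REST API on localhost:8008.
                    # /read-write answers HTTP 200 only on the current leader,
                    # so replicas skip the switchover.
                    if curl --silent --fail --output /dev/null http://localhost:8008/read-write; then
                      patronictl switchover --force
                    fi
```

Guarding the switchover with a leader check keeps replica pods from issuing a switchover when they are the ones being stopped.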

Upon further observation of the Patroni cluster, the /health_check.sh script uses a not insignificant amount of CPU. Because it was being called in the readinessProbe, a Patroni pod would normally average around 0.040 cores (the script checks the remaining PVC size and queries the Patroni API for status). As most applications only care whether they can connect to Postgres, we simplify the readinessProbe to check just that, reducing the average CPU utilization to 0.004 cores, an order-of-magnitude decrease.
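
A connectivity-only readiness probe could look roughly like the following. This is a sketch that assumes pg_isready is on the container's PATH and that the default host and port (or the PGHOST/PGPORT environment variables) point at the local Postgres instance. The timing values are illustrative and are not taken from this PR.

```yaml
# Sketch only: readiness probe that checks whether Postgres accepts connections.
readinessProbe:
  exec:
    command:
      # pg_isready exits 0 only when the server is accepting connections;
      # -q suppresses its status output.
      - pg_isready
      - -q
  # Illustrative timing values (assumptions, tune for your workload).
  initialDelaySeconds: 5
  timeoutSeconds: 5
  failureThreshold: 4
```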

Finally, since we know that an idle Patroni and Postgres pod averages only around 0.004 cores and 130-180MB of memory, we can reduce the default CPU request to 50 millicores and the memory request to 256Mi, thereby reducing our persistent footprint on the cluster.
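
Expressed as a container resources stanza, the reduced requests would look something like this. Limits are left out because they are not stated here, and the real template may expose these values as template parameters rather than literals.

```yaml
# Sketch only: reduced default requests for the Patroni container.
resources:
  requests:
    cpu: 50m       # 50 millicores
    memory: 256Mi
```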

basilv commented 3 years ago

FYI, I adopted the changes in this PR in my application's OCP4 Patroni implementation and was fine with them; I particularly liked the automatic failover (see https://github.com/bcgov/nr-fom-api/tree/release/FOM-12/openshift/db).