This repository contains the Spilo + pgtap docker container and helm chart used by the various applications deployed by the CAS team. The docker container adds the pgtap extension (unit testing for PostgreSQL) and a couple of utility scripts to the official Spilo container developed by Zalando. BATS tests for the aforementioned scripts can be run using the corresponding `make` commands (run `make help` for more information).
The `helm/patroni` folder contains a fork of the Patroni helm chart that was in the (now deprecated) helm charts incubator, and can be reused by other teams. The `helm/cas-postgres` folder contains helm values used by the CAS team.
In order to use WAL-G backups, you will need to pass in storage bucket names and service accounts that have already been provisioned. You can do this using the `terraform-bucket-provision` chart from `cas-pipeline`.
The cas-postgres helm chart is installed via the `helm install` command. See `helm/cas-postgres/values.yaml` for more information on the values that should be provided.
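As an illustration, an install might look like the following (the release name and namespace here are assumptions for the example; substitute your own):

```shell
# Hypothetical release name and namespace; point --values at your
# team's values file (see helm/cas-postgres/values.yaml).
helm install cas-ciip-portal ./helm/cas-postgres \
  --namespace wksv3k-dev \
  --values ./helm/cas-postgres/values.yaml
```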
To uninstall a release, delete the helm release and then clean up the leftover resources:

- Delete the release, ex: `helm delete --namespace wksv3k-dev cas-ciip-portal`
- Delete the PVCs (can be done from the OpenShift UI under Storage)
- Delete the configmaps (can be done from the OpenShift UI under Resources -> Configmaps)
- Delete the leftover backup job, ex: `wksv3k-dev-ciip-backups` (can be done from the OpenShift UI under Resources -> Other Resources; if you find the job, delete it)
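The same cleanup can be sketched from the command line with `helm` and `oc`; the names below are examples and placeholders, not fixed values:

```shell
# Remove the helm release (example namespace and release name):
helm delete --namespace wksv3k-dev cas-ciip-portal

# Clean up resources left behind by the release (placeholder names):
oc --namespace wksv3k-dev delete pvc <pvc-name>
oc --namespace wksv3k-dev delete configmap <configmap-name>

# Leftover backup job, if present (example name):
oc --namespace wksv3k-dev delete job wksv3k-dev-ciip-backups
```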
In the event a database needs to be recovered from a backup to a specific point in time, these are the steps to follow. Spilo ships with a `clone_with_wale` method that we can leverage to handle this for us with the addition of a few environment variables.

These steps assume you are using Google Cloud to store backups. If you are using another provider (like S3), the process should be similar, and the necessary environment variables are described in the `configure_spilo` script here.
Scale the patroni statefulset down to 0
Delete the patroni configmaps (any configmaps prefixed by your patroni-cluster-name)
Delete the patroni PVCs relating to your cluster
Add the following environment variables to your patroni statefulset:
- `CLONE_SCOPE`: \<patroni-cluster-name>
- `CLONE_METHOD`: CLONE_WITH_WALE
- `CLONE_TARGET_TIME`: \<timestamp-with-timezone-to-recover-to>
- `CLONE_WALG_GS_PREFIX`: \<google-cloud-prefix>
- `CLONE_GOOGLE_APPLICATION_CREDENTIALS`: \<path-to-json-credentials>
- `PGVERSION`: \<major-postgres-version-to-restore-to> (example: 12)

`PGVERSION` is optional, but if the major version of your Postgres backup is older than the Postgres version you are using, the restore will automatically upgrade Postgres and begin an entirely new timeline starting at 00000001. This will cause issues with replication, as the replica will become confused about which timeline to bootstrap from when starting up.

Note: You likely already have `WALG_GS_PREFIX` and `GOOGLE_APPLICATION_CREDENTIALS` set as environment variables, since they're needed to perform backups. The `clone_with_wale` method specifically looks for these variables with the `CLONE_` prefix, so copying the contents of these existing environment variables into new variables prefixed with `CLONE_` is all that is needed here.
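As a minimal sketch of that copying (the bucket prefix and credentials path below are made-up example values, not real settings):

```shell
# Assumed existing backup settings (example values only):
WALG_GS_PREFIX="gs://example-bucket/spilo/my-cluster"
GOOGLE_APPLICATION_CREDENTIALS="/etc/gcs/credentials.json"

# clone_with_wale looks for the same settings under the CLONE_ prefix,
# so the clone variables are just copies of the existing ones:
CLONE_WALG_GS_PREFIX="$WALG_GS_PREFIX"
CLONE_GOOGLE_APPLICATION_CREDENTIALS="$GOOGLE_APPLICATION_CREDENTIALS"

echo "$CLONE_WALG_GS_PREFIX"
echo "$CLONE_GOOGLE_APPLICATION_CREDENTIALS"
```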
Do not forget to save your environment variables; the SAVE button is at the bottom of the page.
Scale up the patroni statefulset.
Patroni will then start up your database and restore it to the point defined in `CLONE_TARGET_TIME`.
If successful, you will see something like this in your logs: ![Successful restore](docs/images/success.png)

Patroni will begin bootstrapping your backup. You will see a lot of "Not Healthy..bootstrap in progress..rejecting connections" messages in your logs for a while, as shown below. This is normal; do not stop the process.

![Bootstrap in progress](docs/images/bootstrapping.png)

This will repeat for a few moments. Once the bootstrap is finished, you'll see a bunch of SQL statements, followed eventually by your pod declaring it is `the leader with the lock`. At this point you're all done!
Ensure you are in the correct namespace!
- `oc project` will show you what namespace you are in
- Scale the statefulset down to 0: `oc scale statefulset <statefulset-name> --replicas=0` (list statefulsets with `oc get statefulsets`)
- Delete the configmaps: `oc delete configmap <configmap-name>` (list configmaps with `oc get configmaps`)
- Delete the PVCs: `oc delete pvc <pvc-name>` (list PVCs with `oc get pvc`)
- Set the environment variables: `oc set env statefulset/<statefulset-name> CLONE_SCOPE=<patroni-cluster-name> CLONE_METHOD=CLONE_WITH_WALE CLONE_WALG_GS_PREFIX=<google-cloud-prefix> CLONE_GOOGLE_APPLICATION_CREDENTIALS=<path-to-credentials> CLONE_TARGET_TIME='<target-restore-time-with-timezone>' PGVERSION=<major-postgres-version-to-restore-with>` (verify with `oc set env statefulset/<statefulset-name> --list`; the values for the `< >` placeholders are described in the steps section above)
- Scale the statefulset back up: `oc scale statefulset <statefulset-name> --replicas=<n>`, with `n` being the desired number of pods
Scale your statefulset to 0: under Workloads in the left pane, go to StatefulSets. ![StatefulSets](docs/images/ss.png)

Delete configmaps: under Workloads in the left pane, go to ConfigMaps. ![ConfigMaps](docs/images/config.png)

Delete PVCs: under Storage in the left pane, go to PersistentVolumeClaims. ![PersistentVolumeClaims](docs/images/pvc.png)

Set environment variables: under Workloads in the left pane, go back to StatefulSets, open the Environment tab, and set the variables described in the Steps section above. ![StatefulSets](docs/images/ss.png)

Scale up your statefulset: from the Details tab, use the arrows to scale up your statefulset.