Add back-up for Traction deployments dev/test/prod

esune commented 1 year ago

Use the backup-container to enable back-ups for the dev/test/prod Traction instances.

esune commented 11 months ago

Might affect or be affected by https://github.com/bcgov/DITP-DevOps/issues/108, so it is probably good to keep that in mind.

i5okie commented 11 months ago

In the current implementation of Traction, we use a single-instance PostgreSQL server. The Bitnami PostgreSQL sub-chart we use does not offer any backup solution.

Many projects use the backup-container to back up their databases. Configuring it requires specifying the database server name and the database name, which is mostly a "set it and forget it" type of deal. With Traction configured in multi-tenancy mode with a database per wallet, however, this adds some overhead, because backup-container does not support dynamic backup. Every change of tenants in an environment would require somebody to notice the change, update the backup configuration in the config repo, and make sure that change gets applied to the backup-container in that environment. This is probably acceptable in dev and test environments, but not ideal for production, as it leaves room for human error.
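To illustrate the maintenance burden described above, here is a minimal sketch of regenerating the database list by hand. It assumes the backup-container's `backup.conf` takes one `postgres=<host>:<port>/<database>` entry per database; the host name, port, and tenant database names are hypothetical, and the function itself is not part of backup-container.

```python
def render_backup_conf(host: str, port: int, databases: list[str]) -> str:
    """Render one database entry per line (assumed backup.conf format)."""
    # Sort and de-duplicate so the output is stable across runs,
    # which keeps diffs in the config repo small.
    unique_names = sorted(set(databases))
    lines = [f"postgres={host}:{port}/{name}" for name in unique_names]
    return "\n".join(lines) + "\n"


# Hypothetical tenant wallet databases; in practice this list changes
# every time a tenant is added or removed.
tenant_databases = ["tenant_wallet_b", "tenant_wallet_a"]
print(render_backup_conf("traction-postgresql", 5432, tenant_databases), end="")
```

Each time the tenant list changes, the output would have to be committed to the config repo and rolled out to the backup-container again, which is the manual step that leaves room for error.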

I'd like to suggest looking into using CrunchyDB PostgreSQL clusters for the dev/test/prod environments of Traction. The CrunchyDB Postgres operator is available on the cluster and is maintained by the platform team.

Documentation is available here: Open source database technologies, CrunchyDB Workshop

Implementing the CrunchyDB cluster would address both the database backup and the high-availability concerns.

Kubernetes operators, such as the CrunchyDB PostgreSQL operator, monitor namespaces for custom resources, that is, objects whose type is declared by a CRD (custom resource definition). In the case of PGO (the CrunchyDB Postgres operator), both the PostgreSQL server and the backup configuration are defined in a single custom resource manifest. Just as a Kubernetes Deployment manifest tells Kubernetes how many pods to deploy and how to configure each of them, a Postgres cluster manifest tells PGO how to deploy and configure a PostgreSQL cluster. PGO then does the rest: it automatically monitors and manages the cluster, and the backups if configured.
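As a rough sketch of what such a single manifest could look like, the snippet below builds a `PostgresCluster` resource as a Python dictionary and prints it as JSON (which `kubectl apply -f` also accepts). The field names follow the PGO v5 `postgres-operator.crunchydata.com/v1beta1` API as I understand it and should be checked against the operator version installed on the cluster. The cluster name, Postgres version, storage size, and backup schedule are assumptions for illustration only.

```python
import json


def volume_claim(storage: str) -> dict:
    """Return a fresh PVC spec so the two uses below do not share one dict."""
    return {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": storage}},
    }


def postgres_cluster_manifest(
    name: str,
    replicas: int = 1,
    storage: str = "1Gi",
    full_backup_cron: str = "0 1 * * 0",
) -> dict:
    """Build a PostgresCluster manifest with server and backup settings together."""
    return {
        "apiVersion": "postgres-operator.crunchydata.com/v1beta1",
        "kind": "PostgresCluster",
        "metadata": {"name": name},
        "spec": {
            "postgresVersion": 15,  # assumed version, adjust as needed
            # Server side: how many instances and how much storage each gets.
            "instances": [
                {
                    "name": "instance1",
                    "replicas": replicas,
                    "dataVolumeClaimSpec": volume_claim(storage),
                }
            ],
            # Backup side: a pgBackRest repository with a scheduled full backup.
            "backups": {
                "pgbackrest": {
                    "repos": [
                        {
                            "name": "repo1",
                            "schedules": {"full": full_backup_cron},
                            "volume": {"volumeClaimSpec": volume_claim(storage)},
                        }
                    ]
                }
            },
        },
    }


# "traction-dev" is a hypothetical cluster name.
print(json.dumps(postgres_cluster_manifest("traction-dev"), indent=2))
```

Keeping `replicas` at 1 matches the current single-instance setup; raising it is where the high-availability part would come in.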

Both testing and, potentially, implementation could be as simple as setting up a CrunchyDB cluster, for example in the dev environment. The Traction Helm chart already supports disabling deployment of the PostgreSQL sub-chart and substituting the connection and authentication details of an external PostgreSQL server. After deploying a CrunchyDB cluster, we will need to configure the necessary network policies, and then update the development Helm override values to point Traction to the CrunchyDB cluster.

Though not necessary, we could add the PGO custom resource to the Traction Helm chart, as an option for the end user to choose instead of the Bitnami PostgreSQL sub-chart.

WadeBarnes commented 11 months ago

Sounds good to me. I know other teams are using it successfully. I'm interested in how well the backup and restore features work. Great opportunity to try this out.

esune commented 11 months ago

Thank you for the background information @i5okie. I like your recommendation and look forward to the results of the investigation/implementation.

A couple of things to note:

We will need to migrate the current wallet instance(s) to the Crunchy database, with no data loss or tenant resets at this point.
We might want to stage the HA part of this as a second step, in order to reduce the number of changes and because of intricacies with synchronizing data across replicas with ACA-Py/crypto. See https://github.com/bcgov/DITP-DevOps/issues/124#issuecomment-1732025239

i5okie commented 11 months ago

Deployed the backup-container to the dev and test environments.

Found some issues with the Helm chart that I have to fix before I can move forward with testing the Crunchy Postgres cluster.

esune commented 10 months ago

Slight change of plans: we will be using CrunchyDB soon enough that deploying the backup-container to production is not necessary, since we have no tenants there yet.

Closing the issue as completed; the backups in dev/test will be used to restore the data when completing #133.

bcgov / DITP-DevOps

Add back-up for Traction deployments dev/test/prod #107