bcgov / MFIN-Data-Catalogue

The Finance Data Catalogue enables users to discover data holdings at the BC Ministry of Finance and offers information and functionality that benefits consumers of data for business purposes. The product is built using Drupal and adheres to the Government of BC's Core Administrative and Descriptive Metadata Standard.

Investigate the database backup cron and possible rotation. #497

Closed NicoledeGreef closed 3 months ago

NicoledeGreef commented 3 months ago

The database backup PVC is nearly full. Some aspects of the database backups are not well understood, so this task is for a DevOps specialist to investigate and report back to the team via this ticket.

@danhgov

kardamk commented 3 months ago

Hi @chrislaick and @danhgov,

There are 2 aspects to database backups and retention. Both are handled by pgBackRest through the CrunchyDB Postgres cluster resource, which is created from the template at "tenant-gitops-ea352d/helm-drupal/charts/drupal/templates/postgrescluster/PostgresCluster.yaml".

  1. Data Retention: We need to decide how many days of backups to keep.
    • Currently, retention is time-based and is controlled by the "postgresOperator.retention.count" and "postgresOperator.retention.type" parameters in "tenant-gitops-ea352d/helm-drupal/charts/drupal/values-mfin-data-catalogue-{NAMESPACE}.yaml", where the count is the number of days a backup is retained. For example, if "postgresOperator.retention.count" is 30 (days) and there are 2 full backups, one 25 days old and one 35 days old, no full backup will be expired: expiring the 35-day-old backup would leave only the 25-day-old backup, which would violate the 30-day retention policy of having at least one backup that is 30 days old before an older one can be expired.

P.S.: When a full backup expires, all differential and incremental backups associated with that full backup also expire.
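The retention rule above can be sketched in a few lines of Python. This is only an illustration of the behaviour as described in this comment, not pgBackRest's actual implementation; the function name `expired_full_backups` and its arguments are hypothetical, and dependent differential/incremental backups are not modelled.

```python
def expired_full_backups(ages_days, retention_days):
    """Return the ages (in days) of full backups that time-based retention
    would expire, following the rule described above.

    Hypothetical helper for illustration only.
    """
    # Full backups that already satisfy the retention window, youngest first.
    old_enough = sorted(age for age in ages_days if age >= retention_days)
    # The youngest backup that is at least `retention_days` old must be kept,
    # so only backups older than that one can be expired.
    return old_enough[1:]


# The example from the comment: 30-day retention, backups aged 25 and 35 days.
print(expired_full_backups([25, 35], 30))      # nothing can be expired yet
print(expired_full_backups([25, 35, 40], 30))  # only the 40-day-old backup
```

Under this rule, lowering the retention count makes more of the older full backups eligible for expiry, which is why it frees space on the backup PVC.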

  2. Data Backups: Do we need to change the current frequency of full and incremental backups?
    • Backup schedules are set by the "postgresOperator.retention.schedules.full" and "postgresOperator.retention.schedules.incremental" parameters in "tenant-gitops-ea352d/helm-drupal/charts/drupal/values-mfin-data-catalogue-{NAMESPACE}.yaml".
    • The CrunchyDB operator creates the following cron jobs to take backups on the specified schedules:
      • mfin-data-catalogue-postgres-cluster-repo1-full : Creates a full backup of the database at 8 AM every day.
      • mfin-data-catalogue-postgres-cluster-repo1-incr : Creates an incremental backup of the database at midnight, 4 AM, 12 PM (noon), 4 PM, and 8 PM every day.
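For reference, the run times above would correspond to cron expressions along the following lines. The actual schedule strings in the values file are not quoted in this thread, so the two expressions below are assumptions inferred from the described times, and `daily_run_times` is a hypothetical helper that only understands a fixed minute and a comma-separated (or `*`) hour field.

```python
# Assumed cron expressions, inferred from the run times described above.
ASSUMED_SCHEDULES = {
    "full": "0 8 * * *",
    "incremental": "0 0,4,12,16,20 * * *",
}


def daily_run_times(cron_expr):
    """Expand a simple cron expression into HH:MM run times for one day.

    Only handles a numeric minute and an hour field that is either "*"
    or a comma-separated list of hours.
    """
    minute, hour, *_ = cron_expr.split()
    hours = range(24) if hour == "*" else [int(h) for h in hour.split(",")]
    return [f"{h:02d}:{int(minute):02d}" for h in hours]


for name, expr in ASSUMED_SCHEDULES.items():
    print(name, daily_run_times(expr))
```

With these assumed expressions, each day produces one full backup and five incremental backups.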

References:

  1. https://developer.gov.bc.ca/docs/default/component/platform-developer-docs/docs/database-and-api-management/postgres-how-to/
  2. https://access.crunchydata.com/documentation/postgres-operator/latest/architecture
  3. https://pgbackrest.org/user-guide-rhel.html#introduction

Thank you!

danhgov commented 3 months ago

Thanks for your detective work on this, @kardamk! This is doing a lot to move this along.

kardamk commented 3 months ago

As discussed, the retention period for DB backups has been changed from 30 days to 7 days. I will monitor the backup PVC and fine-tune the retention period if required.

@chrislaick: Please review the pull request https://github.com/bcgov-c/tenant-gitops-ea352d/pull/12 and merge it into the main branch.

chrislaick commented 3 months ago

@kardamk Merged to main and synced with PROD.

chrislaick commented 3 months ago

After adjusting the retention period from 30 days to 7 days, the cron job successfully pruned the backups and the PVC has gone from 91% to 25% usage (3.6 GiB to 1.0 GiB). I think this ticket can be closed.

@kardamk @NicoledeGreef @danhgov