canonical / postgresql-operator

A Charmed Operator for running PostgreSQL on machines
https://charmhub.io/postgresql
Apache License 2.0

Last successful backup should be exposed via a metric #272

Open ben-ballot opened 11 months ago

ben-ballot commented 11 months ago

Steps to reproduce

  1. Deploy postgresql charm
  2. Have a backup run and take up most of the space
  3. Next backup job would fail because of lack of space
  4. No actionable metric exists to raise an alert when the last backup is more than "x" days old or when the script has failed. A disk space alert does not necessarily correlate with this, because the script estimates the size of the next backup from the last one and skips the backup if the remaining space is less than that estimate.

Expected behavior

Have a scrape-able metric for:

Actual behavior

Versions

Operating system: Ubuntu 18.04.6 LTS

Juju CLI: 2.9.42-ubuntu-amd64

Juju agent: 2.9.44

Charm revision: latest/stable

github-actions[bot] commented 11 months ago

https://warthogs.atlassian.net/browse/DPE-2896

taurus-forever commented 11 months ago

Dear @darkalia, thank you for the improvement request.

At the moment all failed backups are marked with the special prefix "Backup failed", and all logs are forwarded to Loki, where you can configure an alert rule using that prefix.

It is an interesting idea to expose the last backup id as a metric, but today there is no easy way to implement it. Contact @7annaba3l to have it scheduled. Thanks!

mthaddon commented 11 months ago

Wouldn't a successful backup create an entry in the juju debug-log which would be relatively easy to inspect via Loki?

mthaddon commented 11 months ago

For instance, looking at the juju logs from a unit we're running in production I see:

2023-11-21 00:00:36 INFO unit.postgresql/0.juju-log server.go:325 Backup succeeded: with backup-id 2023-11-21T00:00:02Z

taurus-forever commented 10 months ago

Just for the record: both successful and failed backups leave a trace in debug-log at INFO level (and are available in Loki on COS).
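
Until a dedicated exporter exists, the log trace could in principle be turned into a number that an alert can act on. The following is only a minimal sketch, not something the charm provides: the regular expression is inferred from the single success line quoted earlier in this thread, and the metric name `postgresql_last_successful_backup_timestamp_seconds` is hypothetical.

```python
import re
from datetime import datetime, timezone
from typing import Iterable, Optional

# Pattern inferred from the one sample log line in this thread (an assumption,
# not taken from the charm's source code).
SUCCESS_RE = re.compile(
    r"Backup succeeded: with backup-id (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)"
)


def last_successful_backup(lines: Iterable[str]) -> Optional[datetime]:
    """Return the newest backup-id timestamp found in the log lines, or None."""
    latest = None
    for line in lines:
        match = SUCCESS_RE.search(line)
        if match is None:
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%SZ")
        stamp = stamp.replace(tzinfo=timezone.utc)
        if latest is None or stamp > latest:
            latest = stamp
    return latest


def render_metric(stamp: Optional[datetime]) -> str:
    """Render a gauge line in Prometheus text format (hypothetical metric name)."""
    value = 0 if stamp is None else int(stamp.timestamp())
    return f"postgresql_last_successful_backup_timestamp_seconds {value}\n"
```

With such a gauge scraped, an alert on "last backup older than x days" would reduce to comparing the current time with the gauge value, which also covers the case where the backup was silently skipped.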

The idea of exposing backup metrics is great. We need a Prometheus pgBackRest exporter first. I will keep the issue open.