A way to persist metrics after restart

nahsi commented 3 years ago

It would be nice if clickhouse-backup could somehow preserve metrics between restarts. This would help with alerting:

For example, consider alerting on the date of the last backup:

  - alert: ClickHouseBackupLastCreate
    expr: time() - clickhouse_backup_last_create_start > 172800
    labels:
      severity: warning
    annotations:
      summary: "Last backup is more than 2 days old at `{{ $labels.instance }}`"
      description: "Last backup created {{ $value | humanizeDuration }} ago at `{{ $labels.instance }}`"

Currently this alert fires after every clickhouse-backup restart, because all metrics are reinitialized to zero and time() - 0 is far above the threshold. As a workaround I can add an upper bound to the expression, e.g. expr: time() - clickhouse_backup_last_create_start > 172800 < 31557600 (31557600 seconds is one year), but that is still a hack. Having backup statistics in a Grafana dashboard would also be nice.

Maybe the statistics could be saved to a file and loaded at startup if the file exists? If it does not exist, the file would be created and initialized with zeroes.
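A minimal sketch of the proposed load-or-initialize behaviour, assuming a JSON state file. The metric names, the functions load_or_init and save, and the file layout are all hypothetical; clickhouse-backup has no such API. The write goes to a temporary file first and is then renamed over the old one, so a crash mid-write cannot leave a half-written state file.

```python
import json
import os
import tempfile

# Hypothetical metric names, for illustration only.
METRIC_NAMES = (
    "last_create_start",
    "last_create_finish",
    "last_upload_finish",
    "last_backup_size_remote",
)


def save(path: str, state: dict) -> None:
    """Write state atomically: temp file in the same directory, then rename."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # atomic on POSIX when both paths share a filesystem


def load_or_init(path: str) -> dict:
    """Load persisted metric values, or create the file initialized with zeroes."""
    try:
        with open(path, encoding="utf-8") as f:
            saved = json.load(f)
    except FileNotFoundError:
        saved = None
    # Every known metric starts at zero.
    state = {name: 0 for name in METRIC_NAMES}
    if saved is None:
        save(path, state)  # first start: create the zero-initialized file
    else:
        # Overlay persisted values, ignoring keys this version does not know.
        state.update({k: v for k, v in saved.items() if k in state})
    return state
```

The exporter would call load_or_init once at startup to seed its gauges, and save after each create/upload finishes.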

Slach commented 3 years ago

A file is the worst choice for persistent metrics storage in containerized environments, and any external storage adds dependencies and support complexity.

I prefer a combination of the following backup alerts:

        - alert: ClickhouseBackupDoesntRunTooLong
          expr: |-
            (clickhouse_backup_last_backup_end > 0 and time() - clickhouse_backup_last_backup_end > 129600)
            or (clickhouse_backup_last_create_finish > 0 and time() - clickhouse_backup_last_create_finish > 129600)
            or (clickhouse_backup_last_upload_finish > 0 and time() - clickhouse_backup_last_upload_finish > 129600)

        - alert: ClickHouseRemoteBackupSizeZero
          for: "36h"
          expr: clickhouse_backup_last_backup_size_remote == 0
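To see why the "> 0" guards avoid the restart false positive, here is a rough Python model of the per-instance logic of ClickhouseBackupDoesntRunTooLong next to the unguarded rule from the first comment. The helper names are hypothetical, and the model ignores PromQL label matching and the "for" clause.

```python
# 129600 s = 36 h, the threshold used in ClickhouseBackupDoesntRunTooLong.
THRESHOLD_SECONDS = 129600

WATCHED = (
    "clickhouse_backup_last_backup_end",
    "clickhouse_backup_last_create_finish",
    "clickhouse_backup_last_upload_finish",
)


def backup_too_old(now: float, metrics: dict) -> bool:
    """Guarded rule: a metric counts only if it is non-zero AND stale."""
    return any(
        metrics.get(name, 0) > 0 and now - metrics[name] > THRESHOLD_SECONDS
        for name in WATCHED
    )


def naive_too_old(now: float, last_create_start: float, threshold: float = 172800) -> bool:
    """Unguarded rule: a zero value after restart looks decades old."""
    return now - last_create_start > threshold
```

The trade-off is that a freshly restarted instance whose metrics are all zero does not fire the guarded rule at all.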

See https://github.com/Altinity/clickhouse-operator/blob/master/deploy/prometheus/prometheus-alert-rules-backup.yaml for details.

nahsi commented 3 years ago

Thank you for the link!

Regarding external storage and complexity: since we already have to mount /var/lib/clickhouse inside the container, we could store the file there, just as clickhouse-backup stores metadata.json in /var/lib/clickhouse/backup/*/.

But I get your point: persistent metrics are just a nice-to-have feature, and if they add complexity, it is better not to implement them.

Altinity / clickhouse-backup

A way to persist metrics after restart #285