Closed: nahsi closed this issue 3 years ago.
A file is the worst choice for persistent metrics storage in containerized environments, and any external storage adds a dependency and support complexity.
I prefer the combination of the following backup alerts:
```yaml
- alert: ClickhouseBackupDoesntRunTooLong
  expr: |-
    (clickhouse_backup_last_backup_end > 0 and time() - clickhouse_backup_last_backup_end > 129600)
    or (clickhouse_backup_last_create_finish > 0 and time() - clickhouse_backup_last_create_finish > 129600)
    or (clickhouse_backup_last_upload_finish > 0 and time() - clickhouse_backup_last_upload_finish > 129600)

- alert: ClickHouseRemoteBackupSizeZero
  expr: clickhouse_backup_last_backup_size_remote == 0
  for: "36h"
```
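As an aside, the `> 0` guards in the first alert are what keep it from firing right after a restart: with a gauge reinitialized to zero, `time() - metric` would be enormous, but the `metric > 0` condition filters that sample out. Neither alert fires if the exporter stops being scraped entirely, though; one possible companion alert for that gap (the alert name and `for` duration here are illustrative, only the metric name is taken from the expressions above) is `absent()`-based:

```yaml
# Hypothetical companion alert, not from this thread: fire when the metric
# series disappears entirely, e.g. clickhouse-backup is down or unscraped.
- alert: ClickhouseBackupMetricsAbsent
  expr: absent(clickhouse_backup_last_create_finish)
  for: "1h"
```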
See the link for details.
Thank you for the link!
Regarding external storage and complexity: since we already have to mount /var/lib/clickhouse inside the container, we could store a file there, just like clickhouse-backup stores metadata.json in /var/lib/clickhouse/backup/*/.
But I get your point: persistent metrics are just a nice-to-have, and if they add complexity it is better not to do it.
It would be nice if clickhouse-backup could somehow preserve metrics between restarts. This would help with alerting:
Imagine alerting on the last backup date: currently it will trigger after a clickhouse-backup restart, since all metrics are reinitialized. I can change the alert expression to something like
```yaml
expr: time() - clickhouse_backup_last_create_start > 172800 < 31557600
```
as a workaround, but still... Having backup statistics in a Grafana dashboard is also nice. Maybe the statistics could be saved to a file and loaded at start if the file is present? If the file is not present, it would be created and initialized with zeroes.