Open solracsf opened 2 years ago
@solracsf Yeah, I like that idea. I've used Dead Man's Snitch, which is similar. It'd probably be good to batch up the pings and only call the API every 5 minutes, every hour, or at some similar interval.
For sure. With a configurable interval (with 1m as the lower limit, say), one could be sure the process is up and running at convenient times. Once there is an API for that, it could easily be integrated with many similar "ping" services (Cronitor, Cronhub, PagerDuty, etc.).
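To make the batching idea a bit more concrete, here is a rough Python sketch of a throttle that forwards at most one ping per configured interval, however often a sync succeeds. Everything in it is an assumption for illustration: `PingThrottle`, `on_sync_success`, and `MIN_INTERVAL_SECONDS` are hypothetical names, not anything that exists in Litestream, and `send` stands in for whatever actually calls the ping service.

```python
import time
from typing import Callable, Optional

# Hypothetical lower limit, matching the 1m floor suggested in this thread.
MIN_INTERVAL_SECONDS = 60.0


class PingThrottle:
    """Forward at most one ping per interval, however often syncs succeed."""

    def __init__(
        self,
        interval_seconds: float,
        send: Callable[[], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # Clamp to the lower limit so a misconfiguration cannot flood the API.
        self.interval = max(interval_seconds, MIN_INTERVAL_SECONDS)
        self._send = send
        self._clock = clock
        self._last_ping: Optional[float] = None

    def on_sync_success(self) -> bool:
        """Call after every successful sync; returns True if a ping was sent."""
        now = self._clock()
        if self._last_ping is not None and now - self._last_ping < self.interval:
            return False  # still inside the current interval, skip this one
        self._send()
        # Only recorded when send() did not raise, so a failed ping is
        # retried on the next successful sync.
        self._last_ping = now
        return True
```

Since pings are only triggered by successful syncs, the ping service stops hearing from the process as soon as replication stalls, which is the dead-man's-switch behaviour being asked for. The check's grace period on the service side would need to be longer than the configured interval.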
An alternative would be to use metrics scraping and make sure enough metrics are exposed to detect error conditions. Possibly just tracking `litestream_replica_wal_bytes` is enough to tell whether the process is replicating successfully, depending on how many writes you expect.
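As a rough illustration of the scraping approach, the Python sketch below compares two scrapes of Prometheus-style text output and reports which series of a metric did not grow in between. It assumes the metric behaves like a monotonically increasing counter and that the metrics text has already been fetched; `parse_metric` and `stalled_series` are hypothetical helper names, and the parser is deliberately minimal rather than a full implementation of the exposition format.

```python
from typing import Dict, List


def parse_metric(exposition: str, name: str) -> Dict[str, float]:
    """Return {label string: value} for every sample of `name`."""
    samples: Dict[str, float] = {}
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        if not line.startswith(name):
            continue
        rest = line[len(name):]
        if rest.startswith("{"):
            end = rest.rindex("}")
            labels, remainder = rest[: end + 1], rest[end + 1:]
        elif rest[:1].isspace():
            labels, remainder = "", rest
        else:
            continue  # a different metric that merely shares the prefix
        fields = remainder.split()
        if fields:
            samples[labels] = float(fields[0])  # ignore optional timestamp
    return samples


def stalled_series(previous: Dict[str, float], current: Dict[str, float]) -> List[str]:
    """Label strings whose value did not increase between two scrapes."""
    return sorted(
        labels
        for labels, value in current.items()
        if labels in previous and value <= previous[labels]
    )
```

A flat value only indicates a problem if writes were expected between the two scrapes, so the scrape spacing has to be chosen with the database's write rate in mind.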
Sorry if this is already in place; I couldn't find anything on it. Since Litestream is a replication service, it should be monitored (as always).
One possibility is to integrate it with a "ping" service such as https://healthchecks.io/ so that the service is pinged every time a sync (replication) completes successfully.
Something like this: https://torsion.org/borgmatic/docs/how-to/monitor-your-backups/#healthchecks-hook
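For illustration, a hook of this kind could look roughly like the Python sketch below. It relies on the Healthchecks.io convention that a request to the check's ping URL signals success and the same URL with `/fail` appended signals failure. `ping_url` and `report_sync` are hypothetical names, sending the error text as the request body is a choice made for this sketch, and none of it reflects an existing Litestream option.

```python
import urllib.request
from typing import Optional


def ping_url(base_url: str, ok: bool) -> str:
    """Build the success URL, or the '/fail' variant when a sync failed."""
    base = base_url.rstrip("/")
    return base if ok else base + "/fail"


def report_sync(
    base_url: str,
    error: Optional[Exception] = None,
    timeout: float = 10.0,
) -> bool:
    """Ping after a sync attempt; returns True if the service answered 2xx."""
    url = ping_url(base_url, ok=error is None)
    # On failure, send the error text as the body so it shows up with the ping.
    body = b"" if error is None else str(error).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="POST")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # Monitoring must never break replication: network errors, timeouts,
        # and HTTP error responses are reported as False, not raised.
        return False
```

Usage would be something like `report_sync("https://hc-ping.com/your-uuid-here")` after a successful sync and `report_sync("https://hc-ping.com/your-uuid-here", error=exc)` after a failed one, where `your-uuid-here` is a placeholder for the check's own identifier.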