Currently heartbeat notifications tend to all be fired at the same time, especially as the server gets stopped and restarted during backup. This causes problems with Grafana alerts that detect when increase(heartbeat_notifications_total[20m]) becomes zero. With a 20m notification interval, this metric is sometimes zero when the daily backup job stops the notification service right before all notifications are supposed to be fired.
Implementing proper pacing is too much work, as we don't need to distribute the notifications perfectly over time, but adding a random delay should solve the problem.
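
A minimal sketch of the random-delay idea, not the actual change in this PR. The names HEARTBEAT_INTERVAL, MAX_JITTER, next_heartbeat_delay and schedule are hypothetical, and the 5-minute jitter bound is an assumed value. Only the 20m interval comes from the description above.

```python
import random

HEARTBEAT_INTERVAL = 20 * 60  # seconds; the 20m interval from the description
MAX_JITTER = 5 * 60  # seconds; assumed upper bound for the random delay


def next_heartbeat_delay(interval=HEARTBEAT_INTERVAL, max_jitter=MAX_JITTER, rng=random):
    """Return the seconds to wait before the next heartbeat.

    The result is the fixed interval plus a uniformly distributed random delay,
    so heartbeats that were scheduled together drift apart over time.
    """
    return interval + rng.uniform(0, max_jitter)


def schedule(now, tokens, rng=random):
    """Map each token to the timestamp of its next heartbeat notification."""
    return {token: now + next_heartbeat_delay(rng=rng) for token in tokens}
```

Because every notification gets its own delay, a service restart no longer lines all heartbeats up on the same instant, so a short stop during backup is less likely to cover every notification due in a 20m window. Note that in this sketch the delay is added on top of the interval, so the gap between two heartbeats for one token can exceed 20m. Whether that is acceptable depends on how many tokens are active.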