bcgov / cas-ciip-portal

The Climate Action Secretariat's CleanBC Industrial Incentive Program application
https://ciip.gov.bc.ca/
Apache License 2.0

Add a sysdig alert that monitors pod-restarts #2021

Open · dleard opened this issue 3 years ago

dleard commented 3 years ago

We should have an alert that monitors pod restarts and fires when several restarts happen within a given time window, so that we are warned of a possible back-off restart loop (CrashLoopBackOff).
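A minimal sketch of that condition, written in PromQL like the query later in this thread. It assumes Sysdig exposes the kube-state-metrics restart counter `kube_pod_container_status_restarts_total` with the `kube_namespace_name` and `kube_pod_name` labels; the 30-minute window and the threshold of 3 are illustrative stand-ins for "a few restarts within a time threshold", not agreed values.

```promql
# Hypothetical alert condition (assumes kube_pod_container_status_restarts_total
# is available in Sysdig with kube_namespace_name / kube_pod_name labels).
# increase() estimates how much the cumulative restart counter grew over the
# last 30 minutes; sum by(...) adds up the restarts of all containers in a pod.
sum by (kube_namespace_name, kube_pod_name) (
  increase(kube_pod_container_status_restarts_total[30m])
) >= 3
```

Counting restarts on the cumulative counter, rather than watching a readiness gauge, measures the restarts directly, and `increase()` tolerates counter resets.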

pbastia commented 2 years ago

I experimented with alerts of the form:

count by(kube_namespace_name, kube_pod_name)
(changes(kube_pod_status_ready{condition="true"}[30m])) >= 5

and tested them against a pod designed to go into CrashLoopBackOff (a container whose command is `exit 1`). The alert did not fire.
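A likely explanation, not confirmed in this thread: a container that runs `exit 1` probably never stays Ready long enough to be sampled, so `kube_pod_status_ready{condition="true"}` stays at 0 and `changes()` returns 0. Separately, `count by(...)` counts the matching series in each group (typically one series per pod for `condition="true"`) instead of adding up their values, so the expression evaluates to 1 per pod and can never reach 5. A hedged alternative is to alert on the container's waiting reason; this sketch assumes Sysdig exposes the kube-state-metrics gauge `kube_pod_container_status_waiting_reason` with the same `kube_namespace_name` / `kube_pod_name` labels.

```promql
# Hypothetical alternative (assumes kube_pod_container_status_waiting_reason
# is available in Sysdig with kube_namespace_name / kube_pod_name labels).
# The gauge is 1 while a container is waiting for the given reason, so this
# returns 1 for any pod with a container currently backing off in CrashLoopBackOff.
max by (kube_namespace_name, kube_pod_name) (
  kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"}
) == 1
```

Because a crash-looping container alternates between short runs and back-off waits, this gauge can flicker; combining it with the restart-count condition above would make the alert less sensitive to a single sample.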

cc @dleard