Azure / prometheus-collector


container label used in cpu usage alert does not exist #336

Closed abdullah248 closed 1 year ago

abdullah248 commented 1 year ago

The recommended CPU usage alert in https://github.com/Azure/prometheus-collector/blob/main/mixins/kubernetes/rules/recording_and_alerting_rules/templates/ci_recommended_alerts.json filters on a label that does not exist.

The query is: "sum (rate(container_cpu_usage_seconds_total{image!=\"\", container_name!=\"POD\"}[5m])) by (pod,cluster,container,namespace) / sum(container_spec_cpu_quota{image!=\"\", container_name!=\"POD\"}/container_spec_cpu_period{image!=\"\", container_name!=\"POD\"}) by (pod,cluster,container,namespace) > .95",

The query filters on the label container_name, but when I explore the data in Grafana I only see a container label and a separate name label. I believe the query should use container instead.
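A minimal sketch of the proposed change, assuming `container` is a drop-in replacement for `container_name` in this query. The helper `rename_label` is hypothetical and not part of the repository. Whether the pause container still reports the value `POD` under the `container` label depends on the cluster, so the `!="POD"` matcher is carried over unchanged rather than verified.

```python
import re


def rename_label(expr: str, old: str = "container_name", new: str = "container") -> str:
    """Rename a label matcher key inside a PromQL expression string."""
    # Match the label name only where a matcher operator (=, !=, =~, !~) follows,
    # so metric names such as container_cpu_usage_seconds_total are left alone.
    pattern = re.compile(r"\b" + re.escape(old) + r"(?=\s*(?:=~|!~|!=|=))")
    return pattern.sub(new, expr)


# The alert query quoted in this issue, with the JSON escaping removed.
query = (
    'sum (rate(container_cpu_usage_seconds_total{image!="", container_name!="POD"}[5m])) '
    'by (pod,cluster,container,namespace) / '
    'sum(container_spec_cpu_quota{image!="", container_name!="POD"}'
    '/container_spec_cpu_period{image!="", container_name!="POD"}) '
    'by (pod,cluster,container,namespace) > .95'
)

fixed = rename_label(query)
print(fixed)
```

The printed expression can be pasted into Grafana Explore to check that it returns series before changing the alert rule.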

I did not check the other alerts, but it may be a good idea to go through the rest of them as well.
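One way to go through the remaining alerts is to scan the template text for matchers on the missing label. The sketch below is only an illustration: `find_label_uses` is a hypothetical helper, and the two sample lines are made up for the example rather than copied from the real template.

```python
import re
from typing import Iterable, List, Tuple


def find_label_uses(lines: Iterable[str], label: str = "container_name") -> List[Tuple[int, str]]:
    """Return (line_number, stripped_line) for every line that matches on `label`."""
    # The label name must be followed by a PromQL matcher operator (=, !=, =~, !~).
    pattern = re.compile(r"\b" + re.escape(label) + r"\s*(?:=~|!~|!=|=)")
    return [
        (number, line.strip())
        for number, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]


# Made-up sample lines in the JSON-escaped style used by the quoted query.
sample = [
    '"sum(rate(container_cpu_usage_seconds_total{container_name!=\\"POD\\"}[5m])) > .95",',
    '"sum(container_memory_working_set_bytes{container!=\\"\\"}) by (pod) > 0",',
]

hits = find_label_uses(sample)
for number, text in hits:
    print(f"line {number}: {text}")
```

Because the function accepts any iterable of strings, an open file handle for a local copy of the template can be passed in directly.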

Sohamdg081992 commented 1 year ago

The alert has been fixed.