DataDog / extendeddaemonset

Kubernetes Extended Daemonset controller
Apache License 2.0

Add telemetry to detect stuck rolling updates #89

Closed: ahmed-mez closed this 3 years ago

ahmed-mez commented 3 years ago

What does this PR do?

Add a new metric, ers_rolling_update_stuck, whose value is 1 if the number of unavailable pods is higher than maxUnavailable, and 0 otherwise.
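
For illustration only, the rule behind the metric value could be sketched as below. This is not the code from the PR: the function name rollingUpdateStuckValue is hypothetical, and the sketch assumes maxUnavailable has already been resolved to an absolute pod count.

```go
package main

import "fmt"

// rollingUpdateStuckValue is a hypothetical helper (not taken from the PR).
// It returns the value the ers_rolling_update_stuck metric would report:
// 1 when the number of unavailable pods is strictly higher than
// maxUnavailable, 0 otherwise.
// Assumption: maxUnavailable is already an absolute pod count.
func rollingUpdateStuckValue(unavailablePods, maxUnavailable int) float64 {
	if unavailablePods > maxUnavailable {
		return 1
	}
	return 0
}

func main() {
	// More unavailable pods than allowed: the update is reported as stuck.
	fmt.Println(rollingUpdateStuckValue(3, 1))
	// Exactly at the limit is not "higher than", so it is not reported as stuck.
	fmt.Println(rollingUpdateStuckValue(1, 1))
}
```

The comparison is strict, matching the "higher than maxUnavailable" wording, so an update sitting exactly at the limit reports 0.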

Motivation

To be alerted if an ERS rolling update gets stuck because of the number of unavailable pods in the cluster.

Describe your test plan

ahmed-mez commented 3 years ago

Thanks @celenechang! PR updated.