After the deployment is done, trigger a rolling update in which the new pods never become available. To do so, add a broken readiness probe to the pod template. This makes the EDS consider the new pods unavailable, which should throttle or block the rolling update.
Make sure the metric shows that the new ERS is stuck:
$ curl localhost:8080/metrics | grep ers_rolling_update_stuck
# HELP ers_rolling_update_stuck 1 if the number of unavailable pods is higher than maxUnavailable, 0 otherwise
# TYPE ers_rolling_update_stuck gauge
ers_rolling_update_stuck{name="pause-containers-67zbv",namespace="default"} 0
ers_rolling_update_stuck{name="pause-containers-vpkxk",namespace="default"} 1
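To avoid reading the gauge values by eye, the check above can be scripted. The following is a minimal sketch, not part of the PR: `stuck_ers` is a hypothetical helper name, and it assumes the metrics are served in the Prometheus text format shown above, with the sample value as the last field on the line (no trailing timestamp).

```shell
#!/bin/sh
# stuck_ers (hypothetical helper): reads Prometheus text-format metrics on
# stdin and prints the "name" label of every ers_rolling_update_stuck
# series whose value is 1.
stuck_ers() {
  awk '
    # Match only sample lines, which skips the # HELP and # TYPE comments.
    /^ers_rolling_update_stuck[{]/ && $NF == 1 {
      # Pull the value out of name="..." (6 = length of name=", 7 adds the
      # closing quote).
      if (match($0, /name="[^"]*"/)) {
        print substr($0, RSTART + 6, RLENGTH - 7)
      }
    }
  '
}
```

It can be used as `curl -s localhost:8080/metrics | stuck_ers`; with the output above, it would be expected to print only `pause-containers-vpkxk`.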
What does this PR do?
Add a new metric, ers_rolling_update_stuck: 1 if the number of unavailable pods is higher than maxUnavailable, 0 otherwise.
Motivation
To be alerted if an ERS rolling update gets stuck because of the number of unavailable pods in the cluster.
Describe your test plan