Closed meeech closed 1 year ago
After looking at this further (thanks @zachaller) I think here is what happened - hopefully it helps others.
What might have happened:
Re: It working before - I think in some previous cases (this is across many different deployments) that the pod managed to enter a ready state before it was crashing, so we successfully passed step 0, and everything worked as expected.
What could have been done to get alerted about this using rollouts?:
So there are 2 actions taken in this scenario to fix the issue (with configuration - not a rollout issue at all):
progressDeadlineAbort
to true
to ensure the rollout will fail on ProgressDeadlineExceeded
and we get a notif if for some other reason it can't progress.
Checklist:
Hit a weird bug yesterday. Rollouts 1.5.1
A rollout happened, the pod was crashlooping. The problem is the Analysis run never started. I expected the analysis run to execute as it had all the times before
I found this:
Message: ReplicaSet "smoke-test-go-client-service-v1-7f878b67b9" has timed out progressing.
I am unable to reproduce. When I redeployed, it was all good - in that the pod started crashlooping, and the analysis run started and correctly aborted the rollout, and I got a notification.
What I'm trying to understand is:
ProgressDeadlineExceeded
?This is a basic no traffic routing canary strategy rollout.
Attached is the logs from the timeperiod from the rollout controller. let me know if there is any other info to provide.