BChancusi closed this issue 11 months ago.
We're also experiencing the same issue on 3.9.1.
Can you tell us more about your environment? We're trying to replicate but are not seeing this behavior.
Same issue here on 3.9.1 with EKS 1.27, while trying to rotate the CA by deleting the old CA secret and restarting the apiext deployment.
The apiext deployment keeps restarting over and over until we add the missing permission from #5449 to the cluster role.
@cindymullins-dw
We deploy using ArgoCD on EKS, on a mix of spot and on-demand instances. We clean up everything, including the certificate, and do a fresh install; after a couple of hours (as pods start shifting around), it randomly starts crashing, and any new Emissary instances fail to get past the apiext init container.
Looked at the logs, and it does seem we've also had occurrences of this error but didn't pay much attention to it. Will update the RBAC tomorrow to see if that fixes it for us, thanks!
The bug is quite insidious: the first pod run errors, which restarts the pod, but on the next run it seemingly skips the failing step, because that step is first in the sequence and is already configured, so the pod is allowed to run. #5449 adds the necessary permission, which I have confirmed fixes the issue; however, all PRs are currently failing their builds due to an unrelated Docker login failure.
@BChancusi Yeah, that's exactly the situation we hit while trying to rotate the CA (deleting the CA secret and restarting the apiext deployment) multiple times.
Describe the bug
Applying Emissary as YAML and waiting on the deployment fails.
This is caused by the api-ext pod restarting after it has been flagged as ready (i.e. after it has already passed the wait check).
After the restart, the pod runs normally.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The api-ext pod should not restart, or its ready status should be hardened so that subsequent wait checks are valid.
Versions (please complete the following information):
Additional context
This only occurs on the latest release; the previous release, 3.8.2, works correctly. We're currently working around it by sleeping after the wait to give the pod time to restart.
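A fixed sleep after the wait only helps if the restart happens inside the sleep window. A minimal sketch of a sturdier check, in Python: instead of sleeping, keep polling until the pod has looked the same for a settle period, and start that period over whenever the pod drops out of Ready or its restart count changes. The `wait_until_stable` name and the `probe` callback are hypothetical; how you read pod status (kubectl, a Kubernetes client library) is left to the caller, and nothing here is part of Emissary itself.

```python
import time
from typing import Callable, Optional


def wait_until_stable(
    probe: Callable[[], Optional[object]],
    settle_seconds: float,
    timeout_seconds: float,
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll probe() until it returns the same non-None snapshot for
    settle_seconds in a row; return False if timeout_seconds runs out first.

    Hypothetical contract: probe() returns None while the pod is not Ready,
    and a snapshot such as its restart count while it is Ready.
    """
    deadline = clock() + timeout_seconds
    last: Optional[object] = None  # most recent snapshot observed
    since: Optional[float] = None  # when the current unbroken streak began
    while True:
        now = clock()
        if now >= deadline:
            return False
        snap = probe()
        if snap is None or snap != last:
            # Not ready, or the snapshot changed (e.g. the pod restarted):
            # start the settle window over.
            last = snap
            since = None if snap is None else now
        elif now - since >= settle_seconds:
            # Same Ready snapshot for the whole settle window.
            return True
        sleep(poll_interval)
```

For example, a probe could return the apiext container's `restartCount` from the pod's `status.containerStatuses` when the pod is Ready and `None` otherwise, with a settle period comfortably longer than the observed time to the restart. Comparing the restart count, not just readiness, also catches a restart that happens entirely between two polls.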