Closed kwngo closed 3 months ago
Looks like this may have been a consequence of the Github wide issue with the API and Github Actions
Im not sure if the issues I saw today are related to this, but after the outage today, all of my runners got stuck in a bad state and have since been completely unrecoverable. I had to create an entirely new runner set to try and get CI back up and running properly
Hey everyone, I'm fairly certain this one was due to an incident, so I will close it now. Please let us know if the issue persists :relaxed:
Checks
Controller Version
0.7.0
Deployment Method
Helm
Checks
To Reproduce
We're seeing issues after runner pods have been running for a long time, where the listener seems to fail and terminates/fails to bring up new runner pods. The error we're seeing in the listener is below which seems to point to an issue refreshing the messagequeue token. It seems to require us to delete the full helm release and re-install to get things working again.
Describe the bug
Runner scale set listener fails to refresh and gets stuck.
Describe the expected behavior
Runner listener should be able to refresh its client
Additional Context
Controller Logs
Runner Pod Logs