actions / runner

The Runner for GitHub Actions :rocket:
https://github.com/features/actions
MIT License

Runners in a stuck state after the Actions outage #3334

Open Scalahansolo opened 2 weeks ago

Scalahansolo commented 2 weeks ago

Checks

Controller Version

0.9.1

Deployment Method

Helm

To Reproduce

Cannot repro as this only happened due to the outage

Describe the bug

After the Actions outage yesterday, all of the runners in my runner group ended up in the following state. In the GitHub UI, each runner is shown as having an active job, but that job is just one that failed due to the outage.

[Screenshot: CleanShot 2024-06-12 at 09 19 17@2x]

The logs of the actual runner seem fine, and it's just waiting to be assigned a job properly.

[Screenshot: CleanShot 2024-06-12 at 09 21 44@2x]

Describe the expected behavior

I would have expected these failed jobs not to be listed as "active" on my runners. I'm guessing that because GitHub still marks these failed jobs as active, new jobs are not being assigned to these runners.
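
For anyone trying to confirm the same state without clicking through the UI, here is a minimal sketch that lists runners which are online but flagged as busy. It assumes the runners are registered at the organization level and that the UI's "active job" corresponds to the `busy` field returned by the REST endpoint `GET /orgs/{org}/actions/runners`; the function names, `org`, and `token` are placeholders and not part of the original report.

```python
import json
import urllib.request

API = "https://api.github.com"


def busy_runners(payload):
    """Return the names of runners that are online and flagged busy.

    `payload` is the decoded JSON body of the "list self-hosted runners"
    endpoint, which wraps the runner objects in a "runners" array.
    """
    return [
        runner["name"]
        for runner in payload.get("runners", [])
        if runner.get("status") == "online" and runner.get("busy")
    ]


def fetch_runners(org, token):
    """Fetch the first page (up to 100) of an organization's runners.

    Assumption: org-level runners. Repository-level runners live under
    /repos/{owner}/{repo}/actions/runners instead.
    """
    req = urllib.request.Request(
        f"{API}/orgs/{org}/actions/runners?per_page=100",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Calling `busy_runners(fetch_runners("my-org", token))` while no workflow is actually running would be expected to return an empty list; any names it does return are candidates for the stuck state described above.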

Additional Context

N/A

Controller Logs

N/A

Runner Pod Logs

N/A

nikola-jokic commented 2 weeks ago

Transferred the issue here since it is related to the runner itself, and not ARC.

Scalahansolo commented 2 weeks ago

A quick update here: the only way I could get these runners healthy again was to track down all of the "Active Jobs" that were in the failed state (this took forever) and use the GitHub API to hard delete those runs from GitHub. Once I had deleted all of them, GitHub after a bit started to see those runners as idle and began assigning new jobs.
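
The cleanup described above could be scripted roughly as follows. This is only a sketch under stated assumptions, not the exact commands that were run: it assumes the stuck runs report `conclusion == "failure"`, that they can be narrowed down with the `status` and `created` query parameters of `GET /repos/{owner}/{repo}/actions/runs`, and that a single page of 100 results is enough. The function names and the `dry_run` flag are made up for illustration. Deleting a run is irreversible, so the default is to list the run IDs only.

```python
import json
import urllib.request

API = "https://api.github.com"


def _request(method, url, token):
    """Send one authenticated request; return (status, decoded JSON or None)."""
    req = urllib.request.Request(
        url,
        method=method,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
        return resp.status, (json.loads(body) if body else None)


def failed_run_ids(payload):
    """Pick the IDs of runs whose conclusion is "failure" from a list-runs response."""
    return [
        run["id"]
        for run in payload.get("workflow_runs", [])
        if run.get("conclusion") == "failure"
    ]


def delete_failed_runs(owner, repo, token, created, request=_request, dry_run=True):
    """List failed runs created in the given date or range and optionally delete them.

    `created` uses GitHub's date search syntax, e.g. "2024-06-11" (assumed
    here to be the outage date). Only the first page of results is handled.
    """
    url = (
        f"{API}/repos/{owner}/{repo}/actions/runs"
        f"?status=failure&created={created}&per_page=100"
    )
    _, payload = request("GET", url, token)
    ids = failed_run_ids(payload)
    if not dry_run:
        for run_id in ids:
            # DELETE /repos/{owner}/{repo}/actions/runs/{run_id} removes the run.
            request("DELETE", f"{API}/repos/{owner}/{repo}/actions/runs/{run_id}", token)
    return ids
```

Running it once with `dry_run=True` and checking the returned IDs against the runs shown as "active" on the runners, before repeating with `dry_run=False`, seems like the safer order.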