Closed. cheskayang closed this issue 1 month ago.
Hello! Thank you for filing an issue.
The maintainers will triage your issue shortly.
In the meantime, please take a look at the troubleshooting guide for bug reports.
If this is a feature request, please review our contribution guidelines.
Same issue as reported in https://github.com/actions/actions-runner-controller/issues/3499 and https://github.com/actions/actions-runner-controller/issues/3420. This report adds more detail on the observed behavior.
Closing this one as a duplicate. Thank you for linking it!
Checks
Controller Version
0.9.1
Deployment Method
ArgoCD
Checks
To Reproduce
Describe the bug
Jobs randomly get stuck with the message "Job is waiting for a runner from XXX to come online". Cancelling the job and re-running it fixes the problem.
Observations:
1: Compared with jobs that run normally, the listener pod never receives a "job started" message for the stuck job. It only receives "job available" and "job assigned", and the job then stays stuck until it is cancelled manually.
2: For the stuck job, the EphemeralRunnerSet is patched to 1 replica and then immediately patched back to null (see logs below).
3: The following log seems to appear on the controller side whenever this issue happens.
Describe the expected behavior
The job should not get stuck waiting for a runner.
Additional Context
Controller Logs
Runner Pod Logs