Closed: sungmincs closed this issue 6 months ago
Hello! Thank you for filing an issue.
The maintainers will triage your issue shortly.
In the meantime, please take a look at the troubleshooting guide for bug reports.
If this is a feature request, please review our contribution guidelines.
Closing this one as a duplicate of https://github.com/actions/actions-runner-controller/issues/3450
Checks
Controller Version
0.9.1
Deployment Method
Helm
Checks
To Reproduce
Describe the bug
I have a dedicated arc-runners nodepool for building arm64 workloads, and the pool count is 0 until someone needs an arm64 runner. When a new Actions workflow that runs on the arm64 runner is launched, the controller and the listener are quick to schedule the new runner pod onto the arm64 nodepool. The autoscaler (Karpenter in my case) then kicks in to provision a new node, which takes roughly 30~40 seconds, and the runner pod moves from Pending to the init state (pulling the runner image). About 50 seconds after the controller created the pod, it starts killing the runner pod while it is still in the init state.
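For context, here is a minimal sketch of the scale set values I deploy with Helm. The nodepool label, taint, and names below are illustrative placeholders for my setup, not values ARC requires:

```yaml
# gha-runner-scale-set values (sketch; names and labels are placeholders)
githubConfigUrl: "https://github.com/<org>"
githubConfigSecret: pre-defined-secret
runnerScaleSetName: arm64-runner
minRunners: 0                              # pool stays empty until a workflow needs arm64
template:
  spec:
    nodeSelector:
      kubernetes.io/arch: arm64
      karpenter.sh/nodepool: arc-runners   # assumed Karpenter nodepool label
    tolerations:
      - key: arc-runners                   # assumed taint on the dedicated nodepool
        operator: Exists
        effect: NoSchedule
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
```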
I also tested the same scenario with a node that was already running, but that case failed as well because the image pull (~30 seconds) was not fast enough for the runner pod to become ready. The runner image I used was just the default:
Pulling image "ghcr.io/actions/actions-runner:latest"
nothing big or customized on my end.
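For the already-running-node case, one generic way to take the image pull off the critical path would be to pre-pull the runner image with a DaemonSet. This is only a sketch, not an ARC feature; the namespace, names, node selector, and toleration are assumptions about my cluster:

```yaml
# Sketch: keep ghcr.io/actions/actions-runner:latest cached on the arm64 runner nodes.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: runner-image-prepull
  namespace: arc-runners
spec:
  selector:
    matchLabels:
      app: runner-image-prepull
  template:
    metadata:
      labels:
        app: runner-image-prepull
    spec:
      nodeSelector:
        kubernetes.io/arch: arm64          # only the arm64 runner nodes
      tolerations:
        - key: arc-runners                 # match the nodepool taint, if any
          operator: Exists
          effect: NoSchedule
      containers:
        - name: prepull
          image: ghcr.io/actions/actions-runner:latest
          # Sleep forever so kubelet keeps the image in its local cache.
          command: ["sleep", "infinity"]
          resources:
            requests:
              cpu: 10m
              memory: 16Mi
```

This only helps once a node already exists, though; in the scale-from-zero case the node provisioning time alone is close to the limit, which is why a configurable timeout would still be needed.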
Describe the expected behavior
The controller should wait longer for the runner pod to become ready, or there should be a configurable wait timeout / retry so that users can specify how much delay they can tolerate in node scale-up scenarios.
Additional Context
Controller
Runner
Controller Logs
Runner Pod Logs