Closed — rteeling-evernorth closed this issue 7 months ago
Hello! Thank you for filing an issue.
The maintainers will triage your issue shortly.
In the meantime, please take a look at the troubleshooting guide for bug reports.
If this is a feature request, please review our contribution guidelines.
Hey @rteeling-evernorth,
This issue is related to the hook :relaxed:. Are you using the default hook implementation in your container mirror? If so, the hook schedules the job pod onto the same node the runner is on, so the problem is node capacity, not the scheduler. By default we skip the scheduler so the job pod can reuse the volume mount from the runner pod. This can be avoided if you use ReadWriteMany volumes, but that would require you to configure the environment variables appropriately.
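For context, a rough sketch of what that configuration might look like in the gha-runner-scale-set Helm values (this is an illustration, not verified against 0.8.2; the storage class name is hypothetical, and `ACTIONS_RUNNER_USE_KUBE_SCHEDULER` is the container-hooks flag that opts the job pod back into normal scheduling when the work volume supports ReadWriteMany):

```yaml
containerMode:
  type: "kubernetes"
  kubernetesModeWorkVolumeClaim:
    accessModes: ["ReadWriteMany"]        # chart default is ReadWriteOnce
    storageClassName: "my-rwx-class"      # hypothetical RWX-capable storage class
    resources:
      requests:
        storage: 1Gi
template:
  spec:
    containers:
      - name: runner
        env:
          - name: ACTIONS_RUNNER_USE_KUBE_SCHEDULER  # let the scheduler place the job pod
            value: "true"
```

With ReadWriteOnce (the default), the hook must pin the job pod to the runner's node so both can mount the same volume, which is exactly the behavior described above.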
Ah! That would explain it. Everything in my mirror is off-the-shelf for 0.8.2, and I was using the default volume mount from the values file, which is ReadWriteOnce. That matches the behavior I am seeing. Thank you so much for the info!
You are welcome!
Checks
Controller Version
0.7.0, 0.8.2
Deployment Method
ArgoCD
To Reproduce
Describe the bug
When the Kubernetes job pod tries to run, Kubernetes cannot find a node to schedule it on and emits the following event/error:
Node didn't have enough resource: cpu, requested: 2000, used: 13920, capacity: 15890
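The figures in that event are CPU millicores, and they show the pod simply does not fit on the node; a quick sanity check of the arithmetic:

```python
# CPU figures (millicores) taken from the kubelet event above.
capacity = 15890    # node's allocatable CPU
used = 13920        # CPU already requested by pods bound to the node
requested = 2000    # CPU requested by the job pod

available = capacity - used
print(available)               # 1970
print(requested <= available)  # False: 2000m > 1970m, so the pod cannot be placed
```

Because the hook binds the pod directly to the runner's node, the scheduler never gets a chance to pick (or wait for) a node with 2000m free.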
The Kubernetes Job reports the following error:
Job has reached the specified backoff limit
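That message comes from the standard Kubernetes Job retry mechanism: once the pod has failed `backoffLimit` times (default 6), the Job is marked Failed. A minimal sketch of the relevant field (names and image are placeholders, not the hook's actual Job spec):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: workflow-job            # hypothetical name
spec:
  backoffLimit: 6               # default; Job fails after this many pod failures
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: job
          image: example/image:latest   # placeholder
```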
This causes the Actions job to fail.
Describe the expected behavior
The job pod should wait for a new node to come online (typically about 45 seconds) and then be scheduled.
Additional Context
Controller Logs
Runner Pod Logs