actions / actions-runner-controller

Kubernetes controller for GitHub Actions self-hosted runners
Apache License 2.0

PDB / node-scale prevention while workflow pods are running #3814

Open jonathan-fileread opened 2 days ago

What would you like added?

I noticed that while multiple runnerset workflow pods are running (e.g. a job that requires 6 runners), node scale-downs can potentially remove nodes whose jobs are still in progress. For example, if the node utilization threshold is set to 0.5 and requests drop to 40% because 4 of the workflow jobs have completed, the node becomes eligible for scale-down even though the remaining jobs are still running. Since ARC / GH is not idempotent, those jobs could suddenly fail. Is it possible to apply a finalizer / PDB to the workflow pods (spun up by container.mode = kubernetes for docker in docker build)?
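To make the failure mode concrete, here is a minimal Python sketch of a request-based scale-down check. It assumes the autoscaler behaves like the Kubernetes Cluster Autoscaler, which treats a node as a scale-down candidate when the requests of its pods fall below `--scale-down-utilization-threshold` (0.5 by default) and which skips nodes running a pod annotated `cluster-autoscaler.kubernetes.io/safe-to-evict: "false"`. The `Pod` class and the helper functions are hypothetical, and the model is deliberately simplified: it looks only at CPU requests and ignores the autoscaler's other rules (memory requests, whether the pods fit on other nodes, and so on).

```python
from dataclasses import dataclass, field

# Annotation recognized by the Kubernetes Cluster Autoscaler: a pod that
# sets it to "false" prevents the autoscaler from removing its node.
SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"


@dataclass
class Pod:
    """Hypothetical minimal pod model: name, CPU request, annotations."""
    name: str
    cpu_request_millicores: int
    annotations: dict = field(default_factory=dict)


def utilization(pods, allocatable_millicores):
    """Fraction of the node's allocatable CPU claimed by pod requests."""
    requested = sum(p.cpu_request_millicores for p in pods)
    return requested / allocatable_millicores


def is_scale_down_candidate(pods, allocatable_millicores, threshold=0.5):
    """Simplified check: candidate if utilization is below the threshold
    and no pod has opted out of eviction via the annotation."""
    if any(p.annotations.get(SAFE_TO_EVICT) == "false" for p in pods):
        return False
    return utilization(pods, allocatable_millicores) < threshold


if __name__ == "__main__":
    # Two workflow pods are still running after the other jobs finished.
    running = [Pod("workflow-5", 800), Pod("workflow-6", 800)]
    print(utilization(running, 4000))
    print(is_scale_down_candidate(running, 4000))

    protected = [
        Pod("workflow-5", 800, {SAFE_TO_EVICT: "false"}),
        Pod("workflow-6", 800),
    ]
    print(is_scale_down_candidate(protected, 4000))
```

With two workflow pods still requesting 800m each on a node with 4000m allocatable, utilization is 0.4, so the node counts as a scale-down candidate even though both jobs are in progress; annotating either pod flips the result to `False`.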

Let me know if there is a current feature that allows graceful termination, so that node scale-downs do not affect running workflow pods.
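For the PDB part of the question, the sketch below shows what such a budget could look like, written as a Python function that emits a `policy/v1` PodDisruptionBudget manifest as JSON. `maxUnavailable: 0` tells the eviction API to refuse voluntary evictions of matching pods, and the Cluster Autoscaler will not remove a node whose pods are blocked by a PDB. The function name, namespace and label selector are hypothetical placeholders; the sketch assumes the workflow pods carry a stable label to select on, which would need to be checked against the pods ARC actually creates.

```python
import json


def workflow_pod_pdb(name, namespace, match_labels):
    """Return a policy/v1 PodDisruptionBudget manifest (as a dict) that
    permits zero voluntary disruptions for pods matching match_labels."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # With maxUnavailable: 0 the eviction API rejects evictions of
            # matching pods, so a drain waits until they terminate.
            "maxUnavailable": 0,
            "selector": {"matchLabels": dict(match_labels)},
        },
    }


if __name__ == "__main__":
    # Hypothetical values: replace with your namespace and a label that
    # your workflow pods actually carry.
    manifest = workflow_pod_pdb(
        "workflow-pods-pdb", "arc-runners", {"example.com/role": "workflow"}
    )
    print(json.dumps(manifest, indent=2))
```

If saved as a hypothetical `pdb.py`, the output could be applied with `python pdb.py | kubectl apply -f -`. One trade-off to keep in mind: a budget that allows zero disruptions also blocks ordinary node drains (upgrades, maintenance) until the matching pods exit, which is usually acceptable for short-lived job pods.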

Why is this needed?

We don't want individual jobs within a GHA matrix build to suddenly fail due to node scale-down.
