actions / actions-runner-controller

Kubernetes controller for GitHub Actions self-hosted runners
Apache License 2.0

gha-runner-scale-set-controller error - "container <autoscaler> has no resource limits" #2612

Closed mikeclayton closed 1 year ago

mikeclayton commented 1 year ago

What would you like added?

Although the gha-runner-scale-set-controller Helm chart allows configuration of resource limits in the values.yaml file, these settings are only applied to the controller pod and aren't honoured when the controller tries to start a listener pod.
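For reference, the chart does expose a resources block for the controller pod in values.yaml, roughly like the sketch below (key names and values are illustrative and should be checked against the chart's own values.yaml):

```yaml
# values.yaml for the controller chart - sketch only; verify the exact keys
# against the chart's packaged values.yaml. These limits are applied to the
# controller pod, but not to the listener pods it creates.
resources:
  limits:
    cpu: 500m
    memory: 256Mi
  requests:
    cpu: 100m
    memory: 128Mi
```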

Because the listener pod is created without any limits, this results in the following error if the hosting Azure Kubernetes Service cluster is configured to require resource limits via an Azure Policy:

admission webhook "validation.gatekeeper.sh" denied the request: [azurepolicy-k8sazurev3containerlimits-2875548f79e717d03827] container <autoscaler> has no resource limits

The full log in the controller pod is:

> kubectl logs gha-runner-scale-set-controller-5b74b856bd-tn6sk -n gha-runner-scale-set-controller

... snip ...
2023-05-22T10:52:08Z    INFO    AutoscalingListener     Creating a listener pod {"autoscalinglistener": "gha-runner-scale-set-controller/gha-runner-scale-set-test-594f5c8c-listener"}
2023-05-22T10:52:08Z    INFO    AutoscalingListener     Creating listener pod   {"autoscalinglistener": "gha-runner-scale-set-controller/gha-runner-scale-set-test-594f5c8c-listener", "namespace": "gha-runner-scale-set-controller", "name": "gha-runner-scale-set-test-594f5c8c-listener"}
2023-05-22T10:52:08Z    ERROR   AutoscalingListener     Unable to create listener pod   {"autoscalinglistener": "gha-runner-scale-set-controller/gha-runner-scale-set-test-594f5c8c-listener", "namespace": "gha-runner-scale-set-controller", "name": "gha-runner-scale-set-test-594f5c8c-listener", "error": "admission webhook \"validation.gatekeeper.sh\" denied the request: [azurepolicy-k8sazurev3containerlimits-2875548f79e717d03827] container <autoscaler> has no resource limits"}
github.com/actions/actions-runner-controller/controllers/actions%2egithub%2ecom.(*AutoscalingListenerReconciler).createListenerPod
        github.com/actions/actions-runner-controller/controllers/actions.github.com/autoscalinglistener_controller.go:442
github.com/actions/actions-runner-controller/controllers/actions%2egithub%2ecom.(*AutoscalingListenerReconciler).Reconcile
        github.com/actions/actions-runner-controller/controllers/actions.github.com/autoscalinglistener_controller.go:233
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
        sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:122
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:323
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:274
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:235
... snip ...

and I think the code causing the error is here:

https://github.com/actions/actions-runner-controller/blob/91c8991835016f8c6568f101d4a28185baec3dcc/controllers/actions.github.com/resourcebuilder.go#L107
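To satisfy the policy, the listener pod's container spec would need a resources section along the lines of the fragment below (illustrative only; the container name comes from the error above, and the limit values are placeholders):

```yaml
# Fragment of a listener pod spec that the policy would accept (illustrative).
# Today the pod is built without any resources block, hence the denial.
containers:
  - name: autoscaler
    resources:
      limits:
        cpu: 250m
        memory: 128Mi
      requests:
        cpu: 50m
        memory: 64Mi
```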

I previously raised an issue (which turned out to be bogus) about resource limits for the controller pod and described why this might be required in strictly governed corporate environments. I closed that issue because I hadn't realised you can specify resource limits for the controller pod in values.yaml, but the rationale and limitation still apply to the listener pod.

A workaround is to exclude the controller's namespace from the Policy, but that may not always be possible in a strictly governed corporate environment, so having a way to configure the resource limits on the listener pod would allow ARC to be used in those situations.
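Something along the lines of the sketch below in the controller chart's values.yaml is what I have in mind. Note that the listenerResources key is purely hypothetical and does not exist in the chart today; it only illustrates the kind of setting being requested:

```yaml
# Hypothetical values.yaml addition (the listenerResources key does not exist
# in the chart today) showing the kind of knob that would let the listener pod
# pass a "containers must have resource limits" policy.
listenerResources:
  limits:
    cpu: 250m
    memory: 128Mi
  requests:
    cpu: 50m
    memory: 64Mi
```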

Link- commented 1 year ago

I see this as a duplicate of #2578. This can go as a comment into that issue. I'm closing in favour of that one, which is already on our radar.