actions / actions-runner-controller

Kubernetes controller for GitHub Actions self-hosted runners
Apache License 2.0

ARC controller dind mode failure: dial unix /run/docker/docker.sock: connect: permission denied #3537

Closed aiell0 closed 1 month ago

aiell0 commented 1 month ago

Checks

Controller Version

0.9.2

Deployment Method

Helm

To Reproduce

Run a GitHub Actions workflow with a job configured as follows (I get this error with other job containers as well):

name: Checks

on:
  pull_request:
    types: [opened, synchronize]
    branches: [ "main" ]

env:
  CARGO_TERM_COLOR: always

jobs:
  build:
    runs-on: custom-runners
    container: rust:1.78-alpine3.18
    steps:
      - uses: actions/checkout@v4
      - name: Add rustfmt
        run: rustup component add rustfmt
      - name: Format check
        run: cargo fmt --check
      - name: Check
        run: cargo check --verbose

Describe the bug

The job fails with the following error:

permission denied while trying to connect to the Docker daemon socket at unix:///run/docker/docker.sock: Get "http://%2Frun%2Fdocker%2Fdocker.sock/v1.44/version": dial unix /run/docker/docker.sock: connect: permission denied

Describe the expected behavior

Job runs successfully

Additional Context

Running on Amazon EKS v1.29. Instance type: t3a.2xlarge, Amazon Linux AMI.

"controllerServiceAccount":
  "name": "github-action-runner-controller-gha-rs-controller"
  "namespace": "gha-runner"
"githubConfigSecret":
  "github_token": "<redacted>"
"githubConfigUrl": "https://github.com/Rogo-Technologies"
"listenerTemplate":
  "spec":
    "containers":
    - "name": "listener"
    "tolerations":
    - "effect": "NoSchedule"
      "key": "build-agent"
      "operator": "Equal"
      "value": "true"
"maxRunners": 10
"minRunners": 3
"runnerGroup": "eks-shared-services"
"runnerScaleSetName": "<redacted>"
"template":
  "spec":
    "containers":
    - "command":
      - "/home/runner/run.sh"
      "env":
      - "name": "DOCKER_HOST"
        "value": "unix:///run/docker/docker.sock"
      "image": "ghcr.io/actions/actions-runner:latest"
      "name": "runner"
      "volumeMounts":
      - "mountPath": "/home/runner/_work"
        "name": "work"
      - "mountPath": "/var/run"
        "name": "dind-sock"
    - "args":
      - "dockerd"
      - "--host=unix:///run/docker/docker.sock"
      - "--group=$(DOCKER_GROUP_GID)"
      "env":
      - "name": "DOCKER_GROUP_GID"
        "value": "123"
      "image": "docker:dind"
      "name": "dind"
      "securityContext":
        "privileged": true
      "volumeMounts":
      - "mountPath": "/var/run"
        "name": "dind-sock"
      - "mountPath": "/home/runner/externals"
        "name": "dind-externals"
      - "mountPath": "/home/runner/_work"
        "name": "work"
    "initContainers":
    - "command":
      - "cp"
      - "-r"
      - "-v"
      - "/home/runner/externals/."
      - "/home/runner/tmpDir/"
      "image": "ghcr.io/actions/actions-runner:latest"
      "name": "init-dind-externals"
      "volumeMounts":
      - "mountPath": "/home/runner/tmpDir"
        "name": "dind-externals"
    "tolerations":
    - "effect": "NoSchedule"
      "key": "build-agent"
      "operator": "Equal"
      "value": "true"
    "volumes":
    - "emptyDir": {}
      "name": "work"
    - "emptyDir": {}
      "name": "dind-sock"
    - "emptyDir": {}
      "name": "dind-externals"

Controller Logs

https://gist.github.com/aiell0/72584538cd5ddb944dfdb8a918ef7424

Runner Pod Logs

https://gist.github.com/aiell0/f9e89622c39c2659976fdddeb7e68446

github-actions[bot] commented 1 month ago

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.

aiell0 commented 1 month ago

Looking more closely at the runner logs, I keep seeing the following: System.UnauthorizedAccessException: Access to the path '/proc/90/oom_score_adj' is denied. This leads me to believe there is an OOM error killing my runners. Are there any resource requirements for running these that are not in the documentation?

aiell0 commented 1 month ago

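Turned out to be a misconfiguration on my part. The socket path /run/docker/docker.sock is supposed to be /var/run/docker.sock, both in DOCKER_HOST on the runner container and in the --host argument passed to dockerd.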
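
For anyone landing here with the same error, this is a rough sketch of what the corrected part of the values would look like. It assumes the rest of the template stays exactly as posted in the issue, with the dind-sock volume still mounted at /var/run in both containers. Only the two socket paths change, and fields that do not change are left out for brevity.

```yaml
# Abbreviated fragment: only the fields relevant to the socket path are shown.
# Assumption: all other fields match the values posted in the issue.
"template":
  "spec":
    "containers":
    - "name": "runner"
      "env":
      - "name": "DOCKER_HOST"
        # was unix:///run/docker/docker.sock
        "value": "unix:///var/run/docker.sock"
      "volumeMounts":
      - "mountPath": "/var/run"
        "name": "dind-sock"
    - "name": "dind"
      "args":
      - "dockerd"
      # was --host=unix:///run/docker/docker.sock
      - "--host=unix:///var/run/docker.sock"
      - "--group=$(DOCKER_GROUP_GID)"
      "volumeMounts":
      - "mountPath": "/var/run"
        "name": "dind-sock"
```

With this layout the socket that dockerd creates sits directly inside the shared dind-sock volume, which is where the runner container looks for it.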