actions / actions-runner-controller

Kubernetes controller for GitHub Actions self-hosted runners
Apache License 2.0

All runners stuck in "Init" #3373

Closed patrickblackjr closed 3 months ago

patrickblackjr commented 3 months ago

Checks

Controller Version

0.8.3

Deployment Method

ArgoCD

To Reproduce

I'm unsure how to reproduce this; it seemed to occur randomly, without any changes on my end.

Describe the bug

[Screenshot: 2024-03-20-13-29-18-000187]

I have dozens of pods stuck in "Init" status for multiple hours at a time. There are no errors in the logs of either controller pod.

Describe the expected behavior

Runner pods should initialize and run as normal.

Additional Context

        githubConfigUrl: mygithub
        githubConfigSecret: github-arc-secret
        controllerServiceAccount:
          namespace: arc
          name: gha-arc-controller-gha-rs-controller
        template:
          spec:
            initContainers:
              - name: init-dind-externals
                image: us-docker.pkg.dev/custom:latest
                command:
                  [
                    "cp",
                    "-r",
                    "-v",
                    "/home/runner/externals/.",
                    "/home/runner/tmpDir/",
                  ]
                volumeMounts:
                  - name: dind-externals
                    mountPath: /home/runner/tmpDir
            containers:
              - name: runner
                image: us-docker.pkg.dev/custom:latest
                command: ["/home/runner/run.sh"]
                env:
                  - name: DOCKER_HOST
                    value: unix:///run/docker/docker.sock
                volumeMounts:
                  - name: work
                    mountPath: /home/runner/_work
                  - name: dind-sock
                    mountPath: /run/docker
                    readOnly: true
              - name: dind
                image: docker:24.0.9-dind@sha256:51a2a0a9f8d15d455584ea054a602ea58917a78c34df2261b0695a60f0a6ae61
                # FIX: https://github.com/actions/actions-runner-controller/issues/3159
                # CAUSE: https://github.com/docker-library/docker/issues/463
                args:
                  - dockerd
                  - --host=unix:///run/docker/docker.sock
                  - --group=$(DOCKER_GROUP_GID)
                env:
                  - name: DOCKER_GROUP_GID
                    value: "123"
                  - name: DOCKER_IPTABLES_LEGACY
                    value: "1"
                securityContext:
                  privileged: true
                volumeMounts:
                  - name: work
                    mountPath: /home/runner/_work
                  - name: dind-sock
                    mountPath: /run/docker
                  - name: dind-externals
                    mountPath: /home/runner/externals
                  - name: daemon-json
                    mountPath: /etc/docker/daemon.json
                    subPath: daemon.json
                    readOnly: true
            volumes:
              - name: work
                emptyDir: {}
              - name: dind-sock
                emptyDir: {}
              - name: dind-externals
                emptyDir: {}
              - name: daemon-json
                configMap:
                  name: docker-daemon-config
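
The template above mounts four volumes, and one of them (`daemon-json`) is backed by a ConfigMap, `docker-daemon-config`, that is defined outside this snippet. In Kubernetes generally, a pod whose ConfigMap-backed volume cannot be mounted does not get past initialization; whether that is what happened here is not established by this report. As a minimal sketch, assuming the `spec` block has already been parsed into a Python dict, the following lists volume mounts that have no matching volume and the external objects the volumes depend on. The function name `check_pod_spec` is illustrative and not part of ARC or Kubernetes.

```python
def check_pod_spec(spec):
    """Return (undeclared_mounts, external_refs) for a parsed pod spec dict.

    undeclared_mounts: (container name, volume name) pairs whose volume
                       is not listed under `volumes`.
    external_refs:     (kind, name) pairs for ConfigMaps/Secrets that the
                       volumes reference and that must exist in the namespace.
    """
    declared = {volume["name"] for volume in spec.get("volumes", [])}

    undeclared = []
    for section in ("initContainers", "containers"):
        for container in spec.get(section, []):
            for mount in container.get("volumeMounts", []):
                if mount["name"] not in declared:
                    undeclared.append((container["name"], mount["name"]))

    external = []
    for volume in spec.get("volumes", []):
        if "configMap" in volume:
            external.append(("ConfigMap", volume["configMap"]["name"]))
        elif "secret" in volume:
            external.append(("Secret", volume["secret"]["secretName"]))

    return undeclared, external


# Trimmed copy of the template above: only names, mounts, and volumes are kept.
spec = {
    "initContainers": [
        {"name": "init-dind-externals", "volumeMounts": [{"name": "dind-externals"}]},
    ],
    "containers": [
        {"name": "runner", "volumeMounts": [{"name": "work"}, {"name": "dind-sock"}]},
        {"name": "dind", "volumeMounts": [
            {"name": "work"}, {"name": "dind-sock"},
            {"name": "dind-externals"}, {"name": "daemon-json"},
        ]},
    ],
    "volumes": [
        {"name": "work", "emptyDir": {}},
        {"name": "dind-sock", "emptyDir": {}},
        {"name": "dind-externals", "emptyDir": {}},
        {"name": "daemon-json", "configMap": {"name": "docker-daemon-config"}},
    ],
}

undeclared, external = check_pod_spec(spec)
```

For this template every mount has a matching volume, so the only thing left to confirm on the cluster is that the `docker-daemon-config` ConfigMap exists in the namespace where the runner pods are created.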

Controller Logs

https://gist.github.com/patrickblackjr/c8894d94dbfd797585535108df07c04b

Runner Pod Logs

https://gist.github.com/patrickblackjr/c8894d94dbfd797585535108df07c04b

github-actions[bot] commented 3 months ago

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.

nikola-jokic commented 3 months ago

Hey @patrickblackjr, could you please run kubectl describe on the stuck pods to see what the problem is?
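
With dozens of pods affected, a first pass over all of them can be scripted before describing individual pods. The following is a minimal sketch, assuming the output of `kubectl get pods -n <namespace> -o json` has been saved to a file; the function names and the file name `pods.json` are illustrative and not part of ARC. It reports every init container that has not completed successfully, along with its current state and reason.

```python
import json


def load_pod_list(path):
    """Load a saved `kubectl get pods -o json` result (a v1 PodList)."""
    with open(path) as fh:
        return json.load(fh)


def stuck_init_containers(pod_list):
    """Return (pod, init container, state, reason) for unfinished init containers."""
    findings = []
    for pod in pod_list.get("items", []):
        pod_name = pod.get("metadata", {}).get("name", "<unknown>")
        statuses = pod.get("status", {}).get("initContainerStatuses", [])
        for status in statuses:
            state = status.get("state", {})
            terminated = state.get("terminated")
            # An init container is finished only if it terminated with exit code 0.
            if terminated is not None and terminated.get("exitCode") == 0:
                continue
            # `state` carries one key: "waiting", "running", or "terminated".
            for kind, detail in state.items():
                reason = (detail or {}).get("reason", "")
                findings.append((pod_name, status.get("name", ""), kind, reason))
    return findings


# Usage (hypothetical file name):
#   for row in stuck_init_containers(load_pod_list("pods.json")):
#       print("\t".join(row))
```

The pod JSON does not include events, so mount or scheduling failures will still only be visible in the Events section of `kubectl describe pod <name>`.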