iamcaleberic opened 7 months ago
Updating and pinning image.dindSidecarRepositoryAndTag to docker:24.0.7-dind-alpine3.18 appears to resolve it.
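For anyone applying this via Helm, a minimal sketch of the override follows. The release name `arc`, the namespace, and the values filename are placeholders, not taken from the thread; only the `image.dindSidecarRepositoryAndTag` key and tag come from the comments above.

```shell
# Write a small values override pinning the dind sidecar image.
# "arc-dind-pin.yaml" is just an illustrative filename.
cat > arc-dind-pin.yaml <<'EOF'
image:
  dindSidecarRepositoryAndTag: "docker:24.0.7-dind-alpine3.18"
EOF

# Re-deploy the chart with the override (release/namespace are placeholders):
# helm upgrade arc actions-runner-controller/actions-runner-controller \
#   -n actions-runner-system -f arc-dind-pin.yaml

grep 'dindSidecarRepositoryAndTag' arc-dind-pin.yaml
```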
@iamcaleberic yes, we had the same issue and the workaround works 👍
Seeing the same issue here running in GKE. We're also dealing with a problem where this morning we ended up with 10,000 runners (triggering secondary rate limiting), the vast majority of them 'offline'.
Is there any chance that there is a relationship between this and runners being left in an offline state, as they fail to come online cleanly and the ARC controller (v0.26.0) does not properly de-register them from GitHub?
EDIT/UPDATE: After we implemented the fix to pin the docker sidecar to docker:24.0.7-dind-alpine3.18, we no longer saw the issue with 'offline' runners building up, and we believe the two are related.
Sorry for the naive question, but where are you specifying image.dindSidecarRepositoryAndTag? I'm not seeing any mention of it in actions-runner-controller.yaml. Is this perhaps a Helm thing? Surely it has a kubectl/yaml-only representation too? Thank you for the great tips.
We are experiencing the same issue.
@verult was able to patch the command directly, like this, at line 34342 in version 0.26 of actions-runner-controller.yaml:
containers:
- args:
  - --metrics-addr=127.0.0.1:8080
  - --enable-leader-election
  # Temporary workaround for https://github.com/actions/actions-runner-controller/issues/3159
  - --docker-image=docker:24.0.7-dind-alpine3.18
  command:
  - /manager
  env:
Thanks for the suggestions, everyone. We got our runners working again, but the pods won't terminate. We use ephemeral runners, and the docker issue impacted us today: the runners couldn't start, and we reached the Runner Group 10k limit. Once the runners started again, they were not cleaned up and stayed in a Terminating phase. We're still trying to figure out why. We tested various combinations of chart and app versions, but at least running v0.26.0 with --docker-image=docker:24.0.7-dind-alpine3.18 resulted in pods lingering. We're also using a custom runner image, which could be a factor. We'll keep investigating.
@LaloLoop we ran into the issue of pods getting stuck in the Terminating phase after we deleted the runner controller, because there were finalizers left on these pods. Is your controller running when your pods are stuck?
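If the controller is gone and pods are stuck in Terminating because of leftover finalizers, one common workaround is to clear the finalizers with a merge patch. This is a sketch, not from the thread: the pod name and namespace are placeholders, and clearing finalizers skips whatever cleanup they guarded, so only do it when the controller is no longer going to reconcile those pods.

```shell
# Merge patch that empties the finalizer list on a stuck pod.
MERGE='{"metadata":{"finalizers":null}}'

# Placeholder pod name/namespace; run against each stuck pod:
# kubectl patch pod <stuck-runner-pod> -n actions-runner-system \
#   --type=merge -p "$MERGE"

echo "$MERGE"
```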
Thanks for pointing that out @verult . We reached the rate limit as described by @billimek . That caused the controller to panic continuously and fail to reconcile. We're using 0.26.0, not sure if newer versions have better error handling/retries. I guess we're going to have to wait for it to reset before trying anything. Changing anything in our runners at the moment triggers the rate limit and everything fails, even though we don't use polling for the autoscalers.
@joshgc you can find in here https://github.com/actions/actions-runner-controller/blob/master/charts/actions-runner-controller/values.yaml#L55
Thanks @iamcaleberic it did magic and worked for us as well.
Same error for me.
Fixed with : dindSidecarRepositoryAndTag: "docker:24.0.7-dind-alpine3.18"
For those running an autoscaling runner set: I tried to update template.spec.containers.dind to 24.0.7-dind-alpine3.18 and it didn't work; it retained the value docker:dind. I know my syntax is correct because I also pin our custom image in containers.
I manually updated the autoscalingrunnerset CRD to docker:24.0.7-dind-alpine3.18 and this seems to work as well.
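A sketch of what that manual CRD update can look like as a JSON patch. The resource name `my-runners`, the namespace `arc-runners`, and the container index `1` for the dind sidecar are all assumptions — check your own spec before applying, since the dind container may sit at a different index.

```shell
# JSON patch replacing the dind container image on an AutoscalingRunnerSet.
# Container index 1 is an assumption about where dind sits in the spec.
PATCH='[{"op": "replace", "path": "/spec/template/spec/containers/1/image", "value": "docker:24.0.7-dind-alpine3.18"}]'

# Placeholder resource name and namespace:
# kubectl patch autoscalingrunnerset my-runners -n arc-runners \
#   --type=json -p "$PATCH"

echo "$PATCH"
```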
My question is: why is this not pinned to a stable version instead of "latest"? It exposes us to unstable updates that can lead to downtime or interruption.
If this is still an issue for some folks and you are still dealing with ~10,000 offline runners triggering the secondary rate limiting, the following script snippet may be useful for removing the offline runners:
#!/bin/bash
while true; do
  echo "Fetching more runners"
  RESPONSE=$(gh api \
    -H "Accept: application/vnd.github+json" \
    -H "X-GitHub-Api-Version: 2022-11-28" \
    /orgs/<YOUR ORG>/actions/runners)
  echo "Total runners: $(echo "$RESPONSE" | jq '.total_count')"
  OFFLINE_RUNNERS="$(echo "$RESPONSE" | jq '.runners | map(select(.status == "offline"))')"
  # No trailing whitespace here, or the emptiness check below never fires
  RUNNERS="$(echo "$OFFLINE_RUNNERS" | jq '.[].id')"
  # Delete each offline runner by id
  for RUNNER in $RUNNERS; do
    echo "Removing runner: $RUNNER"
    gh api \
      -H "Accept: application/vnd.github+json" \
      -H "X-GitHub-Api-Version: 2022-11-28" \
      -X DELETE \
      "/orgs/<YOUR ORG>/actions/runners/$RUNNER" >> removal.logs
  done
  # If there were no offline runners, stop
  if [ -z "$RUNNERS" ]; then
    echo "Done!"
    break
  fi
done
... or the following action may accomplish the same thing as well (just don't run it on self hosted runners where you are experiencing this issue!): some-natalie/runner-reaper.
It's my understanding that GitHub should automatically remove offline runners after 24h, but the symptom of this issue seems to be that it very quickly ramps up the number of offline runners, making that automation not viable unless or until you correct the pinned docker version.
It also looks like the upstream docker:dind image was corrected, so your system may self-correct over time anyway.
As @iamcaleberic pointed out, if you're deploying the actions-runner-controller helm chart, the relevant values line to override when re-deploying the fix is located here.
If you're running the newer gha-runner-scale-set chart and it's exhibiting the same issue (we don't currently run that one, so it's unclear whether the scale set is affected), it looks like the necessary modification relates to the template spec definition here.
Running the newer gha-runner-scale-set, and overriding the spec in the values.yaml with a new docker tag doesn't seem to make a difference; it stays on docker:dind.
Anyone managed to find a workaround for it?
> Running the newer gha-runner-scale-set, and overriding the spec in the values.yaml with a new docker tag doesn't seem to make a difference; it stays on docker:dind. Anyone managed to find a workaround for it?

Update the CRD manually under autoscalingrunnerset and patch it.
> Running the newer gha-runner-scale-set, and overriding the spec in the values.yaml with a new docker tag doesn't seem to make a difference; it stays on docker:dind. Anyone managed to find a workaround for it?

This worked for me: https://github.com/jamezrin/personal-actions-runner-setup/blob/main/gha-runner-scale-set-dind-fix.yaml#L24C53-L24C117
> Updating and pinning image.dindSidecarRepositoryAndTag to docker:24.0.7-dind-alpine3.18 appears to resolve it.
So, how do we do that? I have the following manifests below and I don't know where to add it.
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runnerdeploy
  namespace: actions-runner-system
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
  labels:
    name: example-runnerdeploy
spec:
  replicas: 1
  template:
    spec:
      repository: farrukh90/symmetrical-fortnight
      image: farrukhsadykov/runner:latest
      labels:
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-runnerdeploy
  namespace: actions-runner-system
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
  labels:
    name: example-runnerdeploy
spec:
  scaleTargetRef:
    name: example-runnerdeploy
  scaleDownDelaySecondsAfterScaleOut: 300
  minReplicas: 2
  maxReplicas: 20
  metrics:
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-runnerdeploy
  namespace: actions-runner-system
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: example-runnerdeploy
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: for-aws-tasks
parameters:
  type: pd-standard
provisioner: kubernetes.io/gce-pd
reclaimPolicy: Retain
volumeBindingMode: Immediate
allowVolumeExpansion: false
image.dindSidecarRepositoryAndTag is set at the Helm level.
Is this fixed now, or should we stick to the pinned version of the image?
> Running the newer gha-runner-scale-set, and overriding the spec in the values.yaml with a new docker tag doesn't seem to make a difference; it stays on docker:dind. Anyone managed to find a workaround for it?

This worked for me: https://github.com/jamezrin/personal-actions-runner-setup/blob/main/gha-runner-scale-set-dind-fix.yaml#L24C53-L24C117
For the gha scale set, I've ended up leaving container mode empty and updating the template to include the same spec that gets created when container mode is dind, only with the new docker tag. I found this a better solution for me, since I can keep using the helm chart I already had, in the hope that there will be a fix supporting a dind image tag from values.yaml.
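A rough sketch of that approach: containerMode left unset in the scale-set values, with the dind sidecar declared explicitly in template.spec and the image pinned. The container names, runner image, and command are illustrative; the chart's dind mode also wires up volumes and an init container, elided here, so copy the full generated spec from your cluster rather than this fragment.

```shell
# Illustrative values fragment for gha-runner-scale-set with containerMode
# unset and an explicit, pinned dind sidecar. Not a complete spec.
cat > scale-set-manual-dind.yaml <<'EOF'
template:
  spec:
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
      - name: dind
        image: docker:24.0.7-dind-alpine3.18
        securityContext:
          privileged: true
EOF

grep 'image:' scale-set-manual-dind.yaml
```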
A fix has been implemented upstream in docker:dind; however, it now requires this helm chart (or us, its users) to set a new variable.
https://github.com/docker-library/docker/pull/468#issuecomment-1878086606
Set DOCKER_IPTABLES_LEGACY=1 inside your dind pod, via an override of the helm chart's default variables (this should get added to the helm chart, if someone wants an easy PR).
The change should go right after these lines, for anyone with a minute to open the PR to the chart: https://github.com/actions/actions-runner-controller/blob/master/charts/gha-runner-scale-set/templates/_helpers.tpl#L106 and https://github.com/actions/actions-runner-controller/blob/master/charts/gha-runner-scale-set/values.yaml#L142
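Until that PR lands, the env var can be injected from the scale-set values. This is a sketch under assumptions: the sidecar container name `dind` matches what the chart's dind mode generates, but the exact merge behavior of template.spec against containerMode isn't verified here.

```shell
# Illustrative values override adding DOCKER_IPTABLES_LEGACY=1 to the
# dind sidecar; container name "dind" is assumed from the chart's template.
cat > scale-set-dind-env.yaml <<'EOF'
template:
  spec:
    containers:
      - name: dind
        env:
          - name: DOCKER_IPTABLES_LEGACY
            value: "1"
EOF

grep 'DOCKER_IPTABLES_LEGACY' scale-set-dind-env.yaml
```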
Checks
Controller Version
v0.27.6
Helm Chart Version
0.23.6
CertManager Version
1.13.2
Deployment Method
Helm
cert-manager installation
Are you sure you've installed cert-manager from an official source? Yes, using the official jetstack helm repo.
Checks
Resource Definitions
To Reproduce
Describe the bug
The docker dind sidecar errors out and does not start, and the runner pods end up restarting every 120 seconds, which is the timeout for docker.
Might be related to
https://github.com/docker-library/docker/commit/4c2674df4f40c965cdb8ccc77b8ce9dbc247a6c9 https://github.com/docker-library/docker/pull/437
Describe the expected behavior
The dind sidecar should start.
Whole Controller Logs