viniciusesteter opened this issue 9 months ago (Open)
Hello! Thank you for filing an issue.
The maintainers will triage your issue shortly.
In the meantime, please take a look at the troubleshooting guide for bug reports.
If this is a feature request, please review our contribution guidelines.
I think I'm facing the same issue as well. It happened to me a month ago and went away on its own, but I couldn't figure it out.
The docker containers in the runner pods all emit these logs:
cat: can't open '/proc/net/arp_tables_names': No such file or directory
iptables v1.8.10 (nf_tables)
time="2024-02-02T14:36:08.349688382Z" level=info msg="Starting up"
time="2024-02-02T14:36:08.350965386Z" level=info msg="containerd not running, starting managed containerd"
time="2024-02-02T14:36:08.351685472Z" level=info msg="started new containerd process" address=/var/run/docker/containerd/containerd.sock module=libcontainerd pid=30
time="2024-02-02T14:36:08.373648316Z" level=info msg="starting containerd" revision=7c3aca7a610df76212171d200ca3811ff6096eb8 version=v1.7.13
time="2024-02-02T14:36:08.392594430Z" level=info msg="loading plugin \"io.containerd.event.v1.exchange\"..." type=io.containerd.event.v1
time="2024-02-02T14:36:08.392636213Z" level=info msg="loading plugin \"io.containerd.internal.v1.opt\"..." type=io.containerd.internal.v1
time="2024-02-02T14:36:08.392898621Z" level=info msg="loading plugin \"io.containerd.warning.v1.deprecations\"..." type=io.containerd.warning.v1
time="2024-02-02T14:36:08.392917009Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.blockfile\"..." type=io.containerd.snapshotter.v1
time="2024-02-02T14:36:08.392969909Z" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.blockfile\"..." error="no scratch file generator: skip plugin" type=io.containerd.snapshotter.v1
time="2024-02-02T14:36:08.392983451Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.devmapper\"..." type=io.containerd.snapshotter.v1
time="2024-02-02T14:36:08.392992588Z" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.devmapper" error="devmapper not configured"
time="2024-02-02T14:36:08.393000482Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.native\"..." type=io.containerd.snapshotter.v1
time="2024-02-02T14:36:08.393063886Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.overlayfs\"..." type=io.containerd.snapshotter.v1
time="2024-02-02T14:36:08.393264992Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.aufs\"..." type=io.containerd.snapshotter.v1
time="2024-02-02T14:36:08.397803212Z" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.aufs\"..." error="aufs is not supported (modprobe aufs failed: exit status 1 \"ip: can't find device 'aufs'\\nmodprobe: can't change directory to '/lib/modules': No such file or directory\\n\"): skip plugin" type=io.containerd.snapshotter.v1
time="2024-02-02T14:36:08.397832591Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.zfs\"..." type=io.containerd.snapshotter.v1
time="2024-02-02T14:36:08.398031842Z" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.zfs\"..." error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1
time="2024-02-02T14:36:08.398046545Z" level=info msg="loading plugin \"io.containerd.content.v1.content\"..." type=io.containerd.content.v1
time="2024-02-02T14:36:08.398150598Z" level=info msg="loading plugin \"io.containerd.metadata.v1.bolt\"..." type=io.containerd.metadata.v1
time="2024-02-02T14:36:08.398212774Z" level=warning msg="could not use snapshotter devmapper in metadata plugin" error="devmapper not configured"
time="2024-02-02T14:36:08.398230252Z" level=info msg="metadata content store policy set" policy=shared
time="2024-02-02T14:36:08.445396814Z" level=info msg="loading plugin \"io.containerd.gc.v1.scheduler\"..." type=io.containerd.gc.v1
time="2024-02-02T14:36:08.445471715Z" level=info msg="loading plugin \"io.containerd.differ.v1.walking\"..." type=io.containerd.differ.v1
time="2024-02-02T14:36:08.445500464Z" level=info msg="loading plugin \"io.containerd.lease.v1.manager\"..." type=io.containerd.lease.v1
time="2024-02-02T14:36:08.445576869Z" level=info msg="loading plugin \"io.containerd.streaming.v1.manager\"..." type=io.containerd.streaming.v1
time="2024-02-02T14:36:08.445618783Z" level=info msg="loading plugin \"io.containerd.runtime.v1.linux\"..." type=io.containerd.runtime.v1
time="2024-02-02T14:36:08.445781283Z" level=info msg="loading plugin \"io.containerd.monitor.v1.cgroups\"..." type=io.containerd.monitor.v1
time="2024-02-02T14:36:08.446305234Z" level=info msg="loading plugin \"io.containerd.runtime.v2.task\"..." type=io.containerd.runtime.v2
time="2024-02-02T14:36:08.446589823Z" level=info msg="loading plugin \"io.containerd.runtime.v2.shim\"..." type=io.containerd.runtime.v2
time="2024-02-02T14:36:08.446619509Z" level=info msg="loading plugin \"io.containerd.sandbox.store.v1.local\"..." type=io.containerd.sandbox.store.v1
time="2024-02-02T14:36:08.446666322Z" level=info msg="loading plugin \"io.containerd.sandbox.controller.v1.local\"..." type=io.containerd.sandbox.controller.v1
time="2024-02-02T14:36:08.446705283Z" level=info msg="loading plugin \"io.containerd.service.v1.containers-service\"..." type=io.containerd.service.v1
time="2024-02-02T14:36:08.446759587Z" level=info msg="loading plugin \"io.containerd.service.v1.content-service\"..." type=io.containerd.service.v1
time="2024-02-02T14:36:08.446780137Z" level=info msg="loading plugin \"io.containerd.service.v1.diff-service\"..." type=io.containerd.service.v1
time="2024-02-02T14:36:08.446806016Z" level=info msg="loading plugin \"io.containerd.service.v1.images-service\"..." type=io.containerd.service.v1
time="2024-02-02T14:36:08.446835246Z" level=info msg="loading plugin \"io.containerd.service.v1.introspection-service\"..." type=io.containerd.service.v1
time="2024-02-02T14:36:08.446858358Z" level=info msg="loading plugin \"io.containerd.service.v1.namespaces-service\"..." type=io.containerd.service.v1
time="2024-02-02T14:36:08.446883787Z" level=info msg="loading plugin \"io.containerd.service.v1.snapshots-service\"..." type=io.containerd.service.v1
time="2024-02-02T14:36:08.446902822Z" level=info msg="loading plugin \"io.containerd.service.v1.tasks-service\"..." type=io.containerd.service.v1
time="2024-02-02T14:36:08.446932581Z" level=info msg="loading plugin \"io.containerd.grpc.v1.containers\"..." type=io.containerd.grpc.v1
time="2024-02-02T14:36:08.446961217Z" level=info msg="loading plugin \"io.containerd.grpc.v1.content\"..." type=io.containerd.grpc.v1
time="2024-02-02T14:36:08.446981273Z" level=info msg="loading plugin \"io.containerd.grpc.v1.diff\"..." type=io.containerd.grpc.v1
time="2024-02-02T14:36:08.446997347Z" level=info msg="loading plugin \"io.containerd.grpc.v1.events\"..." type=io.containerd.grpc.v1
time="2024-02-02T14:36:08.447016883Z" level=info msg="loading plugin \"io.containerd.grpc.v1.images\"..." type=io.containerd.grpc.v1
time="2024-02-02T14:36:08.447036957Z" level=info msg="loading plugin \"io.containerd.grpc.v1.introspection\"..." type=io.containerd.grpc.v1
time="2024-02-02T14:36:08.447052236Z" level=info msg="loading plugin \"io.containerd.grpc.v1.leases\"..." type=io.containerd.grpc.v1
time="2024-02-02T14:36:08.447070724Z" level=info msg="loading plugin \"io.containerd.grpc.v1.namespaces\"..." type=io.containerd.grpc.v1
time="2024-02-02T14:36:08.447087998Z" level=info msg="loading plugin \"io.containerd.grpc.v1.sandbox-controllers\"..." type=io.containerd.grpc.v1
time="2024-02-02T14:36:08.447107438Z" level=info msg="loading plugin \"io.containerd.grpc.v1.sandboxes\"..." type=io.containerd.grpc.v1
time="2024-02-02T14:36:08.447123714Z" level=info msg="loading plugin \"io.containerd.grpc.v1.snapshots\"..." type=io.containerd.grpc.v1
time="2024-02-02T14:36:08.447148047Z" level=info msg="loading plugin \"io.containerd.grpc.v1.streaming\"..." type=io.containerd.grpc.v1
time="2024-02-02T14:36:08.447166979Z" level=info msg="loading plugin \"io.containerd.grpc.v1.tasks\"..." type=io.containerd.grpc.v1
time="2024-02-02T14:36:08.447198670Z" level=info msg="loading plugin \"io.containerd.transfer.v1.local\"..." type=io.containerd.transfer.v1
time="2024-02-02T14:36:08.447442412Z" level=info msg="loading plugin \"io.containerd.grpc.v1.transfer\"..." type=io.containerd.grpc.v1
time="2024-02-02T14:36:08.447530474Z" level=info msg="loading plugin \"io.containerd.grpc.v1.version\"..." type=io.containerd.grpc.v1
time="2024-02-02T14:36:08.447564804Z" level=info msg="loading plugin \"io.containerd.internal.v1.restart\"..." type=io.containerd.internal.v1
time="2024-02-02T14:36:08.447645525Z" level=info msg="loading plugin \"io.containerd.tracing.processor.v1.otlp\"..." type=io.containerd.tracing.processor.v1
time="2024-02-02T14:36:08.447680777Z" level=info msg="skip loading plugin \"io.containerd.tracing.processor.v1.otlp\"..." error="no OpenTelemetry endpoint: skip plugin" type=io.containerd.tracing.processor.v1
time="2024-02-02T14:36:08.447698141Z" level=info msg="loading plugin \"io.containerd.internal.v1.tracing\"..." type=io.containerd.internal.v1
time="2024-02-02T14:36:08.447725487Z" level=info msg="skipping tracing processor initialization (no tracing plugin)" error="no OpenTelemetry endpoint: skip plugin"
time="2024-02-02T14:36:08.447855630Z" level=info msg="loading plugin \"io.containerd.grpc.v1.healthcheck\"..." type=io.containerd.grpc.v1
time="2024-02-02T14:36:08.447879733Z" level=info msg="loading plugin \"io.containerd.nri.v1.nri\"..." type=io.containerd.nri.v1
time="2024-02-02T14:36:08.447899819Z" level=info msg="NRI interface is disabled by configuration."
time="2024-02-02T14:36:08.448219892Z" level=info msg=serving... address=/var/run/docker/containerd/containerd-debug.sock
time="2024-02-02T14:36:08.448288542Z" level=info msg=serving... address=/var/run/docker/containerd/containerd.sock.ttrpc
time="2024-02-02T14:36:08.448345253Z" level=info msg=serving... address=/var/run/docker/containerd/containerd.sock
time="2024-02-02T14:36:08.448380520Z" level=info msg="containerd successfully booted in 0.075643s"
time="2024-02-02T14:36:11.271697198Z" level=info msg="Loading containers: start."
time="2024-02-02T14:36:11.366705804Z" level=info msg="stopping event stream following graceful shutdown" error="<nil>" module=libcontainerd namespace=moby
time="2024-02-02T14:36:11.367239329Z" level=info msg="stopping healthcheck following graceful shutdown" module=libcontainerd
time="2024-02-02T14:36:11.367283663Z" level=info msg="stopping event stream following graceful shutdown" error="context canceled" module=libcontainerd namespace=plugins.moby
failed to start daemon: Error initializing network controller: error creating default "bridge" network: Failed to Setup IP tables: Unable to enable NAT rule: (iptables failed: iptables --wait -t nat -I POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE: Warning: Extension MASQUERADE revision 0 not supported, missing kernel module?
iptables v1.8.10 (nf_tables): CHAIN_ADD failed (No such file or directory): chain POSTROUTING
(exit status 4))
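For context on the error above, these are generic iptables/procfs checks (nothing specific to this chart) that show which iptables backend and netfilter modules the dind container actually sees:

# run inside the dind container, e.g. kubectl exec -it <runner-pod> -c dind -- sh
iptables --version                          # "(nf_tables)" vs "(legacy)" shows the active backend
ls /proc/net/ip_tables_names 2>/dev/null    # only present when the legacy ip_tables module is loaded
lsmod | grep -E 'nf_tables|ip_tables'       # which netfilter modules the node kernel exposes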
I'm using GKE version 1.28, with the default dind container image in the helm chart.
Wonder if this has anything to do with the recent fix GKE has released for CVE-2023-6817
I don't think so; these errors have been happening in my cluster for the past 4-5 months.
I ended up following this workaround, which made dind work again: https://github.com/actions/actions-runner-controller/issues/3159#issuecomment-1906905610
I still think dind needs to address this.
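For anyone skimming: the linked workaround boils down to setting DOCKER_IPTABLES_LEGACY=1 on the dind container so that dockerd falls back to the legacy iptables binaries. A minimal sketch of just that piece (a full values example appears further down in this thread):

- name: dind
  image: docker:dind
  env:
    # tells the docker:dind entrypoint to use the legacy iptables binaries
    - name: DOCKER_IPTABLES_LEGACY
      value: "1"
  securityContext:
    privileged: true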
But where in the Helm manifest can I put these arguments? I don't have an argument for the docker container. I'm actually using image: summerwind/actions-runner:latest in my Dockerfile, and summerwind/actions-runner:latest in the values.yaml of my Helm deployment.
I agree, that's tricky. I actually transitioned to the new runner-scale-set operator, where you can control the pod template, including the dind sidecar container.
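If you do stay on the legacy summerwind chart, I believe the RunnerDeployment spec has a dockerEnv field that is passed to the docker sidecar; treat the sketch below as an assumption from memory of that API rather than something verified in this thread:

# Hypothetical sketch for the legacy summerwind CRDs; the dockerEnv field is assumed, not verified here.
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runnerdeployment
spec:
  template:
    spec:
      image: summerwind/actions-runner:latest
      dockerEnv:
        - name: DOCKER_IPTABLES_LEGACY
          value: "1"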
@asafhm Would you be willing to share the snippet of your values.yaml (or helm command) where you specified the dind container with the workaround environment variable?
@jctrouble Here's a portion of the values.yaml I use for the gha-runner-scale-set chart:
template:
  spec:
    initContainers:
      - name: init-dind-externals
        image: ghcr.io/actions/actions-runner:latest
        command:
          ["cp", "-r", "-v", "/home/runner/externals/.", "/home/runner/tmpDir/"]
        volumeMounts:
          - name: dind-externals
            mountPath: /home/runner/tmpDir
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        imagePullPolicy: Always
        command: ["/home/runner/run.sh"]
        resources:
          limits:
            cpu: 400m
            memory: 512Mi
          requests:
            cpu: 200m
            memory: 256Mi
        env:
          - name: DOCKER_HOST
            value: unix:///run/docker/docker.sock
        volumeMounts:
          - name: work
            mountPath: /home/runner/_work
          - name: dind-sock
            mountPath: /run/docker
            readOnly: true
      - name: dind
        image: docker:dind
        args:
          - dockerd
          - --host=unix:///run/docker/docker.sock
          - --group=$(DOCKER_GROUP_GID)
        env:
          - name: DOCKER_GROUP_GID
            value: "123"
          # TODO: Once this issue is fixed (https://github.com/actions/actions-runner-controller/issues/3159),
          # we can switch to containerMode.type=dind, keep only the "runner" container spec, and drop the
          # "dind" container, init containers, and volumes from these values.
          - name: DOCKER_IPTABLES_LEGACY
            value: "1"
        securityContext:
          privileged: true
        volumeMounts:
          - name: work
            mountPath: /home/runner/_work
          - name: dind-sock
            mountPath: /run/docker
          - name: dind-externals
            mountPath: /home/runner/externals
    volumes:
      - name: work
        emptyDir: {}
      - name: dind-sock
        emptyDir: {}
      - name: dind-externals
        emptyDir: {}
The reason I included a lot more here than just the env var is that the docs specify that if you need to modify anything in the dind container, you have to copy its entire configuration into your values file and edit it there. Not a clean solution yet, I'm afraid, but at least it works well.
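For reference, I apply it with roughly this command; the chart location is the standard gha-runner-scale-set OCI reference, and the org/repo and secret are placeholders to substitute:

helm upgrade --install arc-runner-set \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set \
  --namespace arc-runners --create-namespace \
  --set githubConfigUrl="https://github.com/<your-org>/<your-repo>" \
  --set githubConfigSecret=<your-pre-created-secret> \
  -f values.yaml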
Hi @asafhm, I have tried your workaround but I'm still facing the same issue. It started after upgrading the new scale-set to its latest version. Any other options to try? Thanks!
The runner starts fine, but the error appears when I run a workflow that has a docker build step, so I am a bit clueless!
@rekha-prakash-maersk Did you verify that the runner pods that come up have said env var in the dind container spec?
Also, did you check the dind container logs? "Cannot connect to the Docker daemon at unix:///run/docker.sock" can result from a number of reasons.
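Something along these lines can confirm both points; plain kubectl, with the pod name and namespace substituted:

# does the dind container actually carry the workaround env var?
kubectl get pod <runner-pod> -n <namespace> \
  -o jsonpath='{.spec.containers[?(@.name=="dind")].env}'

# and what is the dind daemon itself logging?
kubectl logs <runner-pod> -n <namespace> -c dind --tail=100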
Hi @asafhm, I found that the dind container needed more resources for the docker build being executed. Thanks for the help!
We are facing a similar issue:
time="2024-04-11T22:08:59.214409763Z" level=info msg="Loading containers: start."
time="2024-04-11T22:08:59.337082693Z" level=info msg="stopping event stream following graceful shutdown" error="
Any suggestions, @rekha-prakash-maersk @asafhm?
I'm having the same issue on Google Cloud Platform on GKE when simply using:
containerMode:
  type: "dind"
I haven't adjusted any of the values.
Hi @marc-barry, I allocated more CPU and memory to the dind container as shown below, which resolved the issue for me:
- name: dind
  image: docker:dind
  args:
    - dockerd
    - --host=unix:///run/docker/docker.sock
    - --group=$(DOCKER_GROUP_GID)
  env:
    - name: DOCKER_GROUP_GID
      value: "123"
  resources:
    requests:
      memory: "500Mi"
      cpu: "300m"
    limits:
      memory: "500Mi"
      cpu: "300m"
  securityContext:
    privileged: true
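If metrics-server is installed in the cluster, it's also quick to compare actual usage against those limits, or to spot an OOMKilled dind container:

# per-container CPU/memory (requires metrics-server)
kubectl top pod <runner-pod> -n <namespace> --containers

# an OOMKilled dind container shows up in the last state of the pod description
kubectl describe pod <runner-pod> -n <namespace> | grep -A6 "Last State"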
@rekha-prakash-maersk thanks for that information. We've decided to move away from running runners on Kubernetes: the documentation isn't fully complete yet and we don't want to spend our time fighting infrastructure problems like the ones we're experiencing with this controller. The concepts and ideas are sound, but the execution is challenging. For the time being, we have gone to bare VMs on GCP running Debian, on t2a-standard-x machines for our Arm64 builds and t2d-standard-x machines for our Amd64 builds. We then have an image template that simply has Docker installed on the machine and the runner started with systemd. I was able to get this all running in under an hour, versus the challenges we faced with the Actions Runner Controller.
GitHub Actions is super convenient, and that's why we use it. But if we find ourselves needing to self-host more and more of our runners, I'll switch us to Buildkite, as I feel their BYOC model is a bit more developed (and I have a lot of experience with it).
@rekha-prakash-maersk do we need to comment out the containerMode: type: "dind" section?
I am seeing this, too, intermittently, running on AWS EKS, Kubernetes v1.29.3.
/usr/bin/docker build ...
ERROR: Cannot connect to the Docker daemon at unix:///run/docker/docker.sock. Is the docker daemon running?
Same here. It's a very small percentage of jobs but I have yet to figure out why.
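As a stopgap while the root cause is unclear, a retry loop before the build at least separates "the daemon never came up" from "the daemon was slow to come up". A sketch of a plain workflow step, nothing ARC-specific:

- name: Wait for Docker daemon
  shell: bash
  run: |
    # poll `docker info` for up to ~60 seconds before giving up
    for i in $(seq 1 30); do
      docker info >/dev/null 2>&1 && exit 0
      echo "docker daemon not ready yet (attempt $i), retrying..."
      sleep 2
    done
    echo "docker daemon never became ready" >&2
    exit 1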
Checks
Controller Version
latest
Helm Chart Version
0.27.6
CertManager Version
1.13.1
Deployment Method
Helm
cert-manager installation
Installed ok by Chart.yaml
Checks
Resource Definitions
To Reproduce
Describe the bug
I'm using GKE version 1.26.10-gke.1101000. In my Dockerfile, I'm using: FROM summerwind/actions-runner:latest.
In values.yaml, I'm using:
But when the deploy is done, GKE ends up with a lot of pods failing with the error: "Cannot connect to the Docker daemon at unix:///run/docker.sock. Is the docker daemon running?"
The pods keep restarting with that error in the "docker" container: "Cannot connect to the Docker daemon at unix:///run/docker.sock. Is the docker daemon running?". The container dies and a new one starts with the same problem.
I've already followed issue 2490, but it doesn't work.
Could you help me please?
Describe the expected behavior
The runners should not hit this error and should run normally.
Whole Controller Logs
Whole Runner Pod Logs
Additional Context
No response