Open duchuyvp opened 3 months ago
Hello! Thank you for filing an issue.
The maintainers will triage your issue shortly.
In the meantime, please take a look at the troubleshooting guide for bug reports.
If this is a feature request, please review our contribution guidelines.
@duchuyvp , do you happen to run the deployment on GKE?
@norman-zon I haven't tested on GKE; I deployed on-prem.
Try setting the MTU for the docker daemon, like:

```yaml
- name: dind
  image: docker:dind
  args:
    - dockerd
    - --host=unix:///var/run/docker.sock
    - --group=$(DOCKER_GROUP_GID)
    - --mtu=1460
```
The default docker daemon MTU is 1500, but my host network has 1460. So aligning the docker daemon MTU fixed it for me.
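The mismatch described above can be checked mechanically: if dockerd's MTU is larger than the host interface's, oversized frames get dropped, which surfaces as SSL/TLS handshake timeouts. A small illustrative sketch (the function name and messages are hypothetical, not part of any tool):

```shell
# check_mtu: warn when dockerd's MTU exceeds the host MTU. Frames larger than
# the host MTU are dropped on the host network, which shows up as TLS timeouts.
check_mtu() {
  host_mtu=$1    # e.g. obtained from: cat /sys/class/net/eth0/mtu
  docker_mtu=$2  # e.g. obtained from: docker network inspect bridge
  if [ "$docker_mtu" -gt "$host_mtu" ]; then
    echo "mismatch: dockerd mtu $docker_mtu > host mtu $host_mtu"
  else
    echo "ok"
  fi
}

check_mtu 1460 1500   # a 1460-MTU host with dockerd's default 1500 -> mismatch
```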
@norman-zon Thank you so much, your idea works for me too. I tried patching one runner pod to add `--mtu=1450` to the dind container. But I don't know how to add these args when deploying with Helm, since the dind container seems to be fixed in the gha-runner-scale-set chart:
https://github.com/actions/actions-runner-controller/blob/a152741a1a6afa992f8d836a029d551984149c8f/charts/gha-runner-scale-set/templates/_helpers.tpl#L98-L116
Could you please show me how?
I ended up using the solution with a ConfigMap as described in the discussion here.
You have to set

```yaml
containerMode:
  type: none
```

and then completely specify the template for the container, as described in the values file.
This could be easier to add to the dind container, if my PR were merged...
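A rough sketch of what such a values file could look like (this is an illustration only: the container list is abridged, the runner image/command and `--mtu=1460` value are assumptions, and the chart's own values.yaml should be copied for the full set of volumes, mounts, and init containers):

```yaml
# Sketch: values.yaml for gha-runner-scale-set with containerMode.type=none,
# restating the chart's default dind container plus an extra --mtu flag.
# Volumes, volumeMounts, and initContainers are elided for brevity.
containerMode:
  type: none
template:
  spec:
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
      - name: dind
        image: docker:dind
        args:
          - dockerd
          - --host=unix:///var/run/docker.sock
          - --group=$(DOCKER_GROUP_GID)
          - --mtu=1460          # assumption: match your host network MTU
        env:
          - name: DOCKER_GROUP_GID
            value: "123"
        securityContext:
          privileged: true
```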
Unfortunately this didn't solve our issue, which is ostensibly the same.
We have self-hosted runners in an on-premises OpenStack K8s cluster. For container actions that specify our own helper image (with some useful utilities installed) we cannot connect to GitHub to clone the relevant repository. We have tried both checkout actions, the GitHub CLI, and standard git with auth set up in the job.
After seeing this post we modified the DinD container as suggested, passing the MTU argument, and verified that it was indeed being set. As a test we followed the GP's example: cloning from the runner container after installing git succeeded, but cloning with the already-installed git from the spawned helper container failed. All the different tests we conducted resulted in variations of the same theme, SSL/TLS timeout errors:
```console
$ kubectl exec -it github-runner-scale-set-hello-world-cbr74-runner-jdr2z -- sh
Defaulted container "runner" out of: runner, dind, init-dind-externals (init)
$ sudo apt install git -y
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
<snipped>
Setting up git (1:2.46.0-0ppa1~ubuntu22.04.1) ...
Processing triggers for libc-bin (2.35-0ubuntu3.8) ...
$ git clone https://github.com/actions/actions-runner-controller.git   <-- we can clone in the runner container after installing git
Cloning into 'actions-runner-controller'...
remote: Enumerating objects: 12348, done.
remote: Counting objects: 100% (27/27), done.
remote: Compressing objects: 100% (26/26), done.
remote: Total 12348 (delta 11), reused 8 (delta 1), pack-reused 12321 (from 1)
Receiving objects: 100% (12348/12348), 5.44 MiB | 33.33 MiB/s, done.
Resolving deltas: 100% (8430/8430), done.
$ ls -ltr actions-runner-controller
drwxr-xr-x 23 runner runner 4096 Aug 14 06:42 actions-runner-controller
$ docker ps
CONTAINER ID   IMAGE                               COMMAND               CREATED              STATUS              PORTS   NAMES
cd3c11559488   ghcr.io/***/pipeline-helper:0.0.4   "tail -f /dev/null"   About a minute ago   Up About a minute           e588e3cf54e848bd99acc500aeec932e_ghcrio***pipelinehelper004_3c7f01
$ docker exec -it cd3c11559488 sh
/ # git --version   <-- git already installed in container job
git version 2.45.2
/ # git clone https://github.com/actions/actions-runner-controller.git
Cloning into 'actions-runner-controller'...
fatal: unable to access 'https://github.com/actions/actions-runner-controller.git/': SSL connection timeout
Error: Process completed with exit code 128.
```
The specific error when using the GitHub CLI was `error validating token: Get "https://api.github.com/": net/http: TLS handshake timeout`.
@nikola-jokic Hi. I am not sure why in the original Helm chart there is no way to change the DinD config, since it is locked in the chart's _helpers.tpl:
```yaml
{{- define "gha-runner-scale-set.dind-container" -}}
image: docker:dind
args:
  - dockerd
  - --host=unix:///var/run/docker.sock
  - --group=$(DOCKER_GROUP_GID)
env:
  - name: DOCKER_GROUP_GID
    value: "123"
securityContext:
  privileged: true
volumeMounts:
  - name: work
    mountPath: /home/runner/_work
  - name: dind-sock
    mountPath: /var/run
  - name: dind-externals
    mountPath: /home/runner/externals
{{- end }}
```
In my values file I specified (along with the init and runner containers):

```yaml
template:
  spec:
    containers:
      - name: dind
        image: docker:dind
        args:
          - dockerd
          - --host=unix:///var/run/docker.sock
          - --group=$(DOCKER_GROUP_GID)
          - --mtu=1400
```
which works for the default network, but dependabot creates its own networks with no MTU setting, so they default to 1500 and dependabot breaks.
So that would fix the auto-created networks, but it won't help if you create docker networks as part of your actions.
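For networks created explicitly inside actions, `docker network create` accepts a per-network MTU via a driver option. A hypothetical wrapper (the function name and the 1400 default are illustrative assumptions, not part of any existing tool):

```shell
# create_network: wrap `docker network create` so every bridge network pins its
# MTU. The `echo` prints the command instead of running it; drop the `echo` to
# actually create the network on a host with Docker installed.
create_network() {
  name=$1
  mtu=${2:-1400}   # assumption: default to the host path MTU
  echo docker network create --opt "com.docker.network.driver.mtu=$mtu" "$name"
}

create_network my-net
# → docker network create --opt com.docker.network.driver.mtu=1400 my-net
```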
I ended up using the solution discussed here, writing a `daemon.json` ConfigMap and mounting it inside the container at `/etc/docker/daemon.json`. This allows for setting

```json
"bridge": {
    "com.docker.network.driver.mtu": "1460"
}
```

which is also used for all networks created by actions.
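Wired together, the ConfigMap could look roughly like this (the resource name is hypothetical, and the nesting of the MTU fragment follows the snippet above; double-check it against the daemon.json schema for your Docker version):

```yaml
# Hypothetical ConfigMap holding the dockerd config; mount its daemon.json key
# at /etc/docker/daemon.json in the dind container via a volume + volumeMount.
apiVersion: v1
kind: ConfigMap
metadata:
  name: dind-daemon-config
data:
  daemon.json: |
    {
      "mtu": 1460,
      "bridge": {
        "com.docker.network.driver.mtu": "1460"
      }
    }
```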
I was going to update today and saw that moby/moby#43197 has been merged (earlier this year / late last year), which solves my issue by adding the argument `--default-network-opt=bridge=com.docker.network.driver.mtu=1400`.
Now when dependabot calls the docker API (not via a shell, so the shims don't help) to create a network for the updater container, the network has its MTU set to 1400.
```yaml
template:
  spec:
    containers:
      - name: dind
        image: docker:dind
        args:
          - dockerd
          - --host=unix:///var/run/docker.sock
          - --group=$(DOCKER_GROUP_GID)
          - --mtu=1400
          - --default-network-opt=bridge=com.docker.network.driver.mtu=1400
```
From the `dind` container in the dependabot runner pod:

```console
$ docker network inspect dependabot-job-11050-external-network
```
Output (cut for size):
```json
[
    {
        "Name": "dependabot-job-11050-external-network",
        "Id": "dff4d1a3f843634c060258f5e808050ac9861ba487a0a0c677278506321374ea",
        "Created": "2024-08-20T07:10:54.585512615Z",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": { ... },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": { ... },
        "Options": {
            "com.docker.network.driver.mtu": "1400"
        },
        "Labels": {}
    }
]
```
Maybe these two options (container args and ConfigMap) should be added to the docs, considering how many reactions this issue got?
The same issue occurs on an older version (0.9.0).
`curl -v https://github.com` fails at the TLS client hello (1),
but `curl -v --resolve github.com:443:140.82.121.3 https://github.com/` works,
and it works through a proxy as well.
Working workaround:
after this patch it works with 0.9.3.
Any idea why only GitHub has this connectivity issue? What bug should be raised?
Checks
Controller Version
0.9.3
Deployment Method
ArgoCD
To Reproduce
Describe the bug
Output from step 4:
Describe the expected behavior
The `docker run` command above runs correctly without an SSL connection timeout error.
Additional Context
Controller Logs
Runner Pod Logs