Closed johannmayer closed 12 months ago
Are you behind a VPN connection?
Yes, I am behind a corporate VPN connection.
I am not on a VPN or using docker with colima, but I see a similar issue:
I get a DNS-related error on my first build with nerdctl via containerd after I have started the Alpine VM.
Simply re-running the command fixes things until I restart the VM.
$ nerdctl build --namespace k8s.io --platform linux/amd64 -t test/test:local -f ./Dockerfile .
[+] Building 0.2s (4/4) FINISHED
...
error: failed to solve: alpine:latest: failed to do request: Head "https://registry-1.docker.io/v2/library/alpine/manifests/latest": dial tcp: lookup registry-1.docker.io on [::1]:53: read udp [::1]:45220->[::1]:53: read: connection refused
FATA[0000] unrecognized image format
FATA[0000] exit status 1
Second Try:
$ nerdctl build --namespace k8s.io --platform linux/amd64 -t test/test:local -f ./Dockerfile .
[+] Building 0.2s (4/4) FINISHED
...
[+] Building 9.7s (7/17)
=> [internal] load build definition from Dockerfile 0.1s
=> => transferring dockerfile: 580B 0.1s
=> [internal] load .dockerignore 0.1s
=> => transferring context: 306B 0.1s
=> [internal] load metadata for docker.io/library/alpine:latest 0.4s
=> [internal] load metadata for docker.io/library/golang:1.17
...
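Since re-running the command is reported to work, a retry wrapper is one way to paper over the first failure. This is only a sketch: `retry` is a hypothetical helper, not part of nerdctl or colima, and the attempt count and delay are arbitrary assumptions.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper (not part of nerdctl or colima): re-runs COMMAND until
# it succeeds or MAX_ATTEMPTS attempts have been made.
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        # Run the command exactly as given; stop on the first success.
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "retry: giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "retry: attempt $attempt failed, sleeping ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}

# Example, using the build command from the report above:
# retry 3 5 nerdctl build --namespace k8s.io --platform linux/amd64 -t test/test:local -f ./Dockerfile .
```

This hides the symptom rather than fixing the resolver, so it is probably only useful for scripts that run right after the VM starts.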
I am running into the same error, without any VPN connection.
❯ colima version
colima version 0.3.2
git commit: 272db4732b90390232ed9bdba955877f46a50552
runtime: docker
arch: aarch64
client: v20.10.10
server: v20.10.11
I resolved it by doing colima start --port-interface 127.0.0.1
Correction: colima start --port-interface 127.0.0.1 -s
but it fails after pulling in more data
For those of us behind a VPN, how do I configure docker to use a proxy?
This is a good overview of DNS issues in Alpine and might be at the core of some of these DNS issues:
Their main fix was to migrate to RedHat's Universal Base Images (UBI) - https://developers.redhat.com/products/rhel/ubi
There is a workaround as well, that I will try when I have a bit of time to test it.
I am seeing this issue now too, after it had been working for me initially, e.g. -
% docker pull lscr.io/linuxserver-labs/daedalos
Using default tag: latest
Error response from daemon: Get "https://ghcr.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
and testing on multiple networks.
Same here:
$ docker pull hello-world
Using default tag: latest
error during connect: Post "http://%2FUsers%2Fxxxxxx%2F.colima%2Fdocker.sock/v1.41/images/create?fromImage=hello-word&tag=latest": EOF
Hello, I have this error too:
Error response from daemon: Get "https://registry-1.docker.io/v2/": dial tcp: lookup registry-1.docker.io on 192.168.5.3:53: read udp 192.168.5.15:33676->192.168.5.3:53: i/o timeout
Sometimes it's a timeout, sometimes another error.
I tried to install it on macOS without any VPN whatsoever, so I don't understand the issue. I've also tested multiple configurations such as Rancher Desktop, minikube + hyperkit, podman, etc., and I have this issue only with Colima.
Has anyone found a solution for this?
For instance, if I run docker run hello-word, it works for almost 30 seconds after colima starts. Then it crashes and I finally get this error. After that, the error happens every time.
It's Alpine. The musl DNS resolver is pretty terrible. It behaves differently from glibc in many ways.
I am just realising this
There are details about this here: https://wiki.musl-libc.org/functional-differences-from-glibc.html#Name-Resolver/DNS
I've been experiencing DNS failures randomly too. Especially, when having many queries in quick succession. Would having a caching dns server sit between the qemu dns and the containers help? I may try to set one up manually to see if it helps the situation.
I'm not convinced the differences between glibc and musl are the root cause here; unless colima does something different, there should be only a single nameserver in /etc/resolv.conf, and it should point to the lima internal host resolver.
I found one bug with this very recently: we disable IPv6 lookups in Lima by default because they often end up not working. The issue was though that instead of responding with an empty response, we handed the request to the resolver on the host, which might then add some random error for the IPv6 query to our response.
In my specific test case, I got the right DNS information when I looked with nslookup or dig, but curl could not connect. So I guess the musl resolver could share some blame, but the main blame belongs on our own DNS implementation (at least for this particular case).
This should be fixed in the forthcoming lima 0.8.3 release. So I would appreciate if you could all re-test with that version (once released), and report back if this improved/fixed the situation!
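To check the single-nameserver assumption mentioned above, something like the following could be used. It is a sketch: `list_nameservers` and `has_single_nameserver` are hypothetical helper names, and the default path assumes the check runs inside the VM.

```shell
# list_nameservers [FILE]
# Prints each "nameserver" address found in FILE (default: /etc/resolv.conf),
# one per line. Hypothetical helper for illustration only.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# has_single_nameserver [FILE]
# Succeeds only when FILE lists exactly one nameserver entry.
has_single_nameserver() {
    [ "$(list_nameservers "$1" | wc -l | tr -d ' ')" -eq 1 ]
}

# Example, run from the host against a copy of the VM's file:
# colima ssh -- cat /etc/resolv.conf > /tmp/vm-resolv.conf
# list_nameservers /tmp/vm-resolv.conf
# has_single_nameserver /tmp/vm-resolv.conf && echo "single nameserver"
```

If more than one address is printed, musl's behaviour of querying all nameservers in parallel becomes more relevant to the discussion.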
I'm not convinced the differences between glibc and musl are the root cause here; unless colima does something different, there should be only a single nameserver in /etc/resolv.conf, and it should point to the lima internal host resolver.
This is the case in Colima as well, and the single nameserver is 192.168.5.3.
This should be fixed in the forthcoming lima 0.8.3 release. So I would appreciate if you could all re-test with that version (once released), and report back if this improved/fixed the situation!
Looking forward to it. Thanks.
New colima user here, running into this right off the bat. lima version is 0.8.3, colima 0.3.3. This workaround fixed it for me: https://github.com/abiosoft/colima/issues/140#issuecomment-1028395976
@abiosoft Do we need to wait for a colima release for this? Running colima 0.3.3, and lima 0.8.3.
I experience this error:
Unable to connect to the server: dial tcp: lookup private.hostname.from.internal.company.com on 192.168.5.3:53: read udp 172.17.0.2:34738->192.168.5.3:53: i/o timeout
When I go into the VM:
dnn@overwatch ~ » colima ssh
colima:/Users/dnn$ nslookup private.hostname.from.internal.company.com
;; connection timed out; no servers could be reached
This happens because I'm running a script that is doing the same lookup over and over again very quickly. If I stop for a few minutes and try again, the DNS lookup is okay.
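A rough way to reproduce the rapid-lookup behaviour is to fire lookups back to back and count the failures. This is a sketch under assumptions: `count_lookup_failures` and the `LOOKUP_CMD` override are hypothetical names, and it assumes the lookup tool exits non-zero when resolution fails.

```shell
# count_lookup_failures HOST COUNT
# Runs COUNT back-to-back lookups of HOST and prints how many failed.
# LOOKUP_CMD is a hypothetical override hook (default: nslookup) so the loop
# can be pointed at another lookup tool or a stub.
# Assumption: the lookup command exits non-zero when resolution fails.
count_lookup_failures() {
    host=$1
    count=$2
    failures=0
    i=0
    while [ "$i" -lt "$count" ]; do
        if ! ${LOOKUP_CMD:-nslookup} "$host" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}

# Example, inside the VM (after colima ssh):
# count_lookup_failures registry-1.docker.io 200
```

Comparing the count with the default resolver against the count with one of the alternative nameservers mentioned in this thread might help narrow down where the timeouts come from.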
@pedantic79 a lima upgrade should be all that is required.
For troubleshooting purposes, can you kindly try this https://github.com/abiosoft/colima/issues/140#issuecomment-1028395976 and see if the behaviour is different? Note that it requires recreating the VM to see the effect, i.e. colima delete (if it exists) prior to starting.
I also faced the same issue, but it was resolved by specifying a DNS resolver:
colima start --dns 1.1.1.1
@abiosoft Yes, that seems to fix things. I ended up using 192.168.5.2, the host, since work runs a DNS proxy on my laptop. This way I can resolve private addresses not on the public DNS.
Can anyone try the latest development version and see if anything changes?
brew install --HEAD colima
Nope. A reasonable test for me is to download a large-ish (~1.5 GB) image:
docker image rm localstack/localstack
docker pull localstack/localstack:latest
which will get part of the way through and then stall:
Using default tag: latest
latest: Pulling from localstack/localstack
69bf0018a85c: Pull complete
d99d2ad45cad: Pull complete
2f5e7e852b75: Pull complete
9bdba4da0515: Pull complete
6d148a48367a: Pull complete
4f136f6bab8f: Pull complete
abd3b9714a4d: Pull complete
50eebec84093: Pull complete
a7f30185d16d: Pull complete
a0e7ef63792a: Pull complete
6e070eb76685: Pull complete
6fb969c1cc11: Pull complete
6b72ad47a399: Pull complete
5a968b0e80e9: Pull complete
4f4fb700ef54: Pull complete
f7deb66a5a33: Pull complete
318d55565698: Pull complete
565ac449cbaa: Pull complete
973b9108c62f: Pull complete
abe7f386e549: Pull complete
6af74865c5fb: Pull complete
b4ff06af1df8: Pull complete
b93bdfca7413: Pull complete
6e0f2f6fe87b: Pull complete
348542de0a59: Pull complete
338328b1acd7: Pull complete
343ae7575c43: Retrying in 1 second
ecaf8f60df9e: Retrying in 1 second
c01474015845: Retrying in 1 second
31c659c48f0f: Waiting
b146a65269aa: Waiting
b19b566fb94a: Waiting
and subsequent attempts:
Error response from daemon: Get "https://registry-1.docker.io/v2/": dial tcp: lookup registry-1.docker.io on 192.168.5.3:53: read udp 192.168.5.15:56456->192.168.5.3:53: i/o timeout
making me wonder if I am getting throttled or running out of sockets or something.
Using docker desktop this pull is a breeze.
@navels I'd be interested in knowing if there are any specifics to your network connection, as I am struggling to reproduce this.
I do get Retrying in x secs once in a while, but the retries are successful and it never gets bad enough for the image pull to terminate.
Can you kindly share the output of colima version?
Thanks.
@abiosoft I'm seeing the same timeout and lookup failure as @navels, only in my case it was triggered by pushing a number of images in quick succession instead of pulling a single large one. I've confirmed that docker pull localstack/localstack:latest often fails with endless retry messages for me as well.
% colima version
colima version HEAD-5e2e413
git commit: 5e2e41310e595553dcdc29ba45827d4030af37bb
Other details that might be helpful:
colima stop; colima start resolves the issue temporarily, allowing name lookups to complete again until another large push/pull.
Ping output from within the VM used to be very strange, with a constantly increasing round trip and DUP packets, but that appears to be fixed in this latest version. 👍
> colima version
colima version HEAD-5e2e413
git commit: 5e2e41310e595553dcdc29ba45827d4030af37bb
runtime: docker
arch: aarch64
client: v20.10.13
server: v20.10.11
I have this problem at home and at work, on and off VPN. This is on an M1 Mac Pro. Network speeds are about the same at both locations: ~300 Mbps.
Aha . . . I just tried a few different configurations and it seems to happen with more CPUs. With 1-2 CPUs I didn't have any issues. With 3 I do. My normal configuration is 8 CPUs.
Double-checked my docker desktop config: 8 CPUs.
I’ve run into these DNS issues too, and I’ve found that changing my DNS to use the gateway of the VDE network works well for me. If you want to see if this workaround will work for you too, try running the following before your test:
colima ssh -- sudo sh -c 'echo nameserver 192.168.106.1 > /etc/resolv.conf'
This temporary patch can be reverted by restarting colima or running the above again with 192.168.5.3. I have the following in ~/.lima/_config/override.yaml to make this change persistent:
useHostResolver: false
dns:
- 192.168.106.1
Yep, yep, there are workarounds, just trying to help @abiosoft troubleshoot.
I am also still seeing issues with the use case that I reported in https://github.com/abiosoft/colima/issues/137#issuecomment-1018721366
The first time I run something like:
nerdctl build --namespace k8s.io --platform linux/amd64 -t test/test:local -f ./Dockerfile .
it fails with:
After another one or two tries (so likely after some short amount of time from the first attempt) it works and then continues to work.
@spkane can you try the latest development version (brew install --HEAD colima) and see if that improves anything?
@navels you likely weren't running colima with VDE networking enabled, as the fix for M1 devices just got pushed.
Can you try installing again with brew install --HEAD colima and get rid of /opt/colima with sudo rm -rf /opt/colima?
Does that change anything?
Unfortunately no change, fails with 3 CPUs.
colima version HEAD-3fc20b2
@navels are you able to see the IP address in the output of colima ls?
Yep: 192.168.106.2
@abiosoft The latest HEAD has a much more stable network on an Apple M1 CPU with 4 cores enabled, although the DNS issue is still present.
colima version HEAD-37a6de0
git commit: 37a6de0ef4fe631c7b34e69697c5234a9cdd5541
runtime: docker
arch: aarch64
client: v20.10.14
server: v20.10.11
Does anyone have Cisco AnyConnect installed?
I have an intel mac that I just upgraded from Catalina to Monterey.
Since the upgrade, I've been experiencing various network timeouts, but the DNS issues in colima were the most pronounced as they blocked my use of docker pull. Outside of Colima, git was often hanging as well, so I didn't think it was a uniquely colima issue, and I kept looking after I found this issue.
I have Cisco AnyConnect installed which I occasionally use to connect to a VPN. After the Monterey update, "Cisco AnyConnect Socket Filter" showed up and asked for permission to run a new SystemExtension. I allowed it at that point, but I think that was the culprit behind all my network issues. Here are some other issues people experienced with it: https://apple.stackexchange.com/questions/420773/the-process-com-cisco-anyconnect-macos-acsockext-hogs-mac-cpu-but-cannot-be-kill
This service is suspicious (to me) because its "features" are (based on the docs):
So, I just deleted Cisco AnyConnect Socket Filter (deleted it from the Applications) which removed the SystemExtension. And, I stopped its annoying "notification" service from pestering me about it on reboot.
$ launchctl blame cisco
// this prints a list of the services. You want the gui/...cisco.anyconnect.notification... one.
$ launchctl disable gui/<number>/application.com.cisco.anyconnect.notification.<number>.<number>
$ launchctl stop gui/<number>/application.com.cisco.anyconnect.notification.<number>.<number>
$ launchctl kill 9 gui/<number>/application.com.cisco.anyconnect.notification.<number>.<number>
After doing all of that (and another reboot), dns works in colima again!
I stopped using colima a while ago but just tried this again and am not getting the errors, so either fixed in colima or the Mac networking stack (Sonoma on an M1 Pro).
Hi,
I just installed colima on a MacBook Pro with Big Sur 11.6.2.
When I try to pull an image in docker, I get an i/o timeout error. It seems that the colima system doesn't have an internet connection.
docker pull maven
Using default tag: latest
Error response from daemon: Get "https://registry-1.docker.io/v2/": dial tcp: lookup registry-1.docker.io on 192.168.5.3:53: read udp 192.168.5.15:56157->192.168.5.3:53: i/o timeout
Are there any post-install steps to get a connection?