codekitchen / dinghy

faster, friendlier Docker on OS X. Deprecated.
MIT License

DNS resolution broken over IPv6 #280

Open adamquaile opened 6 years ago

adamquaile commented 6 years ago

Hi,

I'm having an issue, and after trying to investigate it myself I've come to a dead end.

A container in a local dev env was timing out when making some requests, and it seems to be due to an IPv6 issue.

This command has a delay of ~5s before returning results:

docker run --rm -it alpine:3.7 ping google.com

whereas

docker run --rm -it alpine:3.7 ping -4 google.com

is almost instant.
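
A rough way to put numbers on that difference is to time both variants. This is only a sketch: `elapsed_s` is a hypothetical helper name, it has whole-second resolution, and the measured time includes container startup, so it is the gap between the two runs that matters rather than either absolute value.

```shell
# Run a command, discard its output, and print how many whole seconds it took.
elapsed_s() {
  start=$(date +%s)
  "$@" >/dev/null 2>&1
  end=$(date +%s)
  echo $((end - start))
}

# Example: send a single ping (-c 1) so the command terminates on its own.
#   elapsed_s docker run --rm alpine:3.7 ping -c 1 google.com
#   elapsed_s docker run --rm alpine:3.7 ping -c 1 -4 google.com
```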

I think this is because DNS resolution is not working over IPv6.

$ docker run --rm -it alpine:3.7 sh -c 'apk add --no-cache bind-tools && dig google.com AAAA'
fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/community/x86_64/APKINDEX.tar.gz
(1/4) Installing libgcc (6.4.0-r5)
(2/4) Installing libxml2 (2.9.7-r0)
(3/4) Installing bind-libs (9.11.3-r0)
(4/4) Installing bind-tools (9.11.3-r0)
Executing busybox-1.27.2-r11.trigger
OK: 9 MiB in 17 packages

; <<>> DiG 9.11.3 <<>> google.com AAAA
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOTIMP, id: 8782
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 7dfd9da7da61a45f (echoed)
;; QUESTION SECTION:
;google.com.            IN  AAAA

;; Query time: 0 msec
;; SERVER: 10.0.2.3#53(10.0.2.3)
;; WHEN: Thu Sep 06 08:48:44 UTC 2018
;; MSG SIZE  rcvd: 51

$ docker run --rm -it alpine:3.7 sh -c 'apk add --no-cache bind-tools && dig google.com AAAA @1.1.1.1'
fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/community/x86_64/APKINDEX.tar.gz
(1/4) Installing libgcc (6.4.0-r5)
(2/4) Installing libxml2 (2.9.7-r0)
(3/4) Installing bind-libs (9.11.3-r0)
(4/4) Installing bind-tools (9.11.3-r0)
Executing busybox-1.27.2-r11.trigger
OK: 9 MiB in 17 packages

; <<>> DiG 9.11.3 <<>> google.com AAAA @1.1.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 53178
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1452
;; QUESTION SECTION:
;google.com.            IN  AAAA

;; ANSWER SECTION:
google.com.     229 IN  AAAA    2a00:1450:4009:803::200e

;; Query time: 4 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Wed Sep 05 15:49:14 UTC 2018
;; MSG SIZE  rcvd: 67

I'm not sure why this is causing ping/curl to time out, but this seems to be the root of my issue.
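
The only field that differs meaningfully between the two dig runs above is the header status (NOTIMP from 10.0.2.3, NOERROR from 1.1.1.1). A minimal sketch for pulling out just that field when comparing resolvers; `dig_status` is a hypothetical helper, and it assumes the header line has the same layout as in the output above.

```shell
# Read dig output on stdin and print the response status
# from the ->>HEADER<<- line (e.g. NOERROR, NOTIMP).
dig_status() {
  sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Example (assumes dig is installed, e.g. via bind-tools on Alpine):
#   dig google.com AAAA          | dig_status
#   dig google.com AAAA @1.1.1.1 | dig_status
```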

Any ideas on how I can fix or work around the issue?

Thanks a lot!

adamquaile commented 6 years ago

I should also say that I'm using the VirtualBox VM (xhyve doesn't work for me, and I have no licenses for the others), and that I've tried the same steps with Docker for Mac, without dinghy, and it works as expected.

codekitchen commented 6 years ago

Hm, I can't reproduce this on my machine using the xhyve backend; I'll have to set up a VirtualBox env when I get a chance. For me, IPv6 DNS resolution is working fine, but my container doesn't even get an IPv6 address (though the host docker-machine VM does have one). Is there something specific you've done to give the container IPv6 support, or did that happen automatically for you? I'll admit I've never had the need for IPv6 in a docker container yet, so I haven't really looked at how that is configured, etc.

So it's possibly a problem specific to the Virtualbox backend, but I'm not sure why that'd be. If you dinghy ssh into the docker-machine VM and run dig and ping there outside of a container, do you see the same problem?

> docker run --rm -it alpine:3.7 sh -c 'apk add --no-cache bind-tools && dig google.com AAAA'
fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/community/x86_64/APKINDEX.tar.gz
(1/4) Installing libgcc (6.4.0-r5)
(2/4) Installing libxml2 (2.9.7-r0)
(3/4) Installing bind-libs (9.11.3-r0)
(4/4) Installing bind-tools (9.11.3-r0)
Executing busybox-1.27.2-r11.trigger
OK: 9 MiB in 17 packages

; <<>> DiG 9.11.3 <<>> google.com AAAA
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 43208
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;google.com.            IN  AAAA

;; ANSWER SECTION:
google.com.     11  IN  AAAA    2607:f8b0:400f:806::200e

;; Query time: 0 msec
;; SERVER: 192.168.64.1#53(192.168.64.1)
;; WHEN: Thu Sep 06 15:52:37 UTC 2018
;; MSG SIZE  rcvd: 67

> docker run --rm -it alpine:3.7 ping -6 google.com
PING google.com (2607:f8b0:400f:800::200e): 56 data bytes
ping: sendto: Address not available

> docker run --rm -it alpine:3.7 ifconfig
eth0      Link encap:Ethernet  HWaddr 02:42:AC:11:00:03
          inet addr:172.17.0.3  Bcast:0.0.0.0  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:110 (110.0 B)  TX bytes:0 (0.0 B)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

adamquaile commented 6 years ago

> Is there something specific you've done to give the container IPv6 support, or did that happen automatically for you? I'll admit I've never had the need for IPv6 in a docker container yet, so I haven't really looked at how that is configured, etc.

I don't think I do have IPv6 support working:

$ docker run --rm -it alpine:3.7 ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 02:42:AC:11:00:03
          inet addr:172.17.0.3  Bcast:172.17.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:110 (110.0 B)  TX bytes:0 (0.0 B)

The reason I'm doing an IPv6 lookup is not that I particularly need that functionality; it's just that Alpine Linux seems to do both an AAAA and an A request regardless. There are other issues relating to that project if you search around, e.g. https://github.com/gliderlabs/docker-alpine/issues/153

I think this is normally not an issue so long as the AAAA request either succeeds or fails immediately.

> So it's possibly a problem specific to the Virtualbox backend, but I'm not sure why that'd be. If you dinghy ssh into the docker-machine VM and run dig and ping there outside of a container, do you see the same problem?

I think it will be VirtualBox-specific. I ran some tests as you suggested, did a bit of digging earlier, and found a few interesting things.

From within the dinghy VM:

docker@dinghy:~$ dig google.com AAAA

; <<>> DiG 9.10.2 <<>> google.com AAAA
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOTIMP, id: 31314
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;google.com.            IN  AAAA

;; Query time: 0 msec
;; SERVER: 10.0.2.3#53(10.0.2.3)
;; WHEN: Thu Sep 06 16:29:13 UTC 2018
;; MSG SIZE  rcvd: 39

The IP it's using to do that lookup is 10.0.2.3, which I think is something VirtualBox sets up, maybe related to its creation of a NAT network adapter.

I also tried to get around that issue by disabling that functionality, by running:

dinghy halt
VBoxManage modifyvm "dinghy" --natdnshostresolver1 off
VBoxManage modifyvm "dinghy" --natdnsproxy1 off
dinghy start

but that didn't seem to help.

This does seem to be the root of the issue though: if I update /etc/resolv.conf in the dinghy VM to use 1.1.1.1, then the lookups succeed both from inside the VM and from inside the alpine containers.

I understand that's maybe not a permanent solution (would it break *.docker resolution from inside other containers?), but if there are any other steps I can take to help diagnose it, let me know.
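
For anyone wanting to repeat that experiment, the resolv.conf change can be sketched as below. `set_nameserver` is a hypothetical helper, not something dinghy provides; it would have to be run as root inside the VM, the change may not survive a VM restart, and as noted above it may break *.docker resolution.

```shell
# Replace every "nameserver" line in a resolv.conf-style file with a single
# entry, keeping any other lines (search, options, ...) as they are.
set_nameserver() {
  file=$1
  ns=$2
  tmp=$(mktemp)
  grep -v '^nameserver' "$file" > "$tmp" || true   # grep exits 1 if nothing is left
  printf 'nameserver %s\n' "$ns" >> "$tmp"
  cat "$tmp" > "$file"                             # overwrite in place
  rm -f "$tmp"
}

# Example, as root inside the VM (after dinghy ssh):
#   set_nameserver /etc/resolv.conf 1.1.1.1
```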

Also, thanks for your support and the project in general. Having tried my own dnsmasq/nginx-proxy/vagrant hacks, it's a much better executed version of all that and I'm so close to being able to drop all my hacks and still have a nice dev environment on mac 👍

codekitchen commented 6 years ago

I'm able to repro the problem using the VirtualBox backend. I tried creating another VM directly with docker-machine create and couldn't repro there, so that narrowed it down to what Dinghy does during the setup phase.

If I remove the VBoxManage modifyvm "dinghy" --natdnshostresolver1 on command that Dinghy runs on setup, that fixes the issue (ping still fails of course, but it resolves google.com over IPv6 immediately). I'm not sure why it didn't fix it when you turned it off after the fact; there must be some sticky state somewhere.

Unfortunately, the whole reason we run that command is that resolving our *.docker addresses doesn't work without it, so I don't think removing it is an acceptable solution unless we find some other good way to make that work under VirtualBox.

I'm no expert at this stuff but I'll keep playing around with it a bit, see if I can find any configuration that fixes IPv6 DNS without breaking our resolver.

ImTheDeveloper commented 5 years ago

I seem to be hitting the same issue without VirtualBox. I tried to document it as well as I could here. Does this look like the same underlying issue? https://superuser.com/questions/1411111/docker-dns-resolution-slow-for-http-calls-in-node-application-ipv6-bridge-netw