golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

net: Dial does not respond to quickly-broken IPv6 connections by falling back to IPv4 #68237

Open oakad opened 4 months ago

oakad commented 4 months ago

Go version

go version go1.22.4 darwin/arm64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOOS='darwin'
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/opt/homebrew/Cellar/go/1.22.4/libexec'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/opt/homebrew/Cellar/go/1.22.4/libexec/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.22.4'
GCCGO='gccgo'
AR='ar'
CC='cc'
CXX='c++'
CGO_ENABLED='1'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/q9/qcgtwgsj0y72gr01_djqgmyw0000gq/T/go-build1826860007=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

Tried to fetch a random module (they all break the same way):

% go get nhooyr.io/websocket
go package net: confVal.netCgo = false netGo = false
go package net: using cgo DNS resolver
go package net: hostLookupOrder(proxy.golang.org) = cgo
go: module nhooyr.io/websocket: Get "https://proxy.golang.org/nhooyr.io/websocket/@v/list": write tcp [fe80::bed0:74ff:fe64:598e%utun4]:56330->[2a00:1450:4003:80c::2011]:443: write: socket is not connected

Machine has IPv6 disabled:

% dig proxy.golang.org
; <<>> DiG 9.10.6 <<>> proxy.golang.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 53713
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4000
;; QUESTION SECTION:
;proxy.golang.org. IN A

;; ANSWER SECTION:
proxy.golang.org. 46 IN A 142.250.184.177

;; Query time: 366 msec
;; SERVER: 10.20.141.5#53(10.20.141.5)
;; WHEN: Fri Jun 28 21:47:33 AEST 2024
;; MSG SIZE rcvd: 61

What did you see happen?

go get is unable to fetch the module because it is using the wrong address for the proxy.

What did you expect to see?

Go get should be able to fetch a module.

seankhliao commented 4 months ago

That's not proof that IPv6 is disabled, only that dig defaults to an A (IPv4) query.

oakad commented 4 months ago

It is, I assure you. However, there's a caveat: we have a Cisco VPN which insists on advertising an additional resolver; that resolver is able to resolve AAAA records ("Request A records, Request AAAA records"). Basically, I've got this config:

DNS configuration

resolver #1
  search domain[0] : heh
  nameserver[0] : heh
  nameserver[1] : heh
  flags : Request A records, Request AAAA records
  reach : 0x00000002 (Reachable)
  order : 1

DNS configuration (for scoped queries)

resolver #1
  nameserver[0] : heh
  nameserver[1] : heh
  if_index : 15 (en0)
  flags : Scoped, Request A records
  reach : 0x00000002 (Reachable)

resolver #2
  search domain[0] : heh
  nameserver[0] : heh
  nameserver[1] : heh
  if_index : 23 (utun4)
  flags : Scoped, Request A records, Request AAAA records
  reach : 0x00000002 (Reachable)
  order : 1

Still, go should not pick the AAAA address. Or, at least, it should not do so unconditionally, because I don't think our setup is uniquely broken. :-)

mateusz834 commented 4 months ago

From the output it is clear that the cgo resolver is being used, so out of our scope.

oakad commented 4 months ago

https://danp.net/posts/macos-dns-change-in-go-1-20/

This started happening relatively recently, and I believe it is caused by the changes described in the post above.

mateusz834 commented 4 months ago

Can you try forcing the go resolver and see if it helps in your case? GODEBUG=netdns=go

oakad commented 4 months ago

How do I enable both this feature and dns debug so we can see it is used for real?

mateusz834 commented 4 months ago

GODEBUG=netdns=go+2
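For a program you build yourself, the resolver choice can also be made in code. The sketch below assumes that setting PreferGo on a net.Resolver is acceptable for your use; it affects only that resolver value and does not print the debug lines that the +2 suffix enables. newGoResolver is a hypothetical helper name.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// newGoResolver returns a resolver that prefers Go's built-in DNS client
// over the system (cgo) resolver. The preference is scoped to this value.
// (Hypothetical helper, named here only for illustration.)
func newGoResolver() *net.Resolver {
	return &net.Resolver{PreferGo: true}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	addrs, err := newGoResolver().LookupHost(ctx, "proxy.golang.org")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	for _, a := range addrs {
		fmt.Println(a)
	}
}
```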

oakad commented 4 months ago

Tough luck:

% go get nhooyr.io/websocket
go package net: confVal.netCgo = false netGo = true
go package net: GODEBUG setting forcing use of Go's resolver
go package net: hostLookupOrder(proxy.golang.org) = files,dns
go: module nhooyr.io/websocket: Get "https://proxy.golang.org/nhooyr.io/websocket/@v/list": write tcp [fe80::bed0:74ff:fe64:598e%utun4]:57052->[2a00:1450:4003:80c::2011]:443: write: socket is not connected

oakad commented 4 months ago

For reference, curl does this:

% curl -v https://proxy.golang.org/nhooyr.io/websocket/@v/list

seankhliao commented 4 months ago

What if you pass --ipv6 to curl?

In theory go's network stack should also be doing fast fallback / dual stack ipv4 and ipv6

mateusz834 commented 4 months ago

So the title is incorrect: it resolves correctly, but it fails to connect to the server when IPv6 is unavailable, right?

oakad commented 4 months ago

curl gets stuck when forced to use IPv6. It may be that, although the underlying adapter has IPv6 disabled, the Cisco VPN client pretends it has an IPv6 address on the utun interface. Yet it causes no issues anywhere else; everything works fine apart from Go.

% curl -v --ipv6 https://proxy.golang.org/nhooyr.io/websocket/@v/list

oakad commented 4 months ago

The address is of course correct; what is incorrect is resolving the AAAA record and sticking to it rather than resolving A. :-)

rsc commented 4 months ago

From the discussion so far, it sounds like:

  1. Your Mac is configured with IPv6 enabled (that is, IPv6 sockets can be created successfully).
  2. Your DNS resolver is responding to AAAA requests with IPv6 addresses.
  3. Go looks up proxy.golang.org and gets both IPv6 and IPv4 addresses.
  4. Go connects to one of the IPv6 addresses seemingly successfully. Specifically, it does the connect and then runs getsockopt(fd, SOL_SOCKET, SO_ERROR) in net/fd_unix.go and gets syscall.EISCONN, which makes it return from Dial.
  5. A future write on that connection gets syscall.ENOTCONN, as shown in the error messages.

Normally, when IPv6 addresses can't be used, the connect never succeeds (fails or times out). In your case, it appears that the connect is succeeding but then the connection breaks very quickly after that, perhaps on the first write.

Do you know of anything strange about your Mac's network or IPv6 configuration? Or some firewall that is actively breaking IPv6 connections?

For example on my Mac:

% host proxy.golang.org
proxy.golang.org has address 142.250.65.177
proxy.golang.org has IPv6 address 2607:f8b0:4006:80e::2011
proxy.golang.org mail is handled by 40 alt4.gmr-smtp-in.l.google.com.
proxy.golang.org mail is handled by 10 alt1.gmr-smtp-in.l.google.com.
proxy.golang.org mail is handled by 5 gmr-smtp-in.l.google.com.
proxy.golang.org mail is handled by 30 alt3.gmr-smtp-in.l.google.com.
proxy.golang.org mail is handled by 20 alt2.gmr-smtp-in.l.google.com.
% sudo route add -inet6 2607:f8b0:4006:80e::2011 ::1
add host 2607:f8b0:4006:80e::2011: gateway ::1
% go mod download -json rsc.io/markdown@latest
{
    "Path": "rsc.io/markdown",
    "Version": "v0.0.0-20240617154923-1f2ef1438fed",
    "Query": "latest",
    "Info": "/Users/rsc/pkg/mod/cache/download/rsc.io/markdown/@v/v0.0.0-20240617154923-1f2ef1438fed.info",
    "GoMod": "/Users/rsc/pkg/mod/cache/download/rsc.io/markdown/@v/v0.0.0-20240617154923-1f2ef1438fed.mod",
    "Zip": "/Users/rsc/pkg/mod/cache/download/rsc.io/markdown/@v/v0.0.0-20240617154923-1f2ef1438fed.zip",
    "Dir": "/Users/rsc/pkg/mod/rsc.io/markdown@v0.0.0-20240617154923-1f2ef1438fed",
    "Sum": "h1:savaUwUp0YCIxdaF9EFOMB3j+TQnoLop+cNp2KPC9jk=",
    "GoModSum": "h1:rzOcjAz36Xzvwf6iaJSYXkmNbvu5XHelis1egIN0Cys="
}
% curl -v --ipv6 https://proxy.golang.org
* Host proxy.golang.org:443 was resolved.
* IPv6: 2607:f8b0:4006:80e::2011
* IPv4: (none)
*   Trying [2607:f8b0:4006:80e::2011]:443...
^C
% sudo route delete -inet6 2607:f8b0:4006:80e::2011 
delete host 2607:f8b0:4006:80e::2011
% curl -v --ipv6 https://proxy.golang.org
* Host proxy.golang.org:443 was resolved.
* IPv6: 2607:f8b0:4006:80e::2011
* IPv4: (none)
*   Trying [2607:f8b0:4006:80e::2011]:443...
* Immediate connect fail for 2607:f8b0:4006:80e::2011: No route to host
* Failed to connect to proxy.golang.org port 443 after 3 ms: Couldn't connect to server
* Closing connection
curl: (7) Failed to connect to proxy.golang.org port 443 after 3 ms: Couldn't connect to server
%

oakad commented 4 months ago

The problem only happens with VPN enabled, I mentioned it before. The VPN in question is Cisco secure client, aka AnyConnect. I'm working with people who manage the Cisco VPN for us to see if they can change anything on their side (AnyConnect is supposed to be server side controlled, so not much can be done on the client side).

  1. Only Go breaks on our current setup; all other applications seem to work just fine. Go used to work; it only started breaking relatively recently (possibly because of the 1.20 changes or some change to the AnyConnect setup).
  2. Go can be made to work by using ifconfig to erase IPv6 addresses from the utun device in use by AnyConnect. This, however, has to be done on any VPN reconnection (due to how AnyConnect works).

rittneje commented 2 months ago

@rsc I get the same issue when trying to install things using 1.22.6 on my MacBook while on our corporate VPN (which is also Cisco AnyConnect).

My testing reveals there are two underlying issues:

  1. The IPv4 dial (which, contrary to the documentation, actually happens first, see #68795) takes longer than 300 milliseconds to complete.
  2. Somehow Go believes the IPv6 dial works even though it clearly didn't really. (Kernel bug?)

Increasing the dialer's FallbackDelay (or making it negative) is enough to resolve the issue, but I have no control over what go install is doing. Would it be possible to allow overriding the 300 ms default via some env var?