Closed Rotonen closed 2 years ago
We did see issues, yes. Several users with VPNs configured had problems with multiple incorrect nameservers being attempted.
From reading the macOS man 5 resolver pages, it seems macOS has a “super resolver” that has a much better idea of which nameservers to query. In other words, only reading /etc/resolv.conf for nameservers wasn’t quite enough.
Does https://go-review.googlesource.com/c/go/+/166297 being merged close this issue?
@timfallmk No, the fix was removed in https://go-review.googlesource.com/c/go/+/180843/6
That's unfortunate. So this remains an ongoing issue?
@timfallmk Until the Go team is willing to reverse that CL, yes.
I build Darwin binaries from Linux based CI/CD pipelines.
In the apps I produce for my customers, I have had to resort to unspeakable hacks involving replacing the default resolver with something that "speaks DNS" (since the resolver has no interface{}), and following what scutil shows (much like /etc/resolv.conf points people to). @JohnStarich's post above is a similar module. This method is a terrible hack, and it only works in apps that explicitly load the module. (No offense to John or his code!)
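For readers unfamiliar with this kind of workaround, here is a minimal sketch of what replacing the default resolver can look like using only the standard library. It is not the module referenced above: the helper name newFixedResolver and the nameserver address are hypothetical, and a real workaround would still have to discover the right nameserver for each domain (for example from scutil output), which this sketch does not attempt.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// newFixedResolver (hypothetical helper) returns a resolver that ignores
// /etc/resolv.conf and sends every DNS query to the given nameserver,
// which must be in host:port form.
func newFixedResolver(nameserver string) *net.Resolver {
	return &net.Resolver{
		// PreferGo selects Go's built-in DNS client, which honors Dial.
		PreferGo: true,
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 2 * time.Second}
			// Ignore the address Go picked and dial our nameserver instead.
			return d.DialContext(ctx, network, nameserver)
		},
	}
}

func main() {
	// 10.0.0.53:53 is a placeholder for a VPN-provided nameserver.
	r := newFixedResolver("10.0.0.53:53")

	// Assigning to net.DefaultResolver affects package-level helpers such
	// as net.LookupHost and the default dialers in this process only.
	net.DefaultResolver = r

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	addrs, err := r.LookupHost(ctx, "internal.example.com")
	fmt.Println(addrs, err)
}
```

As noted above, this only helps in programs that explicitly opt in, which is why it does not scale to third-party binaries.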
Resorting to making releases from a laptop is pretty undesirable. Despite this, for some tools, this has had to happen anyways - because home-brew binaries of important projects don't work in the VPN case. kubectl is a great example.
I'd like to see us get to the point where Go binaries "just work", even when cross compiled, without having to incorporate resolver hacks like John's into all the Go apps our customers need on their laptops. Whether that means resolving (hah) the issues with the system resolver, or better understanding of the system's resolver configuration (including routing of queries by domain to alternate server(s)).
As a developer shipping binaries to customers, this is my single biggest Go pain point today.
I don't know if by "home-brew" you meant https://brew.sh/, but my experience has been packages built that way work as long as cgo hasn't been explicitly disabled, which was in the past, in the case of kubectl. I don't know what the current state is, but they claim to have fixed it without any explanation of where.
Of course this is of no help for cross-compiled binaries. I agree that it's a Big Problem, and it seems pretty clear just expecting every person shipping Go binaries to "just deal with it" isn't working.
I have had to resort to unspeakable hacks involving replacing the default resolver
I've been able to avoid the unspeakable hacks when cross-compiling by using xgo from its docker image, but xgo hasn't been updated in a long while now and seems to be unmaintained.
@bitglue if you compile locally using brew, yes. But if the provider wants to avoid any go mod issues and provide the binary by building on CircleCI or something similar, they can't, because of this. The resulting binary will not respect OSX resolvers, especially on corporate VPNs.
Earlier on I described how the Terraform team at HashiCorp uses cross-compilation to build for multiple platforms, and thus our darwin_amd64 releases end up not having cgo enabled.
In the interests of keeping things current, I just wanted to note that in conjunction with reworking our build process to support other macOS-specific chores such as notarization and building for the new darwin_arm64 platform we have now abandoned cross-compilation as our build strategy for the official Terraform releases, and thus this problem is no longer such a high pressure for Terraform users who choose to use the binaries we produce.
However, there are still various people who for one reason or another build Terraform from source themselves, and they can unknowingly end up producing a build that has significantly different DNS resolution behavior than the official builds depending on what strategy they take to do so. Also, the Terraform artifacts we produce are used in conjunction with a variety of other plugin executables, many of which are built by teams at other companies who have devised their own release strategies which may or may not be using cross compilation or enabling cgo, and so this situation does still remain somewhat problematic for some smaller cohorts in our community.
Overall it feels unfortunate to me that the DNS resolution behavior of Go programs can vary so drastically depending on how they are built, whereas in most other regards the Go toolchain does a great job of ensuring consistency regardless of strategy. I think a number of Go-based codebases have gone through this sequence of being naive to that possibility, and then learning too late that this difference exists as a result of a likely-frustrating debugging session where the root cause exists outside of the developers' typical field of view.
With that said, I do understand that there isn't a super clear answer here, since DNS resolution behavior for Unix-alike systems has traditionally been the responsibility of a system's C library. If there isn't a viable way for Go to exhibit the correct DNS resolution behavior without interacting with libc using cgo then I hope instead for some inspiration on how to make this quirk more visible to developers, so they aren't given false optimism by how well the Go toolchain typically handles cross-compilation.
If there isn't a viable way for Go to exhibit the correct DNS resolution behavior without interacting with libc using cgo then I hope instead for some inspiration on how to make this quirk more visible to developers, so they aren't given false optimism by how well the Go toolchain typically handles cross-compilation.
Would it be possible for the compiler to emit a warning when, say, cross-compiling for macOS on Linux?
I wanted to add one further update here, in case it's useful to anyone who is referring to this issue while trying to debug resolver-related problems for macOS builds:
By default, the Go toolchain automatically disables CGo when cross-compiling, which means that if you're running something like GOARCH=arm64 go build on a darwin_amd64 Go toolchain in the hope of cross-compiling for Apple Silicon you will, by default, produce an executable with similar incorrect resolver behavior as would arise when e.g. cross-compiling for macOS from Linux.
However, in our experimentation so far we've found that there's sufficient support in the development tools for macOS x86_64 to generate an Apple Silicon binary that can dynamically link with libc at runtime -- I assume, but have not verified, that this works due to there being suitable libc headers for both targets in the system toolchain -- and so it seems to work to force-enable CGo when cross-compiling between architectures as long as both host and target are macOS (darwin, in Go target terms):
CGO_ENABLED=1 GOARCH=arm64 go build
This can be particularly useful when you are building on a build automation platform that only offers amd64 workers, but still need to produce Apple Silicon binaries. I hope that's helpful to folks who are monitoring this issue due to having been burned by this before! :grinning:
@grantseltzer I don't know if this issue is still on your radar, but if it is, it doesn't seem like the issue here is just about convincing the Go maintainers to accept this functionality. There are some important issues called out in the revert that are holding up re-introducing this code, most notably that it is calling the wrong function in the wrong library. If you could fix that issue then I think that would go a long way toward getting this fixed once and for all.
Can anyone please clarify if they are actively working on this issue by replying to this thread?
Because after 7 years, I worry that this extremely important issue has no one actively working on it.
EDIT: the only PR I can see that mentions this issue is https://github.com/golang/go/pull/30686, but that was closed in April 2019
I don't think that anybody is actively working on this.
For those who are following this because they are releasing binaries cross-compiled to macOS from non-macOS host systems, the related issue #52839 may be of interest:
It seems that (for reasons not yet fully understood) the Go DNS resolver is able to globally break the system's ability to do DNS resolution at all when running on an Apple Silicon platform with an IPv6 address listed in /etc/resolv.conf. If you hear from your users of strange hangs and misbehavior on Apple Silicon, that issue could potentially represent the problem and so you might wish to follow that issue too.
@bradfitz @ianlancetaylor
This issue is one of the biggest challenges to building locally run CLI tools and programs in Go that work "as people expect". End users do not care why things don't work as they expect, and the authors of general purpose tools are in a poor position to fix issues with DNS resolution on their users' networks and machines. Like with the Go DNS issues in #51127, #21160, #44135, the list of companies and their users impacted is astounding. Looking over the backlinks to this issue, it's a real who's-who of companies adopting and evangelizing the use of Go for CLIs.
Would it be possible for the Go team to return to this issue?
As a non-macOS user, I don't think I'm in a position to contribute or test a CL to fix up #30686 (CL 166297).
This screenshot, which I'll reproduce in text for accessibility, is a fantastic example. This is from an issue on the kubernetes repository in which the authors of the PR note that setting CGO_ENABLED=1 will break for some users, but it fixes it for more so they're willing to accept the tradeoff. Here are some backlinks on that one issue:
Screenshot shows that the following issues mentioned Build kubectl binary with cgo enabled kubernetes/release#469:
That's not an exhaustive list, it's just one page of backlinks from other open source projects to a different open source project, all of which boil down to this issue.
Retitled to make clear that this bug is about the resolver behavior on macOS when cgo is disabled. Pending CL 446178 will change package net to use libc directly, even when cgo is disabled. That should take care of this problem and maybe some others.
Change https://go.dev/cl/446178 mentions this issue: net: use libc (not cgo) for DNS on macOS
Seven years, but we got there. Thanks for all the hard work everyone!
Edit: Math is hard
Thank you @rsc, @ianlancetaylor, @bradfitz for closing this issue!
I am overjoyed to deliver the good news to our team and to share it with other maintainers of open source tools. This was the most significant obstacle to building tools that work for every Go user. I can't wait for the release that lands this CL!
@rsc Will the next release to include this be 1.20 or 1.19.4?
1.20
Would it be possible to consider this for cherry-pick or an env var to enable during compilation or runtime in 1.19.x?
I believe the 1.20 release would ordinarily occur in February to March of next year (given the 6 month release cycle). If an env var enabled this behavior sooner, I believe the Pulumi CLI and provider binary ecosystem would be early adopters.
I don't think this meets the bar for backports https://github.com/golang/go/wiki/MinorReleases.
Use tip or cherry pick onto your own fork if you need it earlier.
Reading the docs, it appears that interested parties may make a suggestion to backport an issue when a workaround is untenable. For Pulumi, we produce many dozens of binaries and imposing cross-compilation costs (& other cgo behavior changes) is not viable. In addition, we produce guidance for our customers & users to produce their own provider binaries in Go and it imposes a cost on them. (And not every CI tool is as easy to produce CGO binaries on macOS as GitHub Actions.)
I don't want to overstep however, it looks like most uses of the gopherbot to create the backport issue have been by Go maintainers & per the note you shared, "only the authors of the original CL have the ability to create the cherry-pick."
@rsc Would it be possible to consider this for backport given the aforementioned reason?
Though if the go 1.20 beta 1 is anticipated soon (last year the 1.18 beta was available in December), we may consider using that and recommending it in the interim. The goal is to mitigate issues like pulumi/pulumi-aws#2185 and I believe that this CL may fix other macOS name resolution bugs such as https://github.com/golang/go/issues/52839.
This is a risky change that's going to need some significant testing and soak time. It's not appropriate for a backport to a minor release. I'm sorry.
We expect the first release candidate for 1.20 to be available in early December.
Appreciate that, I totally understand the need for this to be well vetted.
@AaronFriel, Pulumi can make its own fork of Go with this change cherry-picked in to build your CLI binaries.
We maintain our own fork of Go for @tailscale at https://github.com/tailscale/go which over the past 2.5 years has fluctuated from a few to a few dozen patches, as needed. It lets us move at our own pace, without worrying about Go release cycles.
Feel free to copy our infrastructure for doing so. Or contact me directly if you'd like.
Great to see this fixed, thanks!
I think the build tags need a slight adjustment. When testing it out by building for darwin/arm64 on linux/amd64 I was still getting go package net: built with netgo build tag; using Go's DNS resolver from GODEBUG=netdns=1 running this program.
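The program referred to above is not reproduced in this thread. As a stand-in, the following is a minimal sketch of a lookup program that can be used for the same kind of check; the default hostname and the helper name lookup are arbitrary choices for illustration. Running the resulting binary with GODEBUG=netdns=1 set in the environment makes package net print which resolver it selected.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"os"
	"time"
)

// lookup (hypothetical helper) resolves host through the default resolver
// with a short timeout, so a broken resolver setup fails fast.
func lookup(host string) ([]string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	return net.DefaultResolver.LookupHost(ctx, host)
}

func main() {
	host := "example.com" // arbitrary default; pass another name as the first argument
	if len(os.Args) > 1 {
		host = os.Args[1]
	}
	addrs, err := lookup(host)
	if err != nil {
		fmt.Fprintln(os.Stderr, "lookup failed:", err)
		os.Exit(1)
	}
	fmt.Println(host, "->", addrs)
}
```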
I think this diff is needed:
diff --git a/src/net/cgo_unix_syscall.go b/src/net/cgo_unix_syscall.go
index 7170f14c46..6b146e58b1 100644
--- a/src/net/cgo_unix_syscall.go
+++ b/src/net/cgo_unix_syscall.go
@@ -2,7 +2,7 @@
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
-//go:build cgo && !netgo && darwin
+//go:build !cgo && !netgo && darwin
package net
diff --git a/src/net/netgo.go b/src/net/netgo.go
index 75baa88035..3bfb2ee86c 100644
--- a/src/net/netgo.go
+++ b/src/net/netgo.go
@@ -4,9 +4,9 @@
// Default netGo to true if the netgo build tag is being used, or the
// C library DNS routines are not available. Note that the C library
-// routines are always available on Windows.
+// routines are always available on Windows and Darwin.
-//go:build netgo || (!cgo && !windows)
+//go:build netgo || (!cgo && !windows && !darwin)
package net
With that, I was able to build for darwin/arm64 on linux/amd64 and get go package net: using cgo DNS resolver. I was also able to build directly on my darwin/arm64 system with CGO_ENABLED=0 and still have it use the cgo resolver.
Happy to open a CL if that sounds right.
There may be something to do here but I don't see how this change could be correct. If the cgo build tag is not enabled, we will build net/cgo_stub.go, and the direct calls to getaddrinfo won't occur. Or so it seems to me.
You are quite right! The debug info seemed correct but it was hitting the stub cgoLookupHost and falling back to the Go resolver.
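The mismatch described here can be reproduced by evaluating the build constraints by hand. The sketch below uses the standard go/build/constraint package against the two constraint lines quoted in this thread; the tag set is an assumption standing in for a darwin/arm64 build with cgo disabled, and the helper name matches is made up.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// matches (hypothetical helper) reports whether a //go:build line is
// satisfied when exactly the tags set to true in the map are enabled.
func matches(line string, tags map[string]bool) (bool, error) {
	expr, err := constraint.Parse(line)
	if err != nil {
		return false, err
	}
	return expr.Eval(func(tag string) bool { return tags[tag] }), nil
}

func main() {
	// Assumed tag set for a darwin/arm64 build with CGO_ENABLED=0:
	// the "cgo" tag is absent.
	tags := map[string]bool{"darwin": true, "arm64": true}

	lines := []string{
		"//go:build !cgo || netgo",                          // cgo_stub.go, unchanged by the first diff
		"//go:build netgo || (!cgo && !windows && !darwin)", // netgo.go as proposed in the first diff
	}
	for _, line := range lines {
		ok, err := matches(line, tags)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("%-52s included: %v\n", line, ok)
	}
}
```

Under these assumptions netgo.go is excluded, so the debug output reports the cgo resolver, while cgo_stub.go is still included, so lookups land in the stub and fall back to the Go resolver.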
Perhaps this is getting closer. With it, along with some print debugging in cgo_unix.go's cgoLookupHost, I can build on my darwin/arm64 system with CGO_ENABLED=0 and 1 and see that it's getting results there. I'm also able to build in both CGO_ENABLED modes on a linux/amd64 system and have things still build and seem to work.
diff --git a/src/net/cgo_bsd.go b/src/net/cgo_bsd.go
index 1456289b06..082e91faa8 100644
--- a/src/net/cgo_bsd.go
+++ b/src/net/cgo_bsd.go
@@ -2,7 +2,7 @@
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
-//go:build cgo && !netgo && (darwin || dragonfly || freebsd)
+//go:build cgo && !netgo && (dragonfly || freebsd)
package net
diff --git a/src/net/cgo_darwin.go b/src/net/cgo_darwin.go
new file mode 100644
index 0000000000..af0217b6b4
--- /dev/null
+++ b/src/net/cgo_darwin.go
@@ -0,0 +1,5 @@
+package net
+
+import "internal/syscall/unix"
+
+const cgoAddrInfoFlags = (unix.AI_CANONNAME | unix.AI_V4MAPPED | unix.AI_ALL) & unix.AI_MASK
diff --git a/src/net/cgo_stub.go b/src/net/cgo_stub.go
index 298d829f6f..c901d4bb80 100644
--- a/src/net/cgo_stub.go
+++ b/src/net/cgo_stub.go
@@ -2,7 +2,7 @@
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
-//go:build !cgo || netgo
+//go:build (!cgo && !darwin) || netgo
package net
diff --git a/src/net/cgo_unix.go b/src/net/cgo_unix.go
index a944727338..415a2b1904 100644
--- a/src/net/cgo_unix.go
+++ b/src/net/cgo_unix.go
@@ -7,7 +7,7 @@
// Instead of C.foo it uses _C_foo, which is defined in either
// cgo_unix_cgo.go or cgo_unix_syscall.go
-//go:build cgo && !netgo && unix
+//go:build !netgo && (cgo || darwin)
package net
diff --git a/src/net/cgo_unix_syscall.go b/src/net/cgo_unix_syscall.go
index 7170f14c46..c5c27967b1 100644
--- a/src/net/cgo_unix_syscall.go
+++ b/src/net/cgo_unix_syscall.go
@@ -2,7 +2,7 @@
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
-//go:build cgo && !netgo && darwin
+//go:build !netgo && darwin
package net
diff --git a/src/net/netgo.go b/src/net/netgo.go
index 75baa88035..3bfb2ee86c 100644
--- a/src/net/netgo.go
+++ b/src/net/netgo.go
@@ -4,9 +4,9 @@
// Default netGo to true if the netgo build tag is being used, or the
// C library DNS routines are not available. Note that the C library
-// routines are always available on Windows.
+// routines are always available on Windows and Darwin.
-//go:build netgo || (!cgo && !windows)
+//go:build netgo || (!cgo && !windows && !darwin)
package net
Probably best to discuss further on a patch submission. Thanks.
Change https://go.dev/cl/448020 mentions this issue: net: adjust build tags for darwin libc calls
@bradfitz I may reach out, thank you! I'm not sure if we're ready to take on maintaining a fork, but you've piqued my interest.
https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man5/resolver.5.html
OS X allows you to add TLD specific resolver configurations. Quite popular ones are /etc/resolver/vm for local virtual machines and /etc/resolver/dev for local development purposes.
https://golang.org/src/net/dnsclient_unix.go#L231
Go seems to be hardcoded to only take /etc/resolv.conf into account on Unix platforms.