Closed Rotonen closed 1 year ago
I don't think Go-native DNS resolving mechanism is used on Mac. https://golang.org/src/net/dnsclient_unix.go#L231 is not executed if I run
addrs, err := net.LookupHost("google.com")
on my Mac.
If I enable debugging (GODEBUG=netdns=2 go run test.go), the following is printed:
go package net: using cgo DNS resolver
go package net: hostLookupOrder(google.com) = cgo
which means that OS-native DNS resolving is used.
Can you supply an exact configuration file, Go code, actual and expected output?
@nodirt This is for a binary with cgo off.
If cgo is disabled then the pure Go DNS resolver will be used. If you want to use the Mac DNS resolver, please build with cgo.
Shouldn't be a problem since this is needed only on a dev machine.
In this specific case, @Rotonen was using the Flynn binary that we distribute as a compiled artifact, it is compiled without cgo to ease cross-compilation. Just because the user is a developer doesn't mean that they are a Go developer or want to compile the binary for themselves. The only question here is if this feature is out of scope for the pure-Go resolver.
Cross-compilation with a cgo-enabled net package is not that hard. You can reuse the package contained in the binary distribution and force internal linking.
I don't see anything wrong with supporting the OS X /etc/resolver directory. That said, my understanding is that the Go DNS resolver does not work well on most OS X machines. That is why it is disabled by default.
This would be great in all platforms anyway. Is there any disadvantage from supporting this behaviour? It seems that it'd neatly resolve the need to install and configure dnsmasq to provide the simple function of having different resolvers for different TLDs.
I know this issue is quite old, but has there been any traction on this?
any resolution?
Any updates would be posted here. No updates have been posted here.
See resolver(5). Just reading the files out of /etc/resolver/* will miss out on other mechanisms for configuring the same thing, for example configuration profiles or IKE attributes.
Just stumbled upon this today while attempting to use coredns as a dns proxy for local development. It's a real bummer to discover how naive our support for os x is.
We've generally assumed people use cgo on Darwin, so this bug has never been a priority.
I do admit that practically means that Darwin binaries need to be built on Darwin, which is difficult for people wanting to cross-compile for a dozen platforms as part of their release process.
Perhaps on Darwin without cgo we could just shell out to a program to do DNS resolution (e.g. host, dig, nslookup?). At least nslookup has an interactive mode that would permit re-using a child process for multiple lookups, if that proves necessary for performance.
I think reality is most command-line utilities will compile for two platforms: Linux and OS X, and the OS X build will always have cgo disabled. Some subset of the OS X users are using VPN, expect .local names to resolve, or have some other situation where hostname resolution is more than "just query this one DNS server always". Some subset of those users will actually open an issue with the tool, and of those even a smaller subset identify go as the problem and raise an issue here.
So I think you underestimate the impact of the problem.
Shelling out to nslookup will not fix it. The problem is "doing a DNS query" is not the same thing as "resolving a hostname". Resolving a hostname involves more, such as:
.local names
Tools like host, nslookup, and dig do DNS queries by design, not resolve hostnames. This is equally true on Linux as well as OS X. Unfortunately somehow OS X has acquired some lore about having "two DNS systems", which is simply false. Or at least it was false, until Go command-line utilities gained popularity.
If you do want to shell out to a command to perform host resolution, the correct command on OS X is dscacheutil -q host -a name $hostname. This is analogous to getent hosts $hostname on Linux.
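A minimal sketch of what shelling out to dscacheutil could look like from Go. The helper names (parseDscacheutil, lookupViaDscacheutil) are hypothetical, and the parser assumes the tool prints "key: value" lines with ip_address and ipv6_address keys; verify that against the output on your macOS version before relying on it.

```go
package main

import (
	"bufio"
	"fmt"
	"os/exec"
	"strings"
)

// parseDscacheutil extracts addresses from text in the ASSUMED
// "key: value" output format of `dscacheutil -q host`.
func parseDscacheutil(out string) []string {
	var addrs []string
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		// Cut at the first colon only, so IPv6 values stay intact.
		key, val, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue
		}
		switch strings.TrimSpace(key) {
		case "ip_address", "ipv6_address":
			addrs = append(addrs, strings.TrimSpace(val))
		}
	}
	return addrs
}

// lookupViaDscacheutil runs the command quoted in the comment above.
// It only works on macOS, where dscacheutil exists.
func lookupViaDscacheutil(host string) ([]string, error) {
	out, err := exec.Command("dscacheutil", "-q", "host", "-a", "name", host).Output()
	if err != nil {
		return nil, err
	}
	return parseDscacheutil(string(out)), nil
}

func main() {
	// Canned sample in the assumed format, so this runs on any OS.
	sample := "name: example.test\nip_address: 192.0.2.10\n\n" +
		"name: example.test\nipv6_address: 2001:db8::10\n"
	fmt.Println(parseDscacheutil(sample))
}
```

Spawning a process per lookup is slow, so this would suit a fallback path rather than a hot loop.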
Another path is to make the Go resolver's behavior more consistent with the OS X system resolver. This begins with obtaining resolver configuration from SystemConfiguration.framework or scutil --dns, not /etc/resolv.conf.
dscacheutil sounds good. I was thinking of lookupd when I wrote the comment above but my local machine didn't have lookupd so I omitted it. Now I see that dscacheutil replaced lookupd.
I don't think we want to get into the business of reimplementing Darwin's name resolution.
@randall77, since you're having fun with macOS lately, any thoughts here? Could we have non-cgo binaries still call into the macOS name resolution code somehow with some assembly/linker goo?
Let's see if we can use the libSystem bindings directly even when cgo is ostensibly disabled.
expect .local names to resolve
I actually expect .local names to resolve on all platforms per mDNS anyway, if the target responds to the broadcast appropriately.
@bitglue is correct. I think a lot of people are going to file issues against a tool and not raise issues to the Go project. A good example of this is Homebrew. They recently removed support for options in their install, which now means people can't install packages written in Go, like Hashicorp's Vault, with cgo support. We used to be able to do 'brew install vault --with-dynamic' to enable cgo support to get correct DNS resolution, but now that is removed and we're stuck with having to hack their install script to get Vault compiled with cgo. It would be nice to see Go's native resolver work in a less naive fashion so we don't need to worry about this issue anymore.
See https://github.com/Homebrew/homebrew-core/issues/33507 for reference.
I would chime in and venture that the root of this issue might be that the net package treats all Unix systems the same. Perhaps there should be a stubbed out variant for MacOS to deal with its configd based resolution?
This issue, as has been noted, will affect every binary not compiled with cgo when users are using VPNs, which would seem to be a common use case.
@rsc Can you provide some detail on how we might be able to call libSystem bindings without cgo?
@grantseltzer The current runtime package is full of examples of calling into libSystem. See runtime/sys_darwin.go.
I'm taking a stab at this. I have a branch on my GitHub fork here: https://github.com/grantseltzer/go but could use some help.
The function call I'm looking for is res_search, which is in libresolv (/usr/lib/libresolv.9.dylib).
I have the cgo_import_dynamic directive:
//go:cgo_import_dynamic libresolv_res_search res_search "/usr/lib/libresolv.9.dylib"
The Go function that makes the libcCall call and trampoline (sys_darwin.go):
//go:nosplit
//go:cgo_unsafe_args
func Res_search(name *byte, class int32, rtype int32, answer *byte, anslen int32) int32 {
return libcCall(unsafe.Pointer(funcPC(res_search_trampoline)), unsafe.Pointer(&name))
}
func res_search_trampoline()
and defined the amd64 assembly routine (sys_darwin_amd64.s):
TEXT runtime·res_search_trampoline(SB),NOSPLIT,$0
PUSHQ BP
MOVQ SP, BP
MOVL 0(DI), SI // arg 1 name
MOVQ 8(DI), DX // arg 2 class
MOVQ 12(DI), CX // arg 3 type
MOVQ 16(DI), R8 // arg 4 answer
MOVQ 24(DI), R9 // arg 5 anslen
CALL libresolv_res_search(SB)
POPQ BP
RET
When testing the function (which is exported just for testing), I get a return code of -1 and no response in buffer:
func main() {
name := "google.com"
var nameAddr = name[0]
var buffer = [512]byte{}
x := runtime.Res_search(&nameAddr, 255,
255, &buffer[0], 512)
fmt.Println("res_search return code:", x)
fmt.Printf("Buffer: %s\n", buffer)
}
Anything glaring that I'm missing? Perhaps my data types or stack offset sizes.
Most importantly, can someone link me to documentation on how to debug the code at this level?
EDIT: more testing/version information:
uname -a
Darwin Grant-SelzterRichman 17.7.0 Darwin Kernel Version 17.7.0: Thu Dec 20 21:47:19 PST 2018; root:xnu-4570.71.22~1/RELEASE_X86_64 x86_64
go version go1.11.5 darwin/amd64
CC @randall77
I believe I was misusing MOVQ vs MOVL (now potentially fixed to this):
TEXT runtime·res_search_trampoline(SB),NOSPLIT,$0
PUSHQ BP
MOVQ SP, BP
MOVQ 0(DI), DI // arg 1 name
MOVL 8(DI), SI // arg 2 class
MOVL 12(DI), DX // arg 3 type
MOVQ 16(DI), CX // arg 4 answer
MOVL 24(DI), R8 // arg 5 anslen
CALL libresolv_res_search(SB)
POPQ BP
RET
Still not there yet though.
I'm stepping through with delve and my hunch is that RDI has not been properly initialized when entering the res_search_trampoline in sys_darwin_amd64.s
When moving from offsets off DI to the respective arg registers the program appears to be blowing away the destination registers instead (pictured below):
Another thing that's confusing me is that when I step into Res_search (the Go function that makes the call to libcCall) my arguments are unreadable:
Anyone have a hunch of why this call isn't working or have advice on debugging?
Update:
I am getting DNS records using the libresolv res_search binding with cgo disabled :D!
Working to confirm that this actually honors the /etc/resolver files, not sure if it is at the moment.
Would still love to hear an explanation for this, but the way I got it working was by changing the order of the arguments being loaded to the order of them listed in the dlv screenshot above:
TEXT runtime·res_search_trampoline(SB),NOSPLIT,$0
PUSHQ BP
MOVQ SP, BP
MOVL (DI), R8 // arg 5 anslen
MOVQ 16(DI), CX // arg 4 answer
MOVL 8(DI), SI // arg 2 class
MOVQ 0(DI), DI // arg 1 name
MOVL 12(DI), DX // arg 3 type
CALL libresolv_res_search(SB)
POPQ BP
RET
Current update: Calling this routine does in fact honor /etc/resolver/ files. I'm currently trying to figure out an issue where the specified query 'type' is not being honored and only AAAA queries are sent.
My questions for once I fix that and prepare it for a CL:
1) Should this routine be defined for all of i386, x86_64, ARM, and ARM64?
2) What testing mechanisms exist for code at this level beyond manually?
3) Should the cgo bindings exist in runtime or are they appropriate for the net package?
Opened #30686
Change https://golang.org/cl/166297 mentions this issue: net: Use libSystem bindings for DNS resolution on macos if CGO is unavailable
If we want to accommodate several DNS stub resolver implementations, typically it would be as follows:
However, I'm still not sure we really need to hold all of the implementations in the package net. Is there any specific reason not making a new API that accepts external stub resolver implementations? Once we open up the API, we are also able to use the API for upcoming fancy technologies such as DoH (DNS over HTTPS).
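For context on what the standard library already permits here: net.Resolver has a Dial hook that Go's built-in DNS client uses to open its connections, so a caller can redirect queries to a nameserver of its choosing. The sketch below shows that hook only; resolverVia, the 127.0.0.1:5300 address, and the app.dev name are illustrative assumptions. It does not swap in a different stub resolver implementation, which is what the comment above is asking for.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// resolverVia returns a resolver that uses Go's built-in DNS client but
// sends every query to the given nameserver, ignoring the servers listed
// in /etc/resolv.conf.
func resolverVia(nameserver string) *net.Resolver {
	return &net.Resolver{
		PreferGo: true, // the Dial hook only applies to the Go resolver
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 2 * time.Second}
			// Keep the requested network (udp or tcp), replace the address.
			return d.DialContext(ctx, network, nameserver)
		},
	}
}

func main() {
	// Hypothetical local nameserver; expect an error if nothing listens there.
	r := resolverVia("127.0.0.1:5300")
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()
	addrs, err := r.LookupHost(ctx, "app.dev")
	fmt.Println(addrs, err)
}
```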
TEXT runtime·res_search_trampoline(SB),NOSPLIT,$0
PUSHQ BP
MOVQ SP, BP
MOVL (DI), R8 // arg 5 anslen
MOVQ 16(DI), CX // arg 4 answer
MOVL 8(DI), SI // arg 2 class
MOVQ 0(DI), DI // arg 1 name
MOVL 12(DI), DX // arg 3 type
CALL libresolv_res_search(SB)
POPQ BP
RET
The last MOVL is using a DI value that just got clobbered in the previous instruction. You have to load DI last.
The manpage is unclear about what the return value of res_search is. You might need to call libc_error if the return value is <0 to get an actual error code. See mmap for an example.
Debugging this stuff is hard generally. Sorry about that. It does seem that you're making progress though.
By the way, if Darwin supports res_nsearch and friends, we should probably use them, as they are thread-safe. res_search and res_nsearch normally return the length of the response and I assume the same is true on Darwin.
@randall77 Ah that makes a lot of sense, thank you! I pushed changes including the error checks (they return size of response, unless error which is -1)
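As an illustration of handling that return value, here is a rough sketch of checking the length and reading the fixed 12-byte DNS header from the answer buffer. The dnsHeader type and parseHeader function are hypothetical names for this example, not the code from the CL.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// dnsHeader holds a few fields of the fixed 12-byte DNS message header.
type dnsHeader struct {
	ID      uint16
	RCode   uint8
	QDCount uint16
	ANCount uint16
}

// parseHeader interprets n as the value returned by a res_search-style
// call: negative on error, otherwise the length of the response.
func parseHeader(buf []byte, n int) (dnsHeader, error) {
	if n < 0 {
		return dnsHeader{}, errors.New("resolver call failed")
	}
	if n < 12 || n > len(buf) {
		return dnsHeader{}, errors.New("short or truncated response")
	}
	msg := buf[:n]
	flags := binary.BigEndian.Uint16(msg[2:4])
	return dnsHeader{
		ID:      binary.BigEndian.Uint16(msg[0:2]),
		RCode:   uint8(flags & 0x000f), // low four bits of the flags word
		QDCount: binary.BigEndian.Uint16(msg[4:6]),
		ANCount: binary.BigEndian.Uint16(msg[6:8]),
	}, nil
}

func main() {
	// Hand-built header: ID 0x1234, response flags, 1 question, 2 answers.
	buf := make([]byte, 512)
	copy(buf, []byte{0x12, 0x34, 0x81, 0x80, 0, 1, 0, 2, 0, 0, 0, 0})
	h, err := parseHeader(buf, 12)
	fmt.Printf("%+v %v\n", h, err)
}
```

Printing the raw buffer with %s, as in the earlier test program, shows mostly binary data, so decoding at least the header makes failures easier to spot.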
@ianlancetaylor I have been working on this today, as well as changing the GODEBUG/CGO set logic discussed on gerrit.
res_nsearch is supported.
In order to use res_nsearch we would have to use res_ninit. I don't know whether res_search would also work OK, but it's troubling that it's not considered to be thread-safe on GNU/Linux. I don't know about Darwin. I don't know when the global variable is modified.
But I guess that to use res_ninit and res_nsearch we would need to at least know the size of res_state. Probably the best approach would be to double-check that on Darwin res_state is <= 512 bytes, as I expect it is, and then use [64]uint64.
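To make the arithmetic concrete: a [64]uint64 occupies 64 × 8 = 512 bytes and is 8-byte aligned, so it can serve as opaque storage for any C struct up to that size. In the sketch below, resState and resStateSize are hypothetical names, and whether Darwin's res_state actually fits is the assumption the comment above says still needs checking.

```go
package main

import (
	"fmt"
	"unsafe"
)

// resState is an opaque buffer standing in for the C res_state struct.
// uint64 elements keep the buffer 8-byte aligned.
type resState [64]uint64

// resStateSize reports how many bytes the opaque buffer provides.
func resStateSize() uintptr {
	return unsafe.Sizeof(resState{})
}

func main() {
	fmt.Println(resStateSize()) // 512
}
```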
Change https://golang.org/cl/180842 mentions this issue: net: fix non-cgo macOS resolver code
Given that we already use the C library with cgo-based macOS builds (the default) and that we in fact prefer cgoLookupHost to doing it ourselves, it seems like Go should support /etc/resolver just fine out of the box.
CL 166297 (f6b42a5) added some code for the non-cgo builds, but (1) it doesn't work and (2) it's unclear that the non-cgo builds really need attention to this corner case.
I sent CL 180843 to revert the recent changes, but I am inclined to leave this bug closed, since again the cgo path should be handling /etc/resolver just fine.
It seems odd that a deprecated mechanism on macOS would be the best alternative. Perhaps a native builder for Darwin would fix the upstream issues?
it's unclear that the non-cgo builds really need attention to this corner case.
This is an actual issue for us, as it presumably is for the original reporter as well as others who've chimed in on this thread.
This is an actual issue for us, as it presumably is for the original reporter
I've circumvented this since 2015 by not using Flynn, and thus not needing the functionality in a non-cgo Go on Darwin. That decision had nothing to do with this issue: I circumvented by beefing up my machine-internal infrastructure stack so I did not have to rely on the macOS /etc/resolver/* mechanism for rolling my own private TLD.
I've not encountered any software, Go or otherwise, since then, which would not work on Darwin with my machine-local VM cluster and networking setup using the /etc/resolver/* mechanism. I still use the machine-local infra stack for evaluating new infrastructure stacks from time to time. The circumstances for ending up in this corner are fairly specific - when putting on the systems consultant hat from time to time, I am a lone wolf for whom everything needs to work laptop-internally for being able to go and showcase things trivially.
@cespare perhaps the real solution for you would be to bake the contextual resolver dynamicity into your corporate networking infrastructure and not try to do a full OSI layer cake in-machine. Just roll your own network-internal TLD root.
Or figure out cross compilation - it is less scary than it sounds like.
@rsc While it's true that CGO is enabled by default in the compiler, it is consistently disabled by the maintainers of flagship Golang applications:
https://github.com/kubernetes/kubernetes/blob/v1.14.2/hack/lib/golang.sh#L377-L410
https://github.com/hashicorp/consul/blob/v1.5.1/build-support/functions/20-build.sh#L456
https://github.com/hashicorp/terraform/blob/v0.12.1/scripts/build.sh#L43
https://github.com/hashicorp/vault/blob/v1.1.3/Makefile#L18
This means, for all these tools, when run on an OS X machine, their DNS resolving is broken beyond the happy path of vanilla DNS - which from my personal experience, while admittedly short, has never been the case in an enterprise workplace, primarily due to VPNs.
It also means that whenever someone discovers that DNS is broken in any Golang tool they use, they'll eventually discover this thread, where we effectively told them to go pound sand, and take the issue up with the maintainer(s) of their tool as to why it was too difficult for them to build and release their tool with the defaults turned on.
As to what the challenges are with leaving CGO enabled, I would encourage you to take a look at the issues @caarlos0 has had while trying to support CGO in the fabulous build tool GoReleaser:
https://github.com/goreleaser/goreleaser/issues/708
Additionally, the folks at Homebrew, a commonly used package manager for OS X, are making it increasingly more difficult to even support the option for maintainers to support installation flags, further increasing the issue:
https://github.com/Homebrew/homebrew-core/issues/33507
Ultimately, it's up to you @rsc and the example you want to lead with. It's without a doubt beyond obnoxious that so many systems have failed to get to the point where we have to even consider solving the problem in the language - nevertheless, here we are looking for a hero.
Add our non-corner case problem with this https://github.com/vapor-ware/ksync/issues/260
@rsc Between the comments above, ones on the original issue, and through speaking to people in person and on slack I know a lot of people/orgs could really use your fixed change set. Kubectl, helm, vpn services, hashicorp tools, and many many others are affected by the lack of this feature.
What would you need to see to overturn your decision?
As someone who works on Terraform, I just wanted to add a note here about specifically why Terraform (and, I expect, at least some of the other applications listed by @flyinprogrammer above) is built the way it is:
Terraform is a CLI program targeting many different operating systems and architectures. In order to keep the build and release process manageable, we rely totally on cross-compilation to build for all of those targets.
While in principle it is possible to persuade CGo to link with the C libraries of another platform when cross-compiling, it seems that the license of the MacOS libraries forbids using them in that way; they are licensed only for linking on a MacOS system. This leaves usage of solutions like "xgo" in an ambiguous legal position.
That licensing situation seems to leave only two choices: build directly on a MacOS system, or use cross-compilation with CGo disabled. Perhaps that's just how it has to be, but we sure had been hoping for some sort of upstream answer here to play better with Go's otherwise-excellent cross-compilation support.
An example of another possible technical solution (though I don't know the license-compliance of it): I've seen examples of folks linking the precompiled .a files in pkg/darwin_amd64 in the official darwin binary distribution, which I gather works because those were built with CGo enabled and thus already have the necessary C library calls compiled into them. If that approach feels long-term sustainable (that is, if the Go team is willing to document it as more than an implementation detail / hack), and ideally if there could be some first-party tooling for obtaining and installing those files, that would meet Terraform's needs because Terraform intentionally uses no third-party CGo dependencies.
I've seen examples of folks linking the precompiled .a files in pkg/darwin_amd64 in the official darwin binary distribution, which I gather works because those were built with CGo enabled and thus already have the necessary C library calls compiled into them. If that approach feels long-term sustainable (that is, if the Go team is willing to document it as more than an implementation detail / hack), and ideally if there could be some first-party tooling for obtaining and installing those files, that would meet Terraform's needs because Terraform intentionally uses no third-party CGo dependencies.
I'm sorry: that is definitely an implementation detail / hack.
It's pretty hard for us to extend Go's cross-compilation support to support using cgo. I don't know what restrictions Apple puts on the Darwin libraries. I don't see any clear fixes here. I'm open to suggestions.
@ianlancetaylor
I don't see any clear fixes here. I'm open to suggestions.
Isn't the fix to redo f6b42a53e5ac1f1c3f3b1c9ed2407e68e0b637a0 but more correctly?
This is an ongoing annoyance for us. We have internal tools that make network calls and they don't work if they are cross-compiled and the user is using WireGuard on macOS because DNS resolution fails. So we have to build those tools on macOS machines, unlike all the other tools where cross-compiling is fine.
Ah, OK, let's do that. Thanks.
Reopening because this does seem to still be a problem for some people.
Thanks for re-opening! Trying to use the stripe CLI tool and it won't work w/ .test domains.
Hi all, I've created a library that fixes this for our CLI users: https://github.com/JohnStarich/go/tree/master/dns It's not a great solution, but it does work for now.
Is there anything I can do to help? I see the revert mentioned above as a possible starting point, were there any other required changes to move forward?
@JohnStarich do you also see issues with the DNS resolver on macOS? In my case, building on Linux for Darwin without CGO enabled results in broken DNS resolution on Darwin. I resorted to using a macOS machine to build the Mac binaries, but I would rather have the DNS story fixed for Macs.
https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man5/resolver.5.html
OS X allows you to add TLD specific resolver configurations. Quite popular ones are /etc/resolver/vm for local virtual machines and /etc/resolver/dev for local development purposes.
https://golang.org/src/net/dnsclient_unix.go#L231
Go seems to be hardcoded to only take /etc/resolv.conf into account on Unix platforms.
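To illustrate the kind of per-domain selection that /etc/resolver enables, here is a minimal sketch that picks a nameserver by longest matching domain suffix. The pickNameserver function, the map contents, and the addresses are assumptions for illustration; the real macOS resolver has more rules (search order, timeouts, and sources other than files) that this ignores.

```go
package main

import (
	"fmt"
	"strings"
)

// pickNameserver returns the nameserver configured for the longest domain
// suffix of name, falling back to def when nothing matches.
func pickNameserver(perDomain map[string]string, def, name string) string {
	name = strings.ToLower(strings.TrimSuffix(name, "."))
	for {
		if ns, ok := perDomain[name]; ok {
			return ns
		}
		// Drop the leftmost label and try the parent domain.
		_, rest, found := strings.Cut(name, ".")
		if !found {
			return def
		}
		name = rest
	}
}

func main() {
	// Keys mirror file names such as /etc/resolver/vm and /etc/resolver/dev.
	perDomain := map[string]string{
		"vm":  "10.0.0.1:53",
		"dev": "127.0.0.1:5353",
	}
	for _, host := range []string{"web.vm", "api.app.dev", "google.com"} {
		fmt.Println(host, "->", pickNameserver(perDomain, "8.8.8.8:53", host))
	}
}
```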