golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

net: Support the /etc/resolver DNS resolution configuration hierarchy on OS X when cgo is disabled #12524

Closed Rotonen closed 1 year ago

Rotonen commented 9 years ago

https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man5/resolver.5.html

OS X allows you to add TLD-specific resolver configurations. Popular ones are /etc/resolver/vm for local virtual machines and /etc/resolver/dev for local development.

https://golang.org/src/net/dnsclient_unix.go#L231

Go seems to be hardcoded to only take /etc/resolv.conf into account on Unix platforms.

nodirt commented 9 years ago

I don't think Go's native DNS resolver is used on Mac. https://golang.org/src/net/dnsclient_unix.go#L231 is not executed if I run

addrs, err := net.LookupHost("google.com")

on my Mac.

If I enable debugging (GODEBUG=netdns=2 go run test.go), the following is printed:

go package net: using cgo DNS resolver
go package net: hostLookupOrder(google.com) = cgo

which means that OS-native DNS resolving is used.

Can you supply an exact configuration file, Go code, actual and expected output?

titanous commented 9 years ago

@nodirt This is for a binary with cgo off.

davecheney commented 9 years ago

If cgo is disabled then the pure Go DNS resolver will be used. If you want to use the Mac DNS resolver, please build with cgo.


nodirt commented 9 years ago

Shouldn't be a problem since this is needed only on a dev machine.


titanous commented 9 years ago

In this specific case, @Rotonen was using the Flynn binary that we distribute as a compiled artifact; it is compiled without cgo to ease cross-compilation. Just because the user is a developer doesn't mean that they are a Go developer or want to compile the binary themselves. The only question here is whether this feature is out of scope for the pure-Go resolver.

minux commented 9 years ago

Cross-compiling with a cgo-enabled net package is not that hard. You can reuse the package contained in the binary distribution and force internal linking.

ianlancetaylor commented 9 years ago

I don't see anything wrong with supporting the OS X /etc/resolver directory. That said, my understanding is that the Go DNS resolver does not work well on most OS X machines. That is why it is disabled by default.

mterron commented 8 years ago

This would be great on all platforms anyway. Is there any disadvantage to supporting this behaviour? It seems it would neatly remove the need to install and configure dnsmasq just to get the simple function of having different resolvers for different TLDs.

jason-riddle commented 7 years ago

I know this issue is quite old, but has there been any traction on this?

ghost commented 7 years ago

Any resolution?

bradfitz commented 7 years ago

Any updates would be posted here. No updates have been posted here.

bitglue commented 6 years ago

See resolver(5). Just reading the files out of /etc/resolver/* will miss out on other mechanisms for configuring the same thing, for example configuration profiles or IKE attributes.

flyinprogrammer commented 5 years ago

Just stumbled upon this today while attempting to use CoreDNS as a DNS proxy for local development. It's a real bummer to discover how naive our support for OS X is.

bradfitz commented 5 years ago

We've generally assumed people use cgo on Darwin, so this bug has never been a priority.

I do admit that practically means that Darwin binaries need to be built on Darwin, which is difficult for people wanting to cross-compile for a dozen platforms as part of their release process.

Perhaps on Darwin without cgo we could just shell out to a program to do DNS resolution (e.g. host, dig, nslookup?). At least nslookup has an interactive mode that would permit re-using a child process for multiple lookups, if that proves necessary for performance.

bitglue commented 5 years ago

I think the reality is that most command-line utilities will compile for two platforms, Linux and OS X, and the OS X build will always have cgo disabled. Some subset of the OS X users are using a VPN, expect .local names to resolve, or have some other situation where hostname resolution is more than "just query this one DNS server always". Some subset of those users will actually open an issue with the tool, and of those an even smaller subset will identify Go as the problem and raise an issue here.

So I think you underestimate the impact of the problem.

Shelling out to nslookup will not fix it. The problem is "doing a DNS query" is not the same thing as "resolving a hostname". Resolving a hostname involves more, such as:

Tools like host, nslookup, and dig perform DNS queries by design; they do not resolve hostnames. This is as true on Linux as it is on OS X. Unfortunately, OS X has somehow acquired some lore about having "two DNS systems", which is simply false. Or at least it was false, until Go command-line utilities gained popularity.

If you do want to shell out to a command to perform host resolution, the correct command on OS X is dscacheutil -q host -a name $hostname. This is analogous to getent hosts $hostname on Linux.
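A sketch of the dscacheutil approach, assuming its host output consists of `key: value` lines with addresses under `ip_address` and `ipv6_address`; that format should be checked against real output before relying on it. `parseDscacheutil` and `lookupViaDscacheutil` are hypothetical helpers, and since the command only exists on macOS, `main` only exercises the parser on an illustrative sample.

```go
package main

import (
	"bufio"
	"context"
	"fmt"
	"os/exec"
	"strings"
)

// parseDscacheutil collects addresses from `dscacheutil -q host -a name HOST`
// output, assuming "key: value" lines with addresses under ip_address and
// ipv6_address.
func parseDscacheutil(out string) []string {
	var addrs []string
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		// Cut at the first colon only, so IPv6 values keep their colons.
		key, value, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue
		}
		switch strings.TrimSpace(key) {
		case "ip_address", "ipv6_address":
			addrs = append(addrs, strings.TrimSpace(value))
		}
	}
	return addrs
}

// lookupViaDscacheutil shells out to dscacheutil, which exists only on macOS.
func lookupViaDscacheutil(ctx context.Context, host string) ([]string, error) {
	out, err := exec.CommandContext(ctx, "dscacheutil", "-q", "host", "-a", "name", host).Output()
	if err != nil {
		return nil, err
	}
	return parseDscacheutil(string(out)), nil
}

func main() {
	// Illustrative output using documentation addresses, not captured from a real machine.
	sample := "name: example.test\nip_address: 192.0.2.10\n\nname: example.test\nipv6_address: 2001:db8::10\n"
	fmt.Println(parseDscacheutil(sample)) // [192.0.2.10 2001:db8::10]
}
```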

Another path is to make the go resolver's behavior more consistent with the OS X system resolver. This begins with obtaining resolver configuration from SystemConfiguration.framework or scutil --dns, not /etc/resolv.conf.

bradfitz commented 5 years ago

dscacheutil sounds good. I was thinking of lookupd when I wrote the comment above but my local machine didn't have lookupd so I omitted it. Now I see that dscacheutil replaced lookupd.

I don't think we want to get into the business of reimplementing Darwin's name resolution.

@randall77, since you're having fun with macOS lately, any thoughts here? Could we have non-cgo binaries still call into the macOS name resolution code somehow with some assembly/linker goo?

rsc commented 5 years ago

Let's see if we can use the libSystem bindings directly even when cgo is ostensibly disabled.

Rotonen commented 5 years ago

expect .local names to resolve

I actually expect .local names to resolve on all platforms per mDNS anyway, if the target responds to the broadcast appropriately.

nordicmachine commented 5 years ago

@bitglue is correct. I think a lot of people are going to file issues against a tool and not raise issues with the Go project. A good example of this is Homebrew. They recently removed support for options in their installs, which means people can no longer install packages written in Go, such as HashiCorp's Vault, with cgo support. We used to be able to run 'brew install vault --with-dynamic' to enable cgo and get correct DNS resolution, but that has been removed and we're stuck hacking their install script to get Vault compiled with cgo. It would be nice to see Go's native resolver work in a less naive fashion so we don't need to worry about this issue anymore.

See https://github.com/Homebrew/homebrew-core/issues/33507 for reference.

timfallmk commented 5 years ago

I would chime in and venture that the root of this issue might be that the net package treats all Unix systems the same. Perhaps there should be a stubbed-out variant for macOS to deal with its configd-based resolution?

This issue, as has been noted, will affect every binary not compiled with cgo when users are using VPNs, which would seem to be a common use case.

grantseltzer commented 5 years ago

@rsc Can you provide some detail on how we might be able to call libSystem bindings without cgo?

ianlancetaylor commented 5 years ago

@grantseltzer The current runtime package is full of examples of calling into libSystem. See runtime/sys_darwin.go.

grantseltzer commented 5 years ago

I'm taking a stab at this; I have a branch on my GitHub fork here: https://github.com/grantseltzer/go but could use some help.

The function I'm looking for is res_search, which is in libresolv (/usr/lib/libresolv.9.dylib).

I have the cgo_import_dynamic directive:

//go:cgo_import_dynamic libresolv_res_search res_search "/usr/lib/libresolv.9.dylib"

The Go function that makes the libcCall call and trampoline (sys_darwin.go):

//go:nosplit
//go:cgo_unsafe_args
func Res_search(name *byte, class int32, rtype int32, answer *byte, anslen int32) int32 {
    return libcCall(unsafe.Pointer(funcPC(res_search_trampoline)), unsafe.Pointer(&name))
}
func res_search_trampoline()

and defined the amd64 assembly routine (sys_darwin_amd64.s):

TEXT runtime·res_search_trampoline(SB),NOSPLIT,$0
    PUSHQ   BP
    MOVQ    SP, BP
    MOVL    0(DI), SI       // arg 1 name
    MOVQ    8(DI), DX       // arg 2 class
    MOVQ    12(DI), CX      // arg 3 type
    MOVQ    16(DI), R8      // arg 4 answer
    MOVQ    24(DI), R9      // arg 5 anslen
    CALL    libresolv_res_search(SB)
    POPQ    BP
    RET

When testing the function (which is exported just for testing), I get a return code of -1 and no response in the buffer:


func main() {

    name := "google.com"
    var nameAddr = name[0]

    var buffer = [512]byte{}

    x := runtime.Res_search(&nameAddr, 255,
        255, &buffer[0], 512)

    fmt.Println("res_search return code:", x)
    fmt.Printf("Buffer: %s\n", buffer)
}

Is there anything glaring that I'm missing? Perhaps my data types or stack offset sizes.

Most importantly, can someone link me to documentation on how to debug the code at this level?
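Something worth checking in the test program: `var nameAddr = name[0]` copies only the first byte of the string into a new variable, so `&nameAddr` points at a single byte rather than at a NUL-terminated copy of "google.com", which is what a C `const char *` parameter expects. The call also passes 255 for both class and type; per RFC 1035 that is the ANY value for each, whereas ordinary lookups use class IN (1). A minimal sketch of building the name argument, with `cString` as a hypothetical helper:

```go
package main

import (
	"fmt"
	"strings"
)

// DNS class and type values from RFC 1035.
const (
	classIN = 1   // Internet class, used for ordinary lookups
	typeA   = 1   // IPv4 address record
	anyQ    = 255 // ANY, valid as a QTYPE and (as "*") as a QCLASS
)

// cString returns a NUL-terminated copy of s; &b[0] can then be passed where
// C expects a const char *. Strings with embedded NULs are rejected because C
// would silently truncate them.
func cString(s string) ([]byte, error) {
	if strings.IndexByte(s, 0) >= 0 {
		return nil, fmt.Errorf("%q contains a NUL byte", s)
	}
	b := make([]byte, len(s)+1) // zero-filled, so the final byte is the terminator
	copy(b, s)
	return b, nil
}

func main() {
	name, err := cString("google.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(len(name), name[len(name)-1]) // 11 0
	// The call in the test program would then become something like:
	//   runtime.Res_search(&name[0], classIN, typeA, &buffer[0], int32(len(buffer)))
}
```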

EDIT: more testing/version information:

uname -a

Darwin Grant-SelzterRichman 17.7.0 Darwin Kernel Version 17.7.0: Thu Dec 20 21:47:19 PST 2018; root:xnu-4570.71.22~1/RELEASE_X86_64 x86_64
go version go1.11.5 darwin/amd64
ianlancetaylor commented 5 years ago

CC @randall77

grantseltzer commented 5 years ago

I believe I was misusing MOVQ vs MOVL (now potentially fixed to this):

TEXT runtime·res_search_trampoline(SB),NOSPLIT,$0
    PUSHQ   BP
    MOVQ    SP, BP
    MOVQ    0(DI), DI       // arg 1 name
    MOVL    8(DI), SI       // arg 2 class
    MOVL    12(DI), DX      // arg 3 type
    MOVQ    16(DI), CX      // arg 4 answer
    MOVL    24(DI), R8      // arg 5 anslen
    CALL    libresolv_res_search(SB)
    POPQ    BP
    RET

Still not there yet though.

I'm stepping through with delve and my hunch is that RDI has not been properly initialized when entering the res_search_trampoline in sys_darwin_amd64.s

When moving from offsets off DI to the respective arg registers the program appears to be blowing away the destination registers instead (pictured below):

[Screenshot: debugger-blowing-away-regs]

Another thing that's confusing me is that when I step into Res_search (the go function that makes the call to libcCall) my arguments are unreadable:

[Screenshot (2019-02-25): delve showing the unreadable Res_search arguments]

Anyone have a hunch of why this call isn't working or have advice on debugging?

grantseltzer commented 5 years ago

Update:

I am getting DNS records using the libresolv res_search binding with cgo disabled :D!

Working to confirm that this actually honors the /etc/resolver files, not sure if it is at the moment.

[Screenshot (2019-02-27): DNS records returned via res_search with cgo disabled]

I'd still love to hear an explanation for this, but I got it working by loading the arguments in the order they are listed in the dlv screenshot above:

TEXT runtime·res_search_trampoline(SB),NOSPLIT,$0
    PUSHQ   BP
    MOVQ    SP, BP
    MOVL    (DI), R8        // arg 5 anslen
    MOVQ    16(DI), CX      // arg 4 answer
    MOVL    8(DI), SI       // arg 2 class
    MOVQ    0(DI), DI       // arg 1 name
    MOVL    12(DI), DX      // arg 3 type
    CALL    libresolv_res_search(SB)
    POPQ    BP
    RET
grantseltzer commented 5 years ago

Current update: Calling this routine does in fact honor /etc/resolver/ files. I'm currently trying to figure out an issue where the specified query 'type' is not being honored and only AAAA queries are sent.

My questions for once I fix that and prepare it for a CL:

1) Should this routine be defined for all of i386, x86_64, ARM, and ARM64?
2) What testing mechanisms exist for code at this level beyond manual testing?
3) Should the cgo bindings live in runtime, or are they appropriate for the net package?

grantseltzer commented 5 years ago

Opened #30686

gopherbot commented 5 years ago

Change https://golang.org/cl/166297 mentions this issue: net: Use libSystem bindings for DNS resolution on macos if CGO is unavailable

mikioh commented 5 years ago

If we want to accommodate several DNS stub resolver implementations, typically it would be as follows:

However, I'm still not sure we really need to hold all of the implementations in package net. Is there any specific reason not to make a new API that accepts external stub resolver implementations? Once we open up such an API, we could also use it for upcoming fancy technologies such as DoH (DNS over HTTPS).

randall77 commented 5 years ago
TEXT runtime·res_search_trampoline(SB),NOSPLIT,$0
    PUSHQ   BP
    MOVQ    SP, BP
    MOVL    (DI), R8        // arg 5 anslen
    MOVQ    16(DI), CX      // arg 4 answer
    MOVL    8(DI), SI       // arg 2 class
    MOVQ    0(DI), DI       // arg 1 name
    MOVL    12(DI), DX      // arg 3 type
    CALL    libresolv_res_search(SB)
    POPQ    BP
    RET

The last MOVL is using a DI value that just got clobbered in the previous instruction. You have to load DI last. The manpage is unclear about what the return value of res_search is. You might need to call libc_error if the return value is <0 to get an actual error code. See mmap for an example.

Debugging this stuff is hard generally. Sorry about that. It does seem that you're making progress though.

ianlancetaylor commented 5 years ago

By the way, if Darwin supports res_nsearch and friends, we should probably use them, as they are thread-safe. res_search and res_nsearch normally return the length of the response and I assume the same is true on Darwin.

grantseltzer commented 5 years ago

@randall77 Ah, that makes a lot of sense, thank you! I pushed changes including the error checks (they return the size of the response, or -1 on error).
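A sketch of interpreting the result, assuming the convention described here: a negative return means failure, and otherwise the first n bytes of `answer` hold a DNS message that starts with the fixed 12-byte header from RFC 1035 section 4.1.1. `parseAnswer` and `dnsHeader` are hypothetical names, and the generic error for n < 0 stands in for the errno/h_errno handling discussed above.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// dnsHeader is the fixed 12-byte header of a DNS message (RFC 1035, 4.1.1).
type dnsHeader struct {
	ID, Flags                          uint16
	QDCount, ANCount, NSCount, ARCount uint16
}

// RCode returns the response code from the low 4 bits of the flags (0 = no error).
func (h dnsHeader) RCode() uint16 { return h.Flags & 0x000F }

// parseAnswer interprets n and answer from a res_search-style call.
func parseAnswer(n int, answer []byte) (dnsHeader, error) {
	var h dnsHeader
	if n < 0 {
		// Placeholder: real code would consult errno/h_errno here.
		return h, errors.New("res_search failed")
	}
	if n > len(answer) {
		return h, fmt.Errorf("reported length %d exceeds buffer of %d bytes", n, len(answer))
	}
	msg := answer[:n]
	if len(msg) < 12 {
		return h, fmt.Errorf("response too short: %d bytes", len(msg))
	}
	h.ID = binary.BigEndian.Uint16(msg[0:2])
	h.Flags = binary.BigEndian.Uint16(msg[2:4])
	h.QDCount = binary.BigEndian.Uint16(msg[4:6])
	h.ANCount = binary.BigEndian.Uint16(msg[6:8])
	h.NSCount = binary.BigEndian.Uint16(msg[8:10])
	h.ARCount = binary.BigEndian.Uint16(msg[10:12])
	return h, nil
}

func main() {
	// Hand-built header: ID 0x1234, flags 0x8180 (response, RD, RA, NOERROR),
	// one question and two answers.
	buf := make([]byte, 512)
	copy(buf, []byte{0x12, 0x34, 0x81, 0x80, 0, 1, 0, 2, 0, 0, 0, 0})
	h, err := parseAnswer(12, buf)
	if err != nil {
		panic(err)
	}
	fmt.Println(h.ANCount, h.RCode()) // 2 0
}
```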

@ianlancetaylor I have been working on this today, as well as changing the GODEBUG/CGO set logic discussed on gerrit.

res_nsearch is supported.

ianlancetaylor commented 5 years ago

In order to use res_nsearch we would have to use res_ninit. I don't know whether res_search would also work OK, but it's troubling that it's not considered to be thread-safe on GNU/Linux. I don't know about Darwin. I don't know when the global variable is modified.

But I guess that to use res_ninit and res_nsearch we would need to at least know the size of res_state. Probably the best approach would be to double-check that on Darwin res_state is <= 512 bytes, as I expect it is, and then use [64]uint64.

gopherbot commented 5 years ago

Change https://golang.org/cl/180842 mentions this issue: net: fix non-cgo macOS resolver code

rsc commented 5 years ago

Given that we already use the C library with cgo-based macOS builds (the default) and that we in fact prefer cgoLookupHost to doing it ourselves, it seems like Go should support /etc/resolver just fine out of the box.

CL 166297 (f6b42a5) added some code for the non-cgo builds, but (1) it doesn't work and (2) it's unclear that the non-cgo builds really need attention to this corner case.

I sent CL 180843 to revert the recent changes, but I am inclined to leave this bug closed, since again the cgo path should be handling /etc/resolver just fine.

timfallmk commented 5 years ago

It seems odd that a deprecated mechanism on macOS would be the best alternative. Perhaps a native builder for Darwin would fix the upstream issues?

cespare commented 5 years ago

it's unclear that the non-cgo builds really need attention to this corner case.

This is an actual issue for us, as it presumably is for the original reporter as well as others who've chimed in on this thread.

Rotonen commented 5 years ago

This is an actual issue for us, as it presumably is for the original reporter

I've circumvented this since 2015 by not using Flynn, and thus not needing the functionality in a non-cgo Go binary on Darwin. That decision had nothing to do with this issue: I circumvented it by beefing up my machine-internal infrastructure stack so I did not have to rely on the macOS /etc/resolver/* mechanism for rolling my own private TLD.

I've not encountered any software, Go or otherwise, since then, which would not work on Darwin with my machine-local VM cluster and networking setup using the /etc/resolver/* mechanism. I still use the machine-local infra stack for evaluating new infrastructure stacks from time to time. The circumstances for ending up in this corner are fairly specific - when putting on the systems consultant hat from time to time, I am a lone wolf for whom everything needs to work laptop-internally for being able to go and showcase things trivially.

@cespare perhaps the real solution for you would be to bake the contextual resolver dynamicity into your corporate networking infrastructure and not try to do a full OSI layer cake in-machine. Just roll your own network-internal TLD root.

Or figure out cross compilation - it is less scary than it sounds like.

flyinprogrammer commented 5 years ago

@rsc While it's true that CGO is enabled by default in the compiler, it is consistently disabled by the maintainers of flagship Golang applications:

https://github.com/kubernetes/kubernetes/blob/v1.14.2/hack/lib/golang.sh#L377-L410
https://github.com/hashicorp/consul/blob/v1.5.1/build-support/functions/20-build.sh#L456
https://github.com/hashicorp/terraform/blob/v0.12.1/scripts/build.sh#L43
https://github.com/hashicorp/vault/blob/v1.1.3/Makefile#L18

This means, for all these tools, when run on an OS X machine, their DNS resolving is broken beyond the happy path of vanilla DNS - which from my personal experience, while admittedly short, has never been the case in an enterprise workplace, primarily due to VPNs.

It also means that whenever someone discovers that DNS is broken in any Golang tool they use, they'll eventually discover this thread, where we effectively told them to go pound sand, and take the issue up with the maintainer(s) of their tool as to why it was too difficult for them to build and release their tool with the defaults turned on.

As to what the challenges are with leaving CGO enabled, I would encourage you to take a look at the issues @caarlos0 has had while trying to support CGO in the fabulous build tool GoReleaser:

https://github.com/goreleaser/goreleaser/issues/708

Additionally, the folks at Homebrew, a commonly used package manager for OS X, are making it increasingly difficult for maintainers to offer installation flags at all, which compounds the issue:

https://github.com/Homebrew/homebrew-core/issues/33507

Ultimately, it's up to you @rsc and the example you want to lead with. It's without a doubt beyond obnoxious that so many systems have failed to get to the point where we have to even consider solving the problem in the language - nevertheless, here we are looking for a hero.

timfallmk commented 5 years ago

Add our non-corner-case problem with this: https://github.com/vapor-ware/ksync/issues/260

grantseltzer commented 5 years ago

@rsc Between the comments above, ones on the original issue, and through speaking to people in person and on slack I know a lot of people/orgs could really use your fixed change set. Kubectl, helm, vpn services, hashicorp tools, and many many others are affected by the lack of this feature.

What would you need to see to overturn your decision?

apparentlymart commented 5 years ago

As someone who works on Terraform, I just wanted to add a note here about specifically why Terraform (and, I expect, at least some of the other applications listed by @flyinprogrammer above) is built the way it is:

Terraform is a CLI program targeting many different operating systems and architectures. In order to keep the build and release process manageable, we rely totally on cross-compilation to build for all of those targets.

While in principle it is possible to persuade CGo to link with the C libraries of another platform when cross-compiling, it seems that the license of the MacOS libraries forbids using them in that way; they are licensed only for linking on a MacOS system. This leaves usage of solutions like "xgo" in an ambiguous legal position.

That licensing situation seems to leave only two choices: build directly on a MacOS system, or use cross-compilation with CGo disabled. Perhaps that's just how it has to be, but we sure had been hoping for some sort of upstream answer here to play better with Go's otherwise-excellent cross-compilation support.

An example of another possible technical solution (though I don't know the license-compliance of it): I've seen examples of folks linking the precompiled .a files in pkg/darwin_amd64 in the official darwin binary distribution, which I gather works because those were built with CGo enabled and thus already have the necessary C library calls compiled into them. If that approach feels long-term sustainable (that is, if the Go team is willing to document it as more than an implementation detail / hack), and ideally if there could be some first-party tooling for obtaining and installing those files, that would meet Terraform's needs because Terraform intentionally uses no third-party CGo dependencies.

ianlancetaylor commented 5 years ago

I've seen examples of folks linking the precompiled .a files in pkg/darwin_amd64 in the official darwin binary distribution, which I gather works because those were built with CGo enabled and thus already have the necessary C library calls compiled into them. If that approach feels long-term sustainable (that is, if the Go team is willing to document it as more than an implementation detail / hack), and ideally if there could be some first-party tooling for obtaining and installing those files, that would meet Terraform's needs because Terraform intentionally uses no third-party CGo dependencies.

I'm sorry: that is definitely an implementation detail / hack.

It's pretty hard for us to extend Go's cross-compilation support to support using cgo. I don't know what restrictions Apple puts on the Darwin libraries. I don't see any clear fixes here. I'm open to suggestions.

cespare commented 4 years ago

@ianlancetaylor

I don't see any clear fixes here. I'm open to suggestions.

Isn't the fix to redo f6b42a53e5ac1f1c3f3b1c9ed2407e68e0b637a0 but more correctly?

This is an ongoing annoyance for us. We have internal tools that make network calls and they don't work if they are cross-compiled and the user is using WireGuard on macOS because DNS resolution fails. So we have to build those tools on macOS machines, unlike all the other tools where cross-compiling is fine.

ianlancetaylor commented 4 years ago

Ah, OK, let's do that. Thanks.

Reopening because this does seem to still be a problem for some people.

edalzell commented 4 years ago

Thanks for reopening! I'm trying to use the Stripe CLI and it won't work with .test domains.

grantseltzer commented 4 years ago

@ianlancetaylor

I don't see any clear fixes here. I'm open to suggestions.

Isn't the fix to redo f6b42a5 but more correctly?

Note that @rsc corrected the changes of f6b42a5 in CL 180842

Resolving this issue would start with a revert of CL 180843

JohnStarich commented 4 years ago

Hi all, I've created a library that fixes this for our CLI users: https://github.com/JohnStarich/go/tree/master/dns It's not a great solution, but it does work for now.

Is there anything I can do to help? I see the revert mentioned above as a possible starting point, were there any other required changes to move forward?

hellt commented 4 years ago

@JohnStarich do you also see issues with the DNS resolver on macOS? In my case, building for Darwin on Linux without cgo results in broken DNS resolution on Darwin. I resorted to using a macOS machine to build the Mac binaries, but I would rather have the DNS story fixed for Macs.