golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.52k stars 17.6k forks source link

net: make LookupCNAME consistent between Unix and Windows, document #50101

Closed bradfitz closed 1 year ago

bradfitz commented 2 years ago

LookupCNAME is pretty weird right now.

Despite the name, it entirely ignores CNAME records on Unix. It launches A and AAAA record lookups to recursive resolvers and returns the first response name found in the A and AAAA, skipping over any CNAME. (and not even asking for a CNAME)

But it documents that it does that...

https://pkg.go.dev/net#LookupCNAME

A canonical name is the final name after following zero or more CNAME records. LookupCNAME does not return an error if host does not contain DNS "CNAME" records, as long as host resolves to address records.

OTOH, on Windows, it does what you would expect from the name itself: it looks up CNAME records:

func (*Resolver) lookupCNAME(ctx context.Context, name string) (string, error) {
        // TODO(bradfitz): finish ctx plumbing. Nothing currently depends on this.
        acquireThread()
        defer releaseThread()
        var r *syscall.DNSRecord
        e := syscall.DnsQuery(name, syscall.DNS_TYPE_CNAME, 0, nil, &r, nil)

Here's a demo of a program behaving differently:

func main() {
    txt, err := net.LookupTXT("cname-to-txt.go4.org")
    log.Printf("LookupTXT = %q, %v", txt, err)

    cname, err := net.LookupCNAME("cname-to-txt.go4.org")
    log.Printf("cname = %q, %v", cname, err)
}

On Linux/Mac:

2021/12/10 21:19:45 LookupTXT = ["foo=bar"], <nil>
2021/12/10 21:19:45 cname = "", lookup cname-to-txt.go4.org: no such host

On Windows:

2021/12/10 21:11:45 LookupTXT = ["foo=bar"], <nil>
2021/12/10 21:11:45 cname = "test-txt-record.go4.org.", <nil>

I like the Windows behavior better, FWIW. That's what I was looking for, but apparently it doesn't exist.

Can we either:

  1. add LookupCNAMERecord that actually looks up a CNAME record
  2. redefine LookupCNAME to be like Windows, perhaps adding a LookupCanonicalName with the current weird Unix behavior of LookupCNAME?

But at minimum: document whatever the rules are and make Unix and Windows match? At least in Resolver.PreferGo mode?

rsc commented 2 years ago

I think LookupCNAME came about that way because we are using getaddrinfo, and it returns the underlying name almost as a side effect of the lookup, in res.ai_canonname. Should we stop using glibc for this call and make it match Windows? It might make it fail where it was succeeding before? Not sure.

rsc commented 2 years ago

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

rsc commented 2 years ago

The problem we have is that we need to use getaddrinfo(AI_CANONNAME) on Macs, because port 53 is blocked to non-libc code. Is there some way to make getaddrinfo succeed for hosts that have a CNAME record but for which that named host has no A/AAAA records? If not, it's very hard to implement the Windows LookupCNAME behavior on systems like Macs.

rsc commented 2 years ago

Looks like even though it's not in the man pages, macOS may have res_ninit. So maybe we should look into using that in place of getaddrinfo(AI_CANONNAME). If that's possible, then it would seem OK to change this.

The specific case being changed is when the name has a CNAME but no A/AAAA record, which is currently an error on Unix but succeeds on Windows. With this change, it would succeed everywhere in this (unusual) case.

Does anyone want to look into how hard it would be to make the Go code use libresolv on Mac? That might also help for things like MX lookups.

rsc commented 2 years ago

Sounds like we are still waiting for someone to check what can be done on the Mac.

rsc commented 2 years ago

@bradfitz says there is a CL in net that got rolled back that used libresolv directly. The only problem was it used res_init instead of res_ninit, because the latter is undocumented (but apparently present). The relevant issue is #12524. Perhaps someone wants to try resurrecting that CL using the thread-safe APIs? Also related: #16345 and #31705. The rollback was https://go.dev/cl/180843.

rsc commented 2 years ago

On hold for anyone who wants to try to implement the new behavior on Mac.

rsc commented 2 years ago

Placed on hold. — rsc for the proposal review group

rsc commented 2 years ago

Hacked up something that seems to work on macOS. I will clean it up and mail it out next week.

rsc commented 2 years ago

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

rsc commented 2 years ago

I haven't cleaned it up yet, but it certainly seems to work and it passes Brad's test TXT record.

Does anyone object to changing LookupCNAME to succeed for hosts with CNAME but not A records on Unix, like it already does on Windows?

slrz commented 2 years ago

I've seen it used to get the local machine's FQDN, by passing the result of os.Hostname to LookupCNAME and letting the resolver worry about search domains and what not. That seems to be exactly the kind of thing that relies on LookupCNAME essentially behaving like gai's AI_CANONNAME.

slrz commented 2 years ago

A quick search surfaces the package github.com/Showmax/go-fqdn which does just that. It falls back to other methods if LookupCNAME doesn't work but those rely on properly configured PTR records (or explicit /etc/hosts entries).

If we changed LookupCNAME to be about CNAME records, I would prefer to have the "canonical name" behaviour available through some other way.

rsc commented 2 years ago

Sorry, I don't understand. I had not realized AI_CANONNAME meant anything other than 'do a CNAME lookup'.

What is the circumstance in which AI_CANONNAME consults something other than CNAME records, and what does it consult?

slrz commented 2 years ago

Invoking getaddrinfo for the name "host" with AI_CANONNAME in ai_flags and an otherwise zero hints structure (meaning AF_UNSPEC) does the following (at least on glibc):

That's with "files dns" listed in nsswitch.conf. It's anyone's guess what it might do with other NSS modules.

The Linux man pages describe this behaviour as:

If hints.ai_flags includes the AI_CANONNAME flag, then the ai_canonname field of the first of the addrinfo structures in the returned list is set to point to the official name of the host.

I guess "official name" is meant to refer to the host's FQDN here.

rsc commented 2 years ago

@slrz OK, it sounds like the difference is for hosts with A/AAAA records but no CNAME, such as 'google.com'. So LookupCNAME("google.com") = "google.com", nil, at least on Unix. But probably on Windows LookupCNAME("google.com") is an error because it actually looks for CNAME records. Is that the behavior difference you were getting at?

Normally we try very hard to avoid behavior changes, but in this case it is hard to see what would break given that Windows has never behaved the way Unix does, and we'd be making the Windows behavior the standard one.

rsc commented 2 years ago

There are two possible options.

  1. Make LookupCNAME everywhere match LookupCNAME on Windows today, which means it looks for CNAME records and that's it.
  2. Make LookupCNAME match a combination of Unix and Windows, where it looks for CNAME records but also falls back to returning the input string (like "google.com") for names that have A/AAAA records and no CNAME.

Given the rest of the API, with things like LookupMX, I think everyone expects LookupCNAME to mean (1) today. I certainly did. It seems like we should at least try to go down that path, and if we find out why we have to do (2), at least we'll know why.

Maybe we could get the code for (1) ready to land at the start of the Go 1.20 cycle and see how far we get?

Does anyone object to this? Or does anyone know why we must do (2)?

slrz commented 2 years ago

@slrz OK, it sounds like the difference is for hosts with A/AAAA records but no CNAME, such as 'google.com'. So LookupCNAME("google.com") = "google.com", nil, at least on Unix. But probably on Windows LookupCNAME("google.com") is an error because it actually looks for CNAME records. Is that the behavior difference you were getting at?

Exactly, but more importantly also LookupCNAME("amsterdam") = "amsterdam.example.com.", nil when invoked on a system with example.com in its DNS search list.

Normally we try very hard to avoid behavior changes, but in this case it is hard to see what would break given that Windows has never behaved the way Unix does, and we'd be making the Windows behavior the standard one.

Besides the "go-fqdn" library linked above, I know of at least one program that will be broken by this change in behaviour. As it's a companion program to the server parts of a video conferencing system that only run on Linux anyway (BigBlueButton), the current Windows behaviour is not very relevant. Usage there is like go-fqdn or the LookupCNAME("amsterdam") example above: invoke LookupCNAME with the result of os.Hostname as part of an attempt to mimic hostname -f.

I'm not saying those are unfixable or that what they're doing is a good idea, just that they will break where they didn't before.

slrz commented 2 years ago
  1. Make LookupCNAME match a combination of Unix and Windows, where it looks for CNAME records but also falls back to returning the input string (like "google.com") for names that have A/AAAA records and no CNAME.

Just returning the input string won't be sufficient: the callers are generally trying to mimic hostname -f so it's the DNS search list handling and consultation of host aliases from /etc/hosts that's required.

rsc commented 2 years ago

Exactly, but more importantly also LookupCNAME("amsterdam") = "amsterdam.example.com.", nil when invoked on a system with example.com in its DNS search list.

This definitely is something I don't want to break. Thanks.

If we want to make LookupCNAME the same as much as possible on Linux and Windows without breaking existing uses, it looks like we could do a CNAME lookup (including adding DNS search suffixes), and if that works we're done, and otherwise fall back to A/AAAA lookup, and if that works, return the name that had the record. On Linux this would mean adding the explicit CNAME lookup, and on Windows it would mean adding the A/AAAA fallback.

And then we could separately add a StrictCNAME bool field to net.Resolver to make Resolver.LookupCNAME mean only do the CNAME lookup.

Does that sound reasonable?

rsc commented 2 years ago

Based on the discussion above, this proposal seems like a likely accept. — rsc for the proposal review group

rsc commented 2 years ago

No change in consensus, so accepted. 🎉 This issue now tracks the work of implementing the proposal. — rsc for the proposal review group

gopherbot commented 1 year ago

Change https://go.dev/cl/446179 mentions this issue: net: unify CNAME handling across ports

gopherbot commented 1 year ago

Change https://go.dev/cl/451420 mentions this issue: doc/go1.20: add release notes for net package

mateusz834 commented 1 year ago

I looked at the implementation and it seems that when the DNS server responds with multiple CNAME like:

sth.go.dev. IN CNAME 1.sth.go.dev
1.sth.go.dev. IN CNAME 2.sth.go.dev

Then the LookupCNAME returns 1.sth.go.dev, not the 2.sth.go.dev. Is that the behaviour we want? Shouldn't we return the last one in the chain??

But when there will be an A/AAAA resource, then it will return 2.sth.go.dev. Edit: probably it will return 1.sth.go.dev, so this is a backwards compatibility breaking change, previously it returned the 2.sth.go.dev.

sth.go.dev. IN CNAME 1.sth.go.dev
1.sth.go.dev. IN CNAME 2.sth.go.dev
2.sth.go.dev IN A 192.0.2.1

Edit: on windows the last one in the chain is returned https://github.com/golang/go/blob/c0497d1a81b1104a3981be33abfd66252cea90c8/src/net/lookup_windows.go#L409 so this probably only affects Linux and MacOS

mateusz834 commented 1 year ago

@rsc @ianlancetaylor This change breaks /etc/hosts aliases while using cgo mode on linux.

[mateusz@arch testaa]$ cat /etc/hosts
# Static table lookup for hostnames.
# See hosts(5) for details.
127.0.0.1 localhost.localdoman localhost
::1 localhost.localdomain localhost

::1  arch.localdomain arch
[mateusz@arch testaa]$ GODEBUG=netdns=cgo+2 ./main arch
go package net: confVal.netCgo = true  netGo = false
go package net: using cgo DNS resolver
go package net: hostLookupOrder(arch) = cgo
arch.
[mateusz@arch testaa]$ GODEBUG=netdns=go+2 ./main arch
go package net: confVal.netCgo = false  netGo = true
go package net: GODEBUG setting forcing use of Go's resolver
go package net: hostLookupOrder(arch) = files,dns
arch.localdomain.
[mateusz@arch testaa]$ cat main.go
package main

import (
        "fmt"
        "net"
        "os"
)

func main() {
        a, err := net.LookupCNAME(os.Args[1])
        if err != nil {
                fmt.Printf("ERROR: %v\n", err)
                os.Exit(1)
        }
        fmt.Println(a)
}

On unix systems we cannot only use the dns resolver.

gopherbot commented 1 year ago

Change https://go.dev/cl/454397 mentions this issue: net: use getaddrinfo while searching for cname on unix systems

gopherbot commented 1 year ago

Change https://go.dev/cl/455275 mentions this issue: net: rework the CNAME handling on unix