golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
122.71k stars 17.5k forks source link

cmd/go: cryptic error message when git repository is reachable locally but not reachable from sum.golang.org google SaaS server #51174

Open ForestJohnson opened 2 years ago

ForestJohnson commented 2 years ago

What version of Go are you using (go version)?

$ go version
go version go1.17.7 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/forest/.cache/go-build"
GOENV="/home/forest/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/forest/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/forest/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.17.7"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/forest/Desktop/git/gandi-dns-updater/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build911627335=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I am trying to go get a module while I'm developing an app to fix my DNS issues, however go get is ignoring the hosts file entry I created to temporarily fix my git server's incorrect domain name. does not work because my git server git.sequentialread.com is not reachable from the sum.golang.org google SaaS server which is tightly integrated into the way the go get command works.

forest@thingpad:~$ nslookup git.sequentialread.com 8.8.8.8
Server:     8.8.8.8
Address:    8.8.8.8#53

Non-authoritative answer:
Name:   git.sequentialread.com
Address: 71.34.16.185

forest@thingpad:~/Desktop/git/gandi-dns-updater$ cat /etc/hosts
127.0.0.1       localhost
127.0.1.1       thingpad

71.34.5.123 pwm.sequentialread.com
71.34.5.123 git.sequentialread.com
71.34.5.123 sequentialread.com

...

forest@thingpad:~/Desktop/git/gandi-dns-updater$ go get git.sequentialread.com/forest/config-lite

What did you expect to see?

I expected the go get software do what I asked, go to my git server and download the module.

However, in the event that go get cannot do that for me, I expected the go get software to tell me in clear, human-readable terms why it cannot do that.

What did you see instead?

forest@thingpad:~/Desktop/git/gandi-dns-updater$ go get git.sequentialread.com/forest/config-lite
go: downloading git.sequentialread.com/forest/config-lite v0.0.0-20220122212408-4941a7aa3dc0
go get: git.sequentialread.com/forest/config-lite@v0.0.0-20220122212408-4941a7aa3dc0: verifying module: git.sequentialread.com/forest/config-lite@v0.0.0-20220122212408-4941a7aa3dc0: reading https://sum.golang.org/lookup/git.sequentialread.com/forest/config-lite@v0.0.0-20220122212408-4941a7aa3dc0: 410 Gone
        server response: not found: git.sequentialread.com/forest/config-lite@v0.0.0-20220122212408-4941a7aa3dc0: unrecognized import path "git.sequentialread.com/forest/config-lite": https fetch: Get "https://git.sequentialread.com/forest/config-lite?go-get=1": dial tcp 71.34.16.185:443: connect: connection refused

When I 1st saw this error, I was 100% convinced that it was an issue related to golang not parsing /etc/resolv.conf, /etc/nsswitch.conf or /etc/hosts correctly. I quickly realized that it only happened with go get, not with a normal http call to git.sequentialread.com, so I decided to investigate further. However, after following that red herring all day long, I finally got to the point where I had run out of things to check and I could not make sense of what I was seeing, according to delve go get WAS resolving the right IP address 71.34.5.123 not the wrong one 71.34.16.185. It was connecting to 71.34.5.123, not 71.34.16.185. But I was still seeing an error that says dial tcp 71.34.16.185:443: connect: connection refused

Eventually with dumb luck / trying anything after running out of options, I found the answer:

forest@thingpad:~$ curl https://sum.golang.org/lookup/git.sequentialread.com/forest/config-lite@v0.0.0-20220122212408-4941a7aa3dc0

not found: git.sequentialread.com/forest/config-lite@v0.0.0-20220122212408-4941a7aa3dc0: unrecognized import path "git.sequentialread.com/forest/config-lite": https fetch: Get "https://git.sequentialread.com/forest/config-lite?go-get=1": dial tcp 71.34.16.185:443: connect: connection refused

The error message I was seeing was actually being thrown on google's server, not on my computer.

ForestJohnson commented 2 years ago
name resolution wild goose chase / red herring
Now I'm going to attempt to figure out WHY this happens. First of all it looks like it might start here: https://github.com/golang/go/blob/faa04090f8d3e1b699328c5a4aaadb5bb78d7dfe/src/cmd/go/internal/vcs/vcs.go#L943-L955 which goes https://github.com/golang/go/blob/faa04090f8d3e1b699328c5a4aaadb5bb78d7dfe/src/cmd/go/internal/web/api.go#L166-L183 then https://github.com/golang/go/blob/faa04090f8d3e1b699328c5a4aaadb5bb78d7dfe/src/cmd/go/internal/web/http.go#L63 then https://github.com/golang/go/blob/faa04090f8d3e1b699328c5a4aaadb5bb78d7dfe/src/cmd/go/internal/web/http.go#L122-L127 then https://github.com/golang/go/blob/faa04090f8d3e1b699328c5a4aaadb5bb78d7dfe/src/cmd/go/internal/web/http.go#L91-L115 then it calls `.Do(request)` on the `securityPreservingHTTPClient`, i think ... ? https://github.com/golang/go/blob/faa04090f8d3e1b699328c5a4aaadb5bb78d7dfe/src/cmd/go/internal/web/http.go#L44-L61 I'm assuming since the transport isn't specified on the `securityPreservingHTTPClient`, then it must be using the `DefaultTransport`...? https://github.com/golang/go/blob/faa04090f8d3e1b699328c5a4aaadb5bb78d7dfe/src/net/http/transport.go#L38-L54 At this point I have no idea why the bug is happening and I feel like I would have to learn how to run an interactive debugger against `go get` and run it against my own `http.Get()` code and find the difference that way.
jfesler commented 2 years ago

https://pkg.go.dev/net#hdr-Name_Resolution

ForestJohnson commented 2 years ago
name resolution wild goose chase / red herring
https://paste.cyberia.club/~forest/409095b1a74946eb6ec0a9dbbe421f517411a1c5 > delve attempt #1, discovering "func (cm *connectMethod) addr() https://paste.cyberia.club/~forest/671d55f02c152fda743368052958023f20acbdcf > delve attempt #2, discovering "net.(*Resolver).lookupIPAddr()" https://paste.cyberia.club/~forest/52340c2560865d0015cfcbe9e62f84647f76ea74 > delve attempt #3, discovering "net.(*Resolver).lookupIP()" and nss struct https://paste.cyberia.club/~forest/b49fa1f36214ad9f11ac3a92ccc0769079710dd1 > delve attempt #4, discovering "cgoIPLookup" https://paste.cyberia.club/~forest/ee822db61399c9505d323ba21d3e24cb787f7916 > delve attempt #5, poking at "C.getaddrinfo" Conclusions I have drawn from my `delve` adventure: 1. the nsswitch.conf file is being parsed: ``` > net.systemConf() /usr/local/go/src/net/conf.go:44 (PC: 0x74f7f0) Values returned: 39: ) 40: 41: // systemConf returns the machine's network configuration. 42: func systemConf() *conf { 43: confOnce.Do(initConfVal) => 44: return confVal 45: } 46: 47: func initConfVal() { 48: dnsMode, debugLevel := goDebugNetDNS() 49: confVal.dnsDebugLevel = debugLevel ... (dlv) p confVal.nss.sources["hosts"] []net.nssSource len: 4, cap: 4, [ { source: "files", criteria: []net.nssCriterion len: 0, cap: 0, nil,}, { source: "mdns4_minimal", criteria: []net.nssCriterion len: 1, cap: 1, [ (*net.nssCriterion)(0xc0002a6840), ],}, { source: "dns", criteria: []net.nssCriterion len: 0, cap: 0, nil,}, { source: "mymachines", criteria: []net.nssCriterion len: 0, cap: 0, nil,}, ] ``` For some reason, the `unknownOpt` flag was set on the dnsConfig struct: ``` (dlv) so > net.(*Resolver).lookupIP() /usr/local/go/src/net/lookup_unix.go:95 (PC: 0x779e2e) Values returned: ~r0: *net.conf { forceCgoLookupHost: false, netGo: false, netCgo: false, hasMDNSAllow: false, goos: "linux", dnsDebugLevel: 0, nss: *net.nssConf { err: error nil, sources: map[string][]net.nssSource [...],}, resolv: *net.dnsConfig { ... unknownOpt: true, ``` This caused the following condition to fall through: ``` > net.(*conf).hostLookupOrder() /usr/local/go/src/net/conf.go:134 (PC: 0x74fecc) 129: } 130: fallbackOrder := hostLookupCgo 131: if c.netGo || r.preferGo() { 132: fallbackOrder = hostLookupFilesDNS 133: } => 134: if c.forceCgoLookupHost || c.resolv.unknownOpt || c.goos == "android" { 135: return fallbackOrder 136: } 137: if bytealg.IndexByteString(hostname, '\\') != -1 || bytealg.IndexByteString(hostname, '%') != -1 { 138: // Don't deal with special form hostnames with backslashes 139: // or '%'. (dlv) n > net.(*conf).hostLookupOrder() /usr/local/go/src/net/conf.go:135 (PC: 0x750065) 130: fallbackOrder := hostLookupCgo 131: if c.netGo || r.preferGo() { 132: fallbackOrder = hostLookupFilesDNS 133: } 134: if c.forceCgoLookupHost || c.resolv.unknownOpt || c.goos == "android" { => 135: return fallbackOrder 136: } 137: if bytealg.IndexByteString(hostname, '\\') != -1 || bytealg.IndexByteString(hostname, '%') != -1 { 138: // Don't deal with special form hostnames with backslashes 139: // or '%'. 140: return fallbackOrder ``` `unknownOpt` has a comment that says `// anything unknown was encountered` ![image](https://user-images.githubusercontent.com/7119703/153775172-6156f576-5e2c-467d-806d-232685021e09.png) so I decided to figure out why `unknownOpt` was being set to true by putting a breakpoint on those `conf.unknownOpt = true` references I found. But first, I decided to debug my sample ``` func main() { response, err := http.Get("https://git.sequentialread.com") log.Printf("%+v, %+v", response, err) } ``` program and figure out what it was doing differently. More on that later. Finally, I got to the part where the go code stops and it calls libc or makes a syscall or whatever: ``` forest@thingpad:~/Desktop/git/go/src/cmd/go$ cat delve-init break /usr/local/go/src/net/cgo_unix.go:164 condition 1 name == "git.sequentialread.com" continue forest@thingpad:~/Desktop/git/go/src/cmd/go$ /home/forest/go/bin/dlv debug --init delve-init -- get git.sequentialread.com/forest/config-lite 2022-02-13T15:03:23-06:00 error layer=debugger can't find build-id note on binary Type 'help' for list of commands. Breakpoint 1 set at 0x793801 for net.cgoLookupIPCNAME() /usr/local/go/src/net/cgo_unix.go:164 > net.cgoLookupIPCNAME() /usr/local/go/src/net/cgo_unix.go:164 (hits goroutine(60):1 total:1) (PC: 0x793801) 159: 160: h := make([]byte, len(name)+1) 161: copy(h, name) 162: var res *C.struct_addrinfo 163: gerrno, err := C.getaddrinfo((*C.char)(unsafe.Pointer(&h[0])), nil, &hints, &res) => 164: if gerrno != 0 { 165: isErrorNoSuchHost := false 166: isTemporary := false 167: switch gerrno { 168: case C.EAI_SYSTEM: 169: if err == nil { (dlv) p gerrno 0 (dlv) p r Command failed: could not find symbol value for r (dlv) p res *net._Ctype_struct_addrinfo { ai_flags: 26, ai_family: 2, ai_socktype: 1, ai_protocol: 6, ai_addrlen: 16, ai_addr: *net._Ctype_struct_sockaddr { sa_family: 2, sa_data: [14]net._Ctype_char [0,0,71,34,5,123,0,0,0,0,0,0,0,0],}, ai_canonname: *103, ai_next: *net._Ctype_struct_addrinfo nil,} (dlv) ``` Where I can see the wrong (not from the hosts file) address `71,34,5,123` being returned from the C code.
ForestJohnson commented 2 years ago
name resolution wild goose chase / red herring
unknownOpt is being caused by the line `options edns0 trust-ad` in my `/etc/resolv.conf` ``` forest@thingpad:~/Desktop/git/go/src/cmd/go$ cat delve-init break /usr/local/go/src/net/dnsconfig_unix.go:137 break /usr/local/go/src/net/dnsconfig_unix.go:148 continue forest@thingpad:~/Desktop/git/go/src/cmd/go$ /home/forest/go/bin/dlv debug --init delve-init -- get git.sequentialread.com/forest/config-lite 2022-02-13T15:24:41-06:00 error layer=debugger can't find build-id note on binary Type 'help' for list of commands. Breakpoint 1 set at 0x762a33 for net.dnsReadConfig() /usr/local/go/src/net/dnsconfig_unix.go:137 Breakpoint 2 set at 0x761f2d for net.dnsReadConfig() /usr/local/go/src/net/dnsconfig_unix.go:148 > net.dnsReadConfig() /usr/local/go/src/net/dnsconfig_unix.go:137 (hits goroutine(51):1 total:1) (PC: 0x762a33) 132: // This option forces the use of TCP for DNS resolutions." 133: // https://www.freebsd.org/cgi/man.cgi?query=resolv.conf&sektion=5&manpath=freebsd-release-ports 134: // https://man.openbsd.org/resolv.conf.5 135: conf.useTCP = true 136: default: => 137: conf.unknownOpt = true 138: } 139: } 140: 141: case "lookup": 142: // OpenBSD option: (dlv) p s "edns0" (dlv) p f []string len: 3, cap: 3, [ "options", "edns0", "trust-ad", ] (dlv) p filename "/etc/resolv.conf" ... forest@thingpad:~/Desktop/git/go/src/cmd/go$ cat /etc/resolv.conf # This file is managed by man:systemd-resolved(8). Do not edit. # # This is a dynamic resolv.conf file for connecting local clients to the # internal DNS stub resolver of systemd-resolved. This file lists all # configured search domains. # # Run "resolvectl status" to see details about the uplink DNS servers # currently in use. # # Third party programs must not access this file directly, but only through the # symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a different way, # replace this symlink by a static file or a different symlink. # # See man:systemd-resolved.service(8) for details about the supported modes of # operation for /etc/resolv.conf. nameserver 127.0.0.53 options edns0 trust-ad ```
ForestJohnson commented 2 years ago
name resolution wild goose chase / red herring
https://www.man7.org/linux/man-pages/man5/resolv.conf.5.html ``` edns0 (since glibc 2.6) Sets RES_USE_EDNS0 in _res.options. This enables support for the DNS extensions described in RFC 2671. ``` ``` trust-ad (since glibc 2.31) Sets RES_TRUSTAD in _res.options. This option controls the AD bit behavior of the stub resolver. If a validating resolver sets the AD bit in a response, it indicates that the data in the response was verified according to the DNSSEC protocol. In order to rely on the AD bit, the local system has to trust both the DNSSEC- validating resolver and the network path to it, which is why an explicit opt-in is required. If the trust-ad option is active, the stub resolver sets the AD bit in outgoing DNS queries (to enable AD bit support), and preserves the AD bit in responses. Without this option, the AD bit is not set in queries, and it is always removed from responses before they are returned to the application. This means that applications can trust the AD bit in responses if the trust-ad option has been set correctly. In glibc version 2.30 and earlier, the AD is not set automatically in queries, and is passed through unchanged to applications in responses. ```
ForestJohnson commented 2 years ago
name resolution wild goose chase / red herring
Ok. So it sounds like this is happening because go does not support these `resolv.conf` options, so it's throwing up its hands and falling back to something which doesn't work... ? is that right ? But that doesn't explain why example app with `http.Get` works fine. So now I will work on figuring out the difference
ForestJohnson commented 2 years ago

Wait hang on i got confused, hold the phone,

71,34,5,123 IS the right address from /etc/hosts, and thats what the name lookup appears to be returning,

but i'm still seeing https fetch: Get "https://git.sequentialread.com/forest/config-lite?go-get=1": dial tcp 71.34.16.185:443: connect: connection refused error log at the end of the very same debug session

ForestJohnson commented 2 years ago

OK, so whatever I was looking at before appears to not be related to the actual problem.

forest@thingpad:~/Desktop/git/go/src/cmd/go$ cat delve-init 
break /usr/local/go/src/net/http/transport.go:1605
condition 1 pconn.cacheKey.addr == "git.sequentialread.com:443" && conn.conn.fd.raddr.IP[3] == 185
continue

forest@thingpad:~/Desktop/git/go/src/cmd/go$ /home/forest/go/bin/dlv debug --init delve-init -- get git.sequentialread.com/forest/config-lite
2022-02-13T16:02:25-06:00 error layer=debugger can't find build-id note on binary
Type 'help' for list of commands.
Breakpoint 1 set at 0x89f345 for net/http.(*Transport).dialConn() /usr/local/go/src/net/http/transport.go:1605
go: downloading git.sequentialread.com/forest/config-lite v0.0.0-20220122212408-4941a7aa3dc0
go get: git.sequentialread.com/forest/config-lite@v0.0.0-20220122212408-4941a7aa3dc0: verifying module: git.sequentialread.com/forest/config-lite@v0.0.0-20220122212408-4941a7aa3dc0: reading https://sum.golang.org/lookup/git.sequentialread.com/forest/config-lite@v0.0.0-20220122212408-4941a7aa3dc0: 410 Gone
        server response: not found: git.sequentialread.com/forest/config-lite@v0.0.0-20220122212408-4941a7aa3dc0: unrecognized import path "git.sequentialread.com/forest/config-lite": https fetch: Get "https://git.sequentialread.com/forest/config-lite?go-get=1": dial tcp 71.34.16.185:443: connect: connection refused
delve-init:3: Process 101743 has exited with status 1
(dlv) 

however if i change my breakpoint condition to conn.conn.fd.raddr.IP[3] == 123 then the breakpoint does hit before it crashes.

So where the hell is it getting 71.34.16.185 from ???

ForestJohnson commented 2 years ago

OMFG its happening because of a google service that is trying to connect to my git server . The error I am seeing is being thrown on the google telemetry server.

forest@thingpad:~$ curl https://sum.golang.org/lookup/git.sequentialread.com/forest/config-lite@v0.0.0-20220122212408-4941a7aa3dc0

not found: git.sequentialread.com/forest/config-lite@v0.0.0-20220122212408-4941a7aa3dc0: unrecognized import path "git.sequentialread.com/forest/config-lite": https fetch: Get "https://git.sequentialread.com/forest/config-lite?go-get=1": dial tcp 71.34.16.185:443: connect: connection refused

See the explanation at https://sum.golang.org

The google telemetry built into the go get command is whats breaking it, its preventing it from functioning.

ForestJohnson commented 2 years ago

OK, to summarize:

  1. The design of the go get command is built around google SaaS
  2. The go get command / SaaS is designed to force go users to have their packages reachable on the public internet 100% of the time
    • It implements this by simply crashing when the package is not globally reachable.
    • It also does this when we are trying to pull a package in order to make a build in order to fix something so that our packages can be globally reachable on the internet again :face_exhaling: :facepalm:
  3. The error message that users see when the package is reachable locally, but not reachable from google's SaaS is exceedingly cryptic
    • The text which describes whats happening, server response, is not nearly specific or notice-able enough to be understood by a human user encountering this error for the first time.
    • There is no instructions on how to fix it or work around it.
ForestJohnson commented 2 years ago

I'm getting big

https://www.theverge.com/2020/11/12/21563092/apple-mac-apps-load-slow-big-sur-downloads-outage-down-issues

"Mac users couldn’t launch apps this afternoon after Apple verification server issue"

vibes right now, not gonna lie.

ForestJohnson commented 2 years ago

The work around is written here https://medium.com/mabar/today-i-learned-fix-go-get-private-repository-return-error-reading-sum-golang-org-lookup-93058a058dd8

export GONOSUMDB='*'
export GONOPROXY='*'

The error message should explain this workaround.

Also yall should probably not make tools that run on our work stations (like go get) depend 100% on google servers by default. Its very disrespectful IMO.

seankhliao commented 2 years ago

See https://go.dev/ref/mod#private-module-proxy-direct The defaults are there so we can ensure software supply chain security across the ecosystem for most dependencies (which are public).

ForestJohnson commented 8 months ago

This is why I use the word "misleading", because pretty much every golang developer I run into (including myself before this issue thread) has no idea that this feature exists, and thier mental model of what go get is doing is wrong.

image

So, in terms of my lived experience working with go for several years and seeing other people doing the same, I know that it is in fact misleading.

Whether it talks about this in some manual page somewhere or not doesn't really matter; no one is going to read that until after they discover that go get isn't doing what they expected. The affordances and apparent behavior of the tool are the only way to "explain" this to users, I would argue that in that sense, the tool's behavior IS misleading.

ianlancetaylor commented 2 weeks ago

I think the fix here is to modify checkSumDB in cmd/go/internal/modfetch/fetch.go. If the call to lookupSumDB fails, we should clarify that the error is being reported by the checksum database and point people at https://golang.org/ref/mod#authenticating and/or https://go.dev/ref/mod#checksum-database.