golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.29k stars 17.58k forks source link

cmd/go: get incorrectly rejects go-import/go-source meta tags with unquoted name attr #39748

Open kortschak opened 4 years ago

kortschak commented 4 years ago

What version of Go are you using (go version)?

$ go version
go version go1.14.4 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN="/home/user/bin"
GOCACHE="/home/user/.cache/go-build"
GOENV="/home/user/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/user"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/user/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/user/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build389226641=/tmp/go-build -gno-record-gcc-switches"

What did you do?

This issue affecting us has been fixed via a workaround, so I can't post a working reproducer, but the previously it could have been replicated by doing the following.

GOPROXY=direct go get gonum.org/v1/gonum

What did you expect to see?

Downloading the module.

What did you see instead?

$ GOPROXY=direct go get gonum.org/v1/gonum
go get gonum.org/v1/gonum: unrecognized import path "gonum.org/v1/gonum": reading https://gonum.org/v1/gonum?go-get=1: 404 Not Found

Additional information

This was brought to golang-nuts and Gophers' slack where the issue was suggested to be that the the name attribute of the meta tag was unquoted. Ensuring that the attribute remains quoted did resolve the superficial issue, however on raising this with the author of the minifier that removes the quotes (a dependency of hugo), the quotes are apparently not required. This is confirmed by checking the following html here (note that go-import is not quoted).

<!doctype html><html lang=en-us>
<head>
<meta name=go-import content="domain.tld git https://github.com/domain/pkg">
<title>Title</title>
</head>

Also ref gohugoio/hugo#7415

tdewolff commented 4 years ago

Relevant HTML5 specification: https://dev.w3.org/html5/html-author/Overview.src.html#unquoted-attr

jayconrod commented 4 years ago

Relevant code is in discovery.go.

The problem is that we use an XML parser instead of an HTML parser. XML requires quotes. We could potentially use golang.org/x/net/html instead.

kortschak commented 4 years ago

@jayconrod golang.org/x/net is already vendored into the tree, but does not include the html package. What is the process for including vendored code into the go tree?

jayconrod commented 4 years ago

@kortschak Mostly the same as other modules; import it and run go mod vendor.

html is a very large package though, and we don't need its full functionality. Perhaps instead, we should use a more limited ad hoc parser that just extracts this tag. Remote import paths already warns people not to expect a full parser to be used: in particular, avoid embedded JS or CSS before the meta tag.

tdewolff commented 4 years ago

If it helps, here is an implementation of a low level HTML parser: https://github.com/tdewolff/parse/tree/master/html

But perhaps a simple regular expression could be the easiest ad-hoc solution, although it could give false-positives.

jayconrod commented 4 years ago

A regular expression might be good enough, since we just need to match one tag, not matching pairs of open / closing tags. We'd have to be smart about <script> and <style>.

Another thing to consider: any change here needs to test against a large set of Go projects to ensure we don't break anything that was working before. I think that's the hardest part of this.