Open kortschak opened 4 years ago
Relevant HTML5 specification: https://dev.w3.org/html5/html-author/Overview.src.html#unquoted-attr
Relevant code is in discovery.go.
The problem is that we use an XML parser instead of an HTML parser. XML requires quotes. We could potentially use golang.org/x/net/html instead.
@jayconrod golang.org/x/net is already vendored into the tree, but does not include the html package. What is the process for including vendored code into the go tree?
@kortschak Mostly the same as other modules; import it and run go mod vendor
.
html
is a very large package though, and we don't need its full functionality. Perhaps instead, we should use a more limited ad hoc parser that just extracts this tag. Remote import paths already warns people not to expect a full parser to be used: in particular, avoid embedded JS or CSS before the meta tag.
If it helps, here is an implementation of a low level HTML parser: https://github.com/tdewolff/parse/tree/master/html
But perhaps a simple regular expression could be the easiest ad-hoc solution, although it could give false-positives.
A regular expression might be good enough, since we just need to match one tag, not matching pairs of open / closing tags. We'd have to be smart about <script>
and <style>
.
Another thing to consider: any change here needs to test against a large set of Go projects to ensure we don't break anything that was working before. I think that's the hardest part of this.
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (
go env
)?go env
OutputWhat did you do?
This issue affecting us has been fixed via a workaround, so I can't post a working reproducer, but the previously it could have been replicated by doing the following.
GOPROXY=direct go get gonum.org/v1/gonum
What did you expect to see?
Downloading the module.
What did you see instead?
Additional information
This was brought to golang-nuts and Gophers' slack where the issue was suggested to be that the the name attribute of the meta tag was unquoted. Ensuring that the attribute remains quoted did resolve the superficial issue, however on raising this with the author of the minifier that removes the quotes (a dependency of hugo), the quotes are apparently not required. This is confirmed by checking the following html here (note that go-import is not quoted).
Also ref gohugoio/hugo#7415