golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.5k stars 17.6k forks source link

all: builders and TryBots should check documentation for broken links #37047

Open bcmills opened 4 years ago

bcmills commented 4 years ago

We occasionally end up with broken or outdated links in documentation (see #46545, https://github.com/golang/go/issues/37042#issuecomment-582448476, #27860, #21951, #19244).

Testing for broken links would not be entirely trivial, but nonetheless pretty straightforward to implement: we have an HTML parser in golang.org/x/net/html, and it is easy enough to issue a HEAD request for link targets to see if they resolve. (For links with anchors, we would probably want to also check the Content-Type header and then parse the actual linked HTML to ensure that the anchor exists.)

CC @golang/osp-team

dmitshur commented 4 years ago

Thanks for opening an issue about this. I agree it's not entirely trivial, but if implemented in a way that doesn't cause too much noise or false positives, I also agree that this would be helpful.

I believe @jayconrod wanted this too. I can't find an existing issue; it may have been a comment in a CL.

Edit: Found it, it was this comment in a CL.

bcmills commented 4 years ago

Thinking about this some more: we have control over golang.org, go.dev, and github.com/golang URLs, but not over other URLs. Probably the TryBot test should only fail if the broken link is on one of those sites, but at least the longtest builders should also check external links.

alain91 commented 4 years ago

Hello, I'm newbee in golang but want to dive in the core code. I have experience in developping code but not in go. I would have a look into this issue (as a use case) but my investigations could take time. Is my contribution acceptable ?

bcmills commented 4 years ago

@alain91, the tricky part of this issue will likely be getting the rendered documentation (in order to obtain the links it contains), and indexing that documentation (in order to be able to detect broken relative links). I'm not familiar with that part of the codebase myself, so I don't know how accessible it would be to a newcomer.

alain91 commented 4 years ago

I'm only newcomer in go. I have experience with parser, html decoder...

ShawnROGrady commented 4 years ago

@bcmills I took a crack at implementing something like this. This just walks the filepath, generates documentation html, then calls HEAD on any tag values with an href key. Running against the src directory outputs:

$ godoc_link_validator -check '^golang\.org|go\.dev|github.com/golang$' ./src
2020/03/03 15:05:59 src/cmd/go/internal/lockedfile: http HEAD 'https://golang.org/issue/20803.' status code: 404

Without ignoring external links (just ignoring localhost and example URLs) I get:

$ godoc_link_validator -ignore '^localhost|example|code' ./src 
2020/03/03 15:08:21 src/cmd/go: error retrieving 'https://proxy.golang.org,direct': Head "https://proxy.golang.org,direct": dial tcp: lookup proxy.golang.org,direct: no such host
src/cmd/go/internal/lockedfile: http HEAD 'https://golang.org/issue/20803.' status code: 404
src/cmd/internal/obj/arm: http HEAD 'http://infocenter.arm.com/help/topic/com.arm.doc.ihi0040b/IHI0040B_aadwarf.pdf' status code: 404
src/cmd/internal/obj/arm64: http HEAD 'http://infocenter.arm.com/help/topic/com.arm.doc.ecm0665627/abi_sve_aadwarf_100985_0000_00_en.pdf' status code: 404

That proxy link represents the default GOPROXY value (found in docs here), and I'm not sure if its better to just ignore it or to update the doc generator to not include the ",direct" portion in the clickable link.

That issue link (found in docs here) returns a 404 because of a trailing period (created issue #37640 to track this).

Both of the ARM links are actually broken (here and here in docs).

I am not sure if/how this should be included in the trybots since I am not familiar with how exactly they are configured.

rsc commented 10 months ago

I sent go.dev/cl/548059 to check all links inside the go.dev server. (That is, not checking external links.) That covers all the issue @bcmills linked in the original report except for checking #fragment. I think external links are and should be beyond the scope of builders and TryBots.

I will leave this open to track potentially checking #fragment too.

gopherbot commented 10 months ago

Change https://go.dev/cl/548059 mentions this issue: cmd/golangorg: check for invalid or broken links in served HTML

gopherbot commented 2 months ago

Change https://go.dev/cl/601656 mentions this issue: main.star: add a linux/amd64 x/website trybot for release note CLs