Closed JoshOrndorff closed 4 years ago
I don't think this is specific to the linkchecker. Running curl against the num-traits crate returns a 404 for the same URL.
$ curl -I https://crates.io/crates/num-traits
HTTP/2 404
content-type: application/json; charset=utf-8
content-length: 35
server: nginx
date: Tue, 24 Mar 2020 02:45:06 GMT
set-cookie: cargo_session=sJIiNcfM9yvCHoGNENQaO8JrPoTF1c7xuZ6xe/LTieY=; HttpOnly; Secure; Path=/
strict-transport-security: max-age=31536000
via: 1.1 vegur, 1.1 6e19875b14d906dfd0ef8e65e8726f1d.cloudfront.net (CloudFront)
x-cache: Error from cloudfront
x-amz-cf-pop: PER50-C1
x-amz-cf-id: yBCN032584y1tHHrOzh9Er41QMS01bZ4OZ1IeCBJHpjwwlyH7Y2n9A==
age: 63
I have a feeling this is because crates.io is built using a JavaScript framework like Ember or React. When you open it in your browser, the server falls back to serving / and then the JS router changes the URL to /crates/num-traits. The linkchecker essentially calls reqwest::get(), so we don't run any JS and only ever see the raw 404.
This is probably related to rust-lang/crates.io#788 (see https://github.com/rust-lang/rustc-dev-guide/pull/184#issuecomment-421537610).
Yes, this is true for any crates.io URL. We have explicitly blacklisted URLs to crates.io in the rustc-dev-guide.
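For anyone who wants the same kind of exclusion in their own tooling, here is a minimal sketch of a host-based check. It is not how the rustc-dev-guide or the linkchecker implements its blacklist; the function name `is_excluded` and the `excluded_hosts` parameter are hypothetical, and the host parsing is deliberately simplistic (it ignores userinfo such as `user@host`).

```rust
/// Returns true if `url` points at one of `excluded_hosts` or at a
/// subdomain of one of them. Hypothetical helper, for illustration only.
fn is_excluded(url: &str, excluded_hosts: &[&str]) -> bool {
    // Drop the scheme ("https://") if there is one.
    let without_scheme = url.splitn(2, "://").nth(1).unwrap_or(url);

    // The host ends at the first path, query, fragment, or port separator.
    let host = without_scheme
        .split(|c: char| c == '/' || c == '?' || c == '#' || c == ':')
        .next()
        .unwrap_or("");

    // Match the host exactly, or as a parent domain ("static.crates.io").
    excluded_hosts
        .iter()
        .any(|excluded| host == *excluded || host.ends_with(&format!(".{}", excluded)))
}
```

Calling `is_excluded("https://crates.io/crates/num-traits", &["crates.io"])` would skip the link before any request is made, while a docs.rs link would still be checked.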
Okay, guess not much to do here then. Thanks for the explanation.
I found a workaround for link-checking to crates.io. Check docs.rs instead:
Instead of
curl --head https://crates.io/crates/num-complex
Do:
curl --head https://docs.rs/num-complex/latest/num_complex/
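If you have many such links, the rewrite can be automated. The following is a rough sketch, not part of any linkchecker: the function name `docs_rs_url` is hypothetical, and it assumes the library name on docs.rs is the crate name with hyphens replaced by underscores, which is the Cargo default but can be overridden by a crate.

```rust
/// Maps `https://crates.io/crates/<name>` to the matching docs.rs URL.
/// Returns `None` for anything that is not a crates.io crate page.
/// Hypothetical helper, for illustration only.
fn docs_rs_url(url: &str) -> Option<String> {
    // Only the plain crate-page form is handled.
    let rest = url.strip_prefix("https://crates.io/crates/")?;

    // Keep the crate name; drop any trailing path, query, or fragment.
    let name = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()?;
    if name.is_empty() {
        return None;
    }

    // Assumption: the library target is the crate name with '-' -> '_'.
    let lib = name.replace('-', "_");
    Some(format!("https://docs.rs/{}/latest/{}/", name, lib))
}
```

For the example above, `docs_rs_url("https://crates.io/crates/num-complex")` yields the docs.rs URL used in the second curl command.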
I had the same problem with a couple of domains / websites and found a different GitHub Action that works for me for link checking: linkspector.
It seems to run its checks through something like a real browser session, and every website I currently check with it returns a proper response. It also checks internal Markdown links correctly and can check links in other formats, such as reStructuredText.
For the maintainer, maybe there are good ideas in there? Or maybe it also solves your needs in a more general way? In any case, many thanks for your efforts on this linkchecker; it was very useful!
I use linkcheck for the Substrate Recipes. Thank you for the excellent backend.
So far I've encountered two links that regularly cause the link checker to fail, despite loading fine in a normal web browser. You can see the more recent occurrence in this PR: https://github.com/substrate-developer-hub/recipes/pull/180. You can also see that I've worked around the issue by adding the URL to my exclude list.
Ultimately I'd prefer to properly diagnose the failures rather than exclude the links.