Michael-F-Bryan / mdbook-linkcheck

A backend for `mdbook` which will check your links for you.
https://michael-f-bryan.github.io/mdbook-linkcheck/
MIT License

Configurable HTTP Error Handling #12

Open Michael-F-Bryan opened 5 years ago

Michael-F-Bryan commented 5 years ago

We might want to add some sort of http-error-behaviour option to the config which lets you select how to handle HTTP errors. Some possible strategies are:


Original comment from https://github.com/rust-lang/rustc-guide/pull/388#issuecomment-513642595:

@Michael-F-Bryan I think it would be helpful to be able to not fail the build for certain types of errors. For example, a 404 should definitely fail the build, whereas a time out, 429 (too many requests), or 50x (internal error) should not fail the build.
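
One way to read the policy in this comment is as a mapping from the result of a link check to a severity. The following is only a sketch under stated assumptions: `CheckResult`, `Severity`, and `severity` are hypothetical names that do not come from mdbook-linkcheck, and treating every status code not mentioned in the comment as a warning is a placeholder choice, not part of the proposal.

```rust
/// Hypothetical result of checking one web link.
/// (Illustration only; this is not a type from mdbook-linkcheck.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CheckResult {
    /// The server answered with this HTTP status code.
    Status(u16),
    /// No answer arrived before the deadline.
    TimedOut,
}

/// How seriously a result should be taken.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Ok,
    /// Reported, but does not fail the build.
    Warning,
    /// Fails the build.
    Error,
}

/// Apply the policy from the comment: a 404 (and other 4xx codes) fails the
/// build, while timeouts, 429, and 5xx responses only warn.
pub fn severity(result: CheckResult) -> Severity {
    match result {
        CheckResult::TimedOut => Severity::Warning,
        // 429 must be matched before the general 4xx arm below.
        CheckResult::Status(429) => Severity::Warning,
        CheckResult::Status(500..=599) => Severity::Warning,
        CheckResult::Status(200..=299) => Severity::Ok,
        CheckResult::Status(400..=499) => Severity::Error,
        // Assumption: anything else (1xx, 3xx, unknown codes) only warns.
        CheckResult::Status(_) => Severity::Warning,
    }
}

/// A build fails only if at least one result is an error.
pub fn fails_build(results: &[CheckResult]) -> bool {
    results.iter().any(|r| severity(*r) == Severity::Error)
}
```

With this mapping, a run that sees only a timeout and a 503 would pass, while a single 404 would fail it.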

pihme commented 4 years ago

We use the link checker as part of our CI builds. Some sites are down every once in a while. They are down for several hours and return a 503 code during that time. And our builds fail. Would be nice to have an option to treat those as warnings and only 4XX as errors. (Currently we are thinking about setting the link checker to warnings only and then scraping the output for the error code ourselves.)

Would appreciate if you could comment whether this is a feature you would consider and in case you do at what time frame.

Michael-F-Bryan commented 4 years ago

> Some sites are down every once in a while. They are down for several hours and return a 503 code during that time.

If a website will spuriously go down, is it even worth trying to check links that go to it?

> Currently we are thinking about setting the link checker to warnings only and then scraping the output for the error code ourselves

Setting it to warnings only probably won't do anything. Warnings are for edge cases where the link could be broken, but it could also be a false negative. For example, if you wrote something that looks like a link (such as [some text]) without an accompanying footer ([some text]: https://example.com/).

The actual code that translates the Outcomes from linkcheck into diagnostics that are emitted to the screen is here.

> Would appreciate if you could comment whether this is a feature you would consider and in case you do at what time frame.

We might need to think about a good policy for deciding what is an error and what isn't, but I like the idea of improving the way we detect errors!

Instead of evolving organically and adding if-statements as people find errors they'd like to handle differently, what about coming up with some general strategy for interpreting HTTP errors?

For example, maybe by specifying a list of rules that match on status codes, where earlier rules take precedence?
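
A first-match-wins rule list of this kind could be evaluated roughly as follows. This is a minimal sketch, not mdbook-linkcheck's implementation: `Rule`, `Action`, `parse_codes`, `parse_action`, `parse_rules`, and `action_for` are all hypothetical names, and the key syntax (a single code such as `"429"` or an inclusive range such as `"400-499"`) is an assumption about how rules might be written.

```rust
use std::ops::RangeInclusive;

/// What to do when a rule matches (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Ok,
    Warn,
    AlwaysFail,
}

/// One rule: an inclusive range of status codes and the action to take.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Rule {
    pub codes: RangeInclusive<u16>,
    pub action: Action,
}

/// Parse a key such as "429" or "400-499" into an inclusive range.
/// Returns `None` for malformed keys or reversed ranges.
pub fn parse_codes(key: &str) -> Option<RangeInclusive<u16>> {
    match key.split_once('-') {
        Some((lo, hi)) => {
            let lo: u16 = lo.trim().parse().ok()?;
            let hi: u16 = hi.trim().parse().ok()?;
            if lo <= hi { Some(lo..=hi) } else { None }
        }
        None => {
            let code: u16 = key.trim().parse().ok()?;
            Some(code..=code)
        }
    }
}

/// Parse an action name; unknown names are rejected.
pub fn parse_action(value: &str) -> Option<Action> {
    match value {
        "OK" => Some(Action::Ok),
        "Warn" => Some(Action::Warn),
        "AlwaysFail" => Some(Action::AlwaysFail),
        _ => None,
    }
}

/// Build the rule list from (key, value) pairs, keeping their order.
/// Fails as a whole if any single pair is invalid.
pub fn parse_rules(pairs: &[(&str, &str)]) -> Option<Vec<Rule>> {
    pairs
        .iter()
        .map(|(key, value)| {
            Some(Rule {
                codes: parse_codes(key)?,
                action: parse_action(value)?,
            })
        })
        .collect()
}

/// Return the action of the first rule that matches `status`, if any.
pub fn action_for(rules: &[Rule], status: u16) -> Option<Action> {
    rules
        .iter()
        .find(|rule| rule.codes.contains(&status))
        .map(|rule| rule.action)
}
```

Because the first matching rule wins, a specific rule for 429 has to be listed before a catch-all for 400-499. If the rules live in `book.toml`, keep in mind that TOML does not guarantee the order of keys within a table, so an order-preserving parser or an explicit array of rules would probably be needed.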

```toml
[output.linkcheck.http-error-handling]
# specific rules
"200" = "OK"
"429" = "Warn"
"503" = "Warn"

# catch-all
"400-499" = "AlwaysFail"
"500-599" = "AlwaysFail"
```

pihme commented 4 years ago

> If a website will spuriously go down, is it even worth trying to check links that go to it?

Short answer is yes. We link to pages from http://www.omg.org. These are down for up to three hours every couple of weeks. Other than that, the links are fine. (Also, one time Amazon US was down for a couple of hours. That was truly an outlier, but I just had to take a screenshot.) In general, the more links you have, the likelier it is that one of them is not in stellar condition.

> Setting it to warnings only probably won't do anything. Warnings are for edge cases where the link could be broken, but it could also be a false negative. For example, if you wrote something that looks like a link (such as [some text]) without an accompanying footer ([some text]: https://example.com/).

Thanks for clarifying that.

> For example, maybe by specifying a list of rules that match on status codes, where earlier rules take precedence?

Sounds great. Will also give other users more flexibility.

PS: I would love to contribute PRs, not just an upvote on an issue, but I haven't written a single line of Rust yet. It's on my list of languages I want to learn, but it might be a while before I can do something useful with it.