Closed (chalin closed this issue 7 years ago)
Another example:
External link http://caniuse.com/#feat=shadowdom failed: http://caniuse.com/#feat=shadowdom exists, but the hash 'feat=shadowdom' does not
The link is valid, but the checker cannot make sense of this particular use of an anchor/fragment, so it is likely a good candidate for whitelisting.
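To illustrate why a checker might report that the hash "does not exist", here is a minimal Python sketch. It assumes the checker treats a fragment as valid only when the page contains an element whose id (or an a element whose name) equals the fragment. That is an assumption about how such checks commonly work, not a description of linkcheck's actual code, and the names AnchorCollector and fragment_exists are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class AnchorCollector(HTMLParser):
    """Collects every value that could serve as a fragment target."""

    def __init__(self):
        super().__init__()
        self.anchors = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # id="..." on any element, or name="..." on <a>, is a target.
            if value and (name == "id" or (tag == "a" and name == "name")):
                self.anchors.add(value)


def fragment_exists(url, html):
    """Return True if the URL has no fragment or the fragment is a target in html."""
    _, fragment = urldefrag(url)
    if not fragment:
        return True
    collector = AnchorCollector()
    collector.feed(html)
    return fragment in collector.anchors
```

Under this assumption, a fragment such as feat=shadowdom that the site interprets with client-side script has no matching element in the static HTML, so the check fails even though the link works in a browser.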
Will you be invoking linkcheck from the command line (like in a shell script)? In that case, how would you prefer to give the excluded regexps? As a separate text file?
linkcheck :4001 -x exclude.txt
Does that seem reasonable? The other option is to provide it in line, but that makes the invocation ugly and brittle.
If this configuration-by-file is okay with you, how would you prefer the exclude.txt file to look? Regexp per line, no comments? YAML? For example, have you ever wanted to have more structure in the exclude = [ ... ] option? Can you imagine needing something more than lines?
Also, does it need to be RegExp or should we use glob to make the writing of that file a bit easier?
Last but not least, should this feature be called whitelist or exclude or something else? Whitelist seems confusing to me, but exclude could be as well, I guess.
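To make the glob-versus-RegExp question concrete, here is a small Python sketch that matches the caniuse.com link from above both ways. It uses Python's standard fnmatch and re modules purely for illustration. linkcheck is a Dart tool, so its own matching rules may differ, and both patterns are made-up examples.

```python
import fnmatch
import re

URL = "http://caniuse.com/#feat=shadowdom"

# Hypothetical patterns that express the same intent in both styles.
GLOB = "http://caniuse.com/*"
REGEX = r"^http://caniuse\.com/"


def matches_glob(url, pattern):
    # fnmatchcase compares case-sensitively on every platform.
    return fnmatch.fnmatchcase(url, pattern)


def matches_regex(url, pattern):
    return re.search(pattern, url) is not None


# A glob can always be translated into an equivalent regular expression.
GLOB_AS_REGEX = fnmatch.translate(GLOB)
```

One trade-off worth noting: in this glob dialect, ? and [ are wildcards, and both occur in ordinary URLs, so a glob is easier to write but not free of escaping either.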
This is an example of linkcheck output that actually shows an error, despite the link being valid:
- http://localhost:4001/tools/dart2js
* External link https://developer.apple.com/library/safari/documentation/AppleApplications/Conceptual/Safari_Developer_Guide/Debugger/Debugger.html#//apple_ref/doc/uid/TP40007874-CH5-SW1 failed: response code 0 means something's wrong.
It's possible libcurl couldn't connect to the server or perhaps the request timed out.
Sometimes, making too many requests at once also breaks things.
Either way, the return message (if any) from the server is: SSL connect error
I'm confused. Is this output from linkcheck? Or is it just an example of something you'd like to exclude?
This is output from linkcheck (I updated the comment to clarify that).
You ask valid questions. Here are some initial thoughts:
- # starts a line comment and otherwise there is a pattern per line.
- [glob|regex] pattern.
- --skip-patterns <file> or --skip-links <pattern-file>.
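A minimal sketch of how a file in the suggested format could be read, assuming that # starts a whole-line comment, that every other non-empty line is one pattern, and that patterns are regular expressions. It is written in Python for illustration only. The names parse_skip_patterns and should_skip and the sample patterns are hypothetical, and the thread does not say whether trailing comments on a pattern line would be allowed.

```python
import re

# Sample file contents. A '#' inside a pattern line is kept as part of
# the pattern, because only whole-line comments are recognized here.
EXAMPLE_FILE = r"""# Fragment is valid but cannot be verified
caniuse\.com/#feat=

# SSL connect error when checked automatically
^https://developer\.apple\.com/
"""


def parse_skip_patterns(text):
    """Compile one regex per non-empty, non-comment line."""
    patterns = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        patterns.append(re.compile(line))
    return patterns


def should_skip(url, patterns):
    """Return True if any pattern matches somewhere in the URL."""
    return any(p.search(url) for p in patterns)
```

A line-oriented file like this stays easy to edit by hand, and it can later gain more structure if plain lines turn out not to be enough.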
I will assume you want to (A) exclude the links as they are stated in href. The other approach (B) would be to exclude links by their final URL (after redirects). That would mean trying all links by default, just in case they end up being redirected to a non-skipped URL.
I'm implementing (A). Stop me if you'd prefer (B).
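The difference between (A) and (B) can be sketched as follows. This is an illustrative Python model, not linkcheck's implementation. The skip pattern, the example.com URLs, and the REDIRECTS table (which stands in for real HTTP requests) are all hypothetical.

```python
import re

# Hypothetical skip pattern.
SKIP = [re.compile(r"^https://old\.example\.com/")]

# Hypothetical redirect table standing in for real network requests.
REDIRECTS = {
    "https://old.example.com/router": "https://new.example.com/router",
}


def is_skipped(url):
    return any(p.search(url) for p in SKIP)


def links_to_check_a(hrefs):
    """(A): decide from the href text alone; skipped links are never requested."""
    return [h for h in hrefs if not is_skipped(h)]


def links_to_check_b(hrefs):
    """(B): resolve every link first, then decide from the final URL."""
    result = []
    for h in hrefs:
        final = REDIRECTS.get(h, h)  # a real checker needs a request here
        if not is_skipped(final):
            result.append(final)
    return result
```

With (A), a skipped link costs nothing. With (B), every link must be requested at least once before the skip decision can be made, which is the extra work described above.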
Ok done, please see this section of the readme. Let me know whether this works for you.
I should add: run pub global activate linkcheck to get the newest version.
Very nice! It seems to be working like a charm!
An example of where this would be useful is when running the checker over https://webdev.dart-lang.org. We do not yet have an Angular guide for the Router, but we do have some Angular pages that already link to the (soon to be created) Router page. It would be great if we could whitelist links to the Router page.
As an example, the broken-link-checker has an excludeKeywords option. We use it like this under angular.io (note the value of the exclude array variable):

cc @kwalrath @kevmoo