Closed — benmoss closed this issue 4 years ago
The original idea was to retain fragments so we can check if the item you are linking to is still valid (e.g. imagine a title changed).
An alternative could be to identify links that are identical except for the fragment specifier and download the page only once. The HTML can then be cached so that each duplicate URL can check that an element with the desired id exists.
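The grouping step described above could be sketched roughly like this. This is a hypothetical, standard-library-only sketch (the names `split_fragment` and `group_by_page` are made up for illustration; the real implementation would likely use the `url` crate for parsing rather than a naive `#` split):

```rust
use std::collections::HashMap;

/// Split a URL into its base and optional fragment.
/// Hypothetical helper: a real checker would use a proper URL parser.
fn split_fragment(url: &str) -> (&str, Option<&str>) {
    match url.split_once('#') {
        Some((base, frag)) => (base, Some(frag)),
        None => (url, None),
    }
}

/// Group links that differ only in their fragment, so each page is
/// fetched once and every fragment is checked against the cached HTML.
fn group_by_page<'a>(links: &[&'a str]) -> HashMap<&'a str, Vec<&'a str>> {
    let mut pages: HashMap<&str, Vec<&str>> = HashMap::new();
    for link in links {
        let (base, frag) = split_fragment(link);
        let frags = pages.entry(base).or_default();
        if let Some(f) = frag {
            frags.push(f);
        }
    }
    pages
}
```

With this grouping, three links like `…/page#install`, `…/page#usage`, and `…/page` collapse into a single fetch of `…/page` plus two fragment checks against the cached body.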
See also Michael-F-Bryan/linkcheck#3.
Ah, good to know! I hadn't realized the intention was to eventually test the actual HTML as well 😸
An optimization we could implement would be to strip anchors out of links, reducing the total number of unique links we collect in a document. Right now we end up caching/verifying them as independent links. The verifier could additionally check that the anchor target actually exists in the document, but that would be an extra step, and we could still do it without re-requesting the resource.
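The extra verification step mentioned above could look something like this. This is a deliberately naive sketch (the name `fragment_exists` is invented here); a real implementation would use an actual HTML parser rather than substring matching, which can be fooled by comments or attribute ordering:

```rust
/// Naive check that a fragment target exists in already-fetched HTML.
/// Hypothetical sketch: scans for id="…" or name="…" attributes matching
/// the fragment, rather than parsing the DOM properly.
fn fragment_exists(html: &str, fragment: &str) -> bool {
    let id_attr = format!("id=\"{}\"", fragment);
    let name_attr = format!("name=\"{}\"", fragment);
    html.contains(&id_attr) || html.contains(&name_attr)
}
```

Because this runs against the cached HTML, it costs no additional network requests regardless of how many fragments point at the same page.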