Closed — benmoss closed this issue 4 years ago
The original idea was to retain fragments so we can check if the item you are linking to is still valid (e.g. imagine a title changed).
An alternative could be to identify links that are identical except for the fragment specifier and download the page only once. The HTML can then be cached so that each duplicate URL can check that an element with the desired id exists.
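The grouping step described above could be sketched roughly like this. This is a hypothetical, standard-library-only sketch (the names `split_fragment` and `group_by_page` are made up for illustration; the real implementation would likely use the `url` crate for parsing rather than a naive `#` split):

```rust
use std::collections::HashMap;

/// Split a URL into its base and optional fragment.
/// Hypothetical helper: a real checker would use a proper URL parser.
fn split_fragment(url: &str) -> (&str, Option<&str>) {
    match url.split_once('#') {
        Some((base, frag)) => (base, Some(frag)),
        None => (url, None),
    }
}

/// Group links that differ only in their fragment, so each page is
/// fetched once and every fragment is checked against the cached HTML.
fn group_by_page<'a>(links: &[&'a str]) -> HashMap<&'a str, Vec<&'a str>> {
    let mut pages: HashMap<&str, Vec<&str>> = HashMap::new();
    for link in links {
        let (base, frag) = split_fragment(link);
        let frags = pages.entry(base).or_default();
        if let Some(f) = frag {
            frags.push(f);
        }
    }
    pages
}
```

With this grouping, three links like `…/page#install`, `…/page#usage`, and `…/page` collapse into a single fetch of `…/page` plus two fragment checks against the cached body.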
See also Michael-F-Bryan/linkcheck#3.
Ah, good to know! I hadn't realized the intention was to eventually test the actual HTML as well 😸
An optimization we could implement would be to strip anchors out of links, reducing the total number of unique links we collect in a document. Right now we end up caching/verifying them as independent links. The verifier could additionally check that the anchor target actually exists in the document, but that would be an extra step, and we could still do it without re-requesting the resource.
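The extra verification step mentioned above could look something like this. This is a deliberately naive sketch (the name `fragment_exists` is invented here); a real implementation would use an actual HTML parser rather than substring matching, which can be fooled by comments or attribute ordering:

```rust
/// Naive check that a fragment target exists in already-fetched HTML.
/// Hypothetical sketch: scans for id="…" or name="…" attributes matching
/// the fragment, rather than parsing the DOM properly.
fn fragment_exists(html: &str, fragment: &str) -> bool {
    let id_attr = format!("id=\"{}\"", fragment);
    let name_attr = format!("name=\"{}\"", fragment);
    html.contains(&id_attr) || html.contains(&name_attr)
}
```

Because this runs against the cached HTML, it costs no additional network requests regardless of how many fragments point at the same page.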