aim42 / htmlSanityCheck

Standalone (batch and command-line) and Gradle-plugin HTML sanity checker: detects missing images, dead links and cross-references, duplicate link targets (anchors) and the like.
Apache License 2.0

localResource: false positives! #252

Open gernotstarke opened 5 years ago

gernotstarke commented 5 years ago

A false positive error, or in short a false positive, commonly called a "false alarm", is a result that indicates a given condition exists, when it does not.

MissingLocalResourceChecker reports too many false positives. For example, the URL

/examples

is reported as missing even though a file

/examples.html

exists.

gernotstarke commented 5 years ago

<a href="/foo">

can mean one of the following files/filenames:

where the extension is, e.g., one of {htm, html, shtml, phtml, php, pl}
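
The ambiguity described here can be sketched in a few lines. This is only an illustration, not htmlSanityCheck's implementation: the helper name `candidate_paths` and the decision to list the bare path first are assumptions made for this sketch; only the extension set comes from the comment.

```python
from typing import Iterable, List

# Extension set taken from the comment above.
DEFAULT_EXTENSIONS = ("htm", "html", "shtml", "phtml", "php", "pl")


def candidate_paths(href: str,
                    extensions: Iterable[str] = DEFAULT_EXTENSIONS) -> List[str]:
    """Return file names that an extensionless href such as "/foo" might mean.

    Hypothetical helper: the bare href comes first (an assumption), followed
    by the href with each extension appended.
    """
    return [href] + [f"{href}.{ext}" for ext in extensions]


# Example: list every file name that "/foo" could refer to under this reading.
for path in candidate_paths("/foo"):
    print(path)
```

A checker following this idea would report `/examples` as missing only if none of the candidates exists on disk.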

matthiaskraaz commented 5 years ago

First: thanks, this looks really promising!

Now, also in reference to "start local webserver to improve checks": No! That way lies madness. A webserver can be configured to apply the craziest transformations to static pages. Finding the index file is the simplest one. There is the Apache spelling correction plugin, for example: Apache finds a file if the request is for a similarly named one. Is it therefore a false positive if there are only small deviations between href and file name? I suggest not making htmlSanityCheck too smart or too forgiving.

Now about these indexes: by default, I would not apply any Apache-style index file search logic. I would make it configurable how the index file is found for a directory. You could even offer the option that finding a directory is sufficient: after all, there is also the Apache index generation plugin, which generates an index file for a directory on the fly (disabled most of the time for security reasons).
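
The configurable behaviour suggested in this comment could look roughly like the sketch below. Everything in it is hypothetical: `DirectoryPolicy`, `index_files`, `directory_is_sufficient`, and `directory_link_ok` are names invented for illustration and do not exist in htmlSanityCheck. The default is deliberately strict (no implicit index lookup), matching the suggestion not to be too forgiving.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Tuple


@dataclass(frozen=True)
class DirectoryPolicy:
    """Hypothetical settings; none of these names exist in htmlSanityCheck."""
    index_files: Tuple[str, ...] = ()      # empty by default: no implicit index lookup
    directory_is_sufficient: bool = False  # accept a directory without any index file


def directory_link_ok(root: Path, href: str, policy: DirectoryPolicy) -> bool:
    """Decide whether an href that points at a directory counts as resolved."""
    target = root / href.lstrip("/")
    if not target.is_dir():
        return False
    if policy.directory_is_sufficient:
        return True
    # Otherwise one of the configured index files must exist in the directory.
    return any((target / name).is_file() for name in policy.index_files)


# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    site = Path(tmp)
    (site / "docs").mkdir()
    (site / "docs" / "index.html").write_text("<html></html>")
    (site / "empty").mkdir()

    with_index = DirectoryPolicy(index_files=("index.html",))
    lenient = DirectoryPolicy(directory_is_sufficient=True)

    print("docs, index required:", directory_link_ok(site, "/docs", with_index))
    print("empty, index required:", directory_link_ok(site, "/empty", with_index))
    print("empty, directory sufficient:", directory_link_ok(site, "/empty", lenient))
```

Keeping the policy as explicit data means the checker never guesses what a particular web server would do; the user states it.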