aim42 / htmlSanityCheck

Standalone (batch- and command-line) and Gradle-plugin html sanity checker - detects missing images, dead links and cross-references, duplicate link targets (anchors) and the like.
Apache License 2.0

"Illegal characters" warning appears to be spurious #271

Closed mernst closed 5 years ago

mernst commented 5 years ago

When running on a document that contains

You may wish to join the
<a href="https://groups.google.com/forum/#!forum/randoop-discuss">mailing
list</a>.

I get the warning:

205 href checked, 1 missing id found. link "https://groups.google.com/forum/#!forum/randoop-discuss" contains illegal characters (Suggestions: optiongroup:Logging,-notifications,-and-troubleshooting-Randoop)

This URL does work in a browser, and quoting the characters (such as https://groups.google.com/forum/%23!forum/randoop-discuss) produces a URL that does not work. So it seems that htmlSanityCheck should not issue this warning.
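A minimal Java sketch of why the quoted form breaks, using only the standard java.net.URI class (this is not htmlSanityCheck code; the class and method names are made up for illustration). In the original link, "#" starts the fragment and "!" is an ordinary fragment character. Once "#" is written as %23 it becomes part of the path, so the link points at a different resource:

```java
import java.net.URI;

public class FragmentDemo {
    // Returns the decoded fragment of the link, or null if the link has none.
    public static String fragmentOf(String link) {
        return URI.create(link).getFragment();
    }

    // Returns the raw (still percent-encoded) path of the link.
    public static String rawPathOf(String link) {
        return URI.create(link).getRawPath();
    }

    public static void main(String[] args) {
        String original = "https://groups.google.com/forum/#!forum/randoop-discuss";
        String quoted = "https://groups.google.com/forum/%23!forum/randoop-discuss";

        // Original: the part after "#" is parsed as the fragment.
        System.out.println(fragmentOf(original) + " | " + rawPathOf(original));
        // Quoted: there is no fragment; "%23!forum/..." is now part of the path.
        System.out.println(fragmentOf(quoted) + " | " + rawPathOf(quoted));
    }
}
```

Both strings parse without error, which suggests that "!" by itself is not an illegal character in this position.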

Some suggestions:

Thanks!

gernotstarke commented 5 years ago

Michael, thanks for the error report.

The current check is really naive. It is implemented in

 URLUtil: public static boolean containsInvalidChars(String aLink) {

by a (too) simple regex.

As a simple (and known-to-be-imperfect) fix, I will modify this regex to let "!" pass as a legal character.

Will include a regression test in URLUtilSpec.groovy.
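For readers who want to picture the kind of check being discussed, here is a rough Java sketch. The pattern below is an assumption made for illustration and is not the actual regex in URLUtil; only the method signature is taken from the comment above. It treats a link as invalid when it contains any character outside a whitelist, and the whitelist includes "!":

```java
import java.util.regex.Pattern;

public class UrlCharCheckSketch {
    // Hypothetical whitelist (NOT the real htmlSanityCheck pattern):
    // letters, digits, "%" and the punctuation commonly found in URLs,
    // including "!". Any other character counts as illegal.
    private static final Pattern ILLEGAL_CHARS =
        Pattern.compile("[^A-Za-z0-9\\-._~:/?#\\[\\]@!$&'()*+,;=%]");

    // Same signature as the method named in the comment above.
    public static boolean containsInvalidChars(String aLink) {
        return ILLEGAL_CHARS.matcher(aLink).find();
    }

    public static void main(String[] args) {
        // The link from this issue passes; a link with a space does not.
        System.out.println(containsInvalidChars(
            "https://groups.google.com/forum/#!forum/randoop-discuss"));
        System.out.println(containsInvalidChars("http://example.com/a b"));
    }
}
```

A regression test in the spirit of the one mentioned above would assert that the Google Groups link from this issue is not flagged.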

gernotstarke commented 5 years ago

@mernst - I committed and pushed version 1.1.1 to the plugin portal. Please upgrade your build configuration ...

mernst commented 5 years ago

That works! Thank you very much.