aim42 / htmlSanityCheck

Standalone (batch and command-line) and Gradle-plugin HTML sanity checker. Detects missing images, dead links and cross-references, duplicate link targets (anchors) and the like.
Apache License 2.0

Avoid bad 403 and 405 results #219

Open gernotstarke opened 6 years ago

gernotstarke commented 6 years ago

For example, Amazon always returns 405 in response to HEAD requests.

We should send a GET after every suspicious error code (especially 403 and 405) to get better results.
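
A minimal sketch of that idea, not the project's actual code: issue a HEAD request first and repeat the check with GET only when the status code is suspicious. The class name `HeadThenGetCheck`, its method names, the 5-second timeouts and the exact set of suspicious codes are assumptions made for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Set;

class HeadThenGetCheck {

    // Assumption: only 403 and 405 are treated as suspicious here;
    // the issue says "esp. 403 and 405", so the real list may be longer.
    static final Set<Integer> SUSPICIOUS = Set.of(403, 405);

    static boolean isSuspicious(int statusCode) {
        return SUSPICIOUS.contains(statusCode);
    }

    // Sends one request with the given HTTP method and returns the status code.
    static int request(URL url, String method) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            conn.setRequestMethod(method);
            conn.setConnectTimeout(5000); // hypothetical timeout values
            conn.setReadTimeout(5000);
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    // HEAD first (cheap); fall back to GET only if the HEAD result is suspicious.
    static int check(URL url) throws IOException {
        int code = request(url, "HEAD");
        return isSuspicious(code) ? request(url, "GET") : code;
    }
}
```

Keeping HEAD as the first attempt avoids downloading response bodies for the servers that answer HEAD correctly; the extra GET is only paid for the suspicious cases.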

gernotstarke commented 6 years ago

This should be fixed now on branch 1.0.0-RC-2.

rdmueller commented 6 years ago

The test currently breaks. Could it be that Amazon has some kind of crawler protection? The checker currently seems to get a 503 when the tests are run through Travis or locally. Maybe we also need to send a fake user agent to get through.

rdmueller commented 6 years ago

Current workaround: ignore the automated test and ignore false negatives in the test result.

gernotstarke commented 6 years ago

Added some special-case handling (adding a user agent to the connection), which reduces the number of wrong 403s for some servers.
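
As a rough illustration of what adding a user agent to the connection can look like with the JDK's `HttpURLConnection` (a sketch under assumptions, not the code on the branch): the class name `UserAgentConnection` and the user agent string are hypothetical, since the thread does not say which value was used.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

class UserAgentConnection {

    // Hypothetical value; the issue does not state the actual user agent string.
    static final String USER_AGENT = "Mozilla/5.0 (compatible; link-checker-sketch)";

    // Prepares (but does not yet send) a request that carries a User-Agent header.
    static HttpURLConnection open(URL url, String method) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod(method);
        conn.setRequestProperty("User-Agent", USER_AGENT);
        return conn;
    }
}
```

Calling `getResponseCode()` on the returned connection sends the request. Whether a given server then stops answering 403 or 503 depends on its crawler protection, so this can reduce false results but may not eliminate them.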