aim42 / htmlSanityCheck

Standalone (batch- and command-line) and Gradle-plugin HTML sanity checker: detects missing images, dead links and cross-references, duplicate link targets (anchors) and the like.
Apache License 2.0

Bug: Amazon flaky results for unknown URLs #316

Open ascheman opened 9 months ago

ascheman commented 9 months ago

Amazon seems to behave differently for unknown URLs depending on various request parameters. I am currently running into test failures with the test case "BrokenHttpLinksCheckerSpec: bad amazon link is identified as problem". It passes in GitHub Actions but fails on my local machine, both when running the single test from the IDE (IntelliJ) and during a full gradlew test run.

I could track it down to the following behaviour:

Locally, I could also change Amazon's behaviour by setting the User-Agent header of the request. This can even be reproduced with curl.

Cf. bug-316.zip
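The User-Agent dependence described above can also be probed from Java rather than curl. The following is a minimal sketch, assuming the JDK's java.net.http.HttpClient (Java 11+), a plain GET request with redirects disabled, and a hypothetical non-existent Amazon URL as the default target; HSC's own link checker may issue its requests differently. It sends one request per User-Agent value and prints the status code Amazon returns for each, so the responses can be compared side by side.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class UserAgentProbe {

    // Builds a GET request for the given URI with an explicit User-Agent header.
    // Using GET (not HEAD) is an assumption; the checker's actual method may differ.
    static HttpRequest buildRequest(URI uri, String userAgent) {
        return HttpRequest.newBuilder(uri)
                .GET()
                .header("User-Agent", userAgent)
                .timeout(Duration.ofSeconds(10))
                .build();
    }

    // Sends one request per User-Agent value and records the resulting HTTP status code,
    // preserving the order in which the agents were given.
    static Map<String, Integer> probe(HttpClient client, URI uri, List<String> userAgents)
            throws IOException, InterruptedException {
        Map<String, Integer> results = new LinkedHashMap<>();
        for (String ua : userAgents) {
            HttpResponse<Void> response =
                    client.send(buildRequest(uri, ua), HttpResponse.BodyHandlers.discarding());
            results.put(ua, response.statusCode());
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical "unknown" Amazon URL; pass the URL from the failing test as args[0].
        URI uri = URI.create(args.length > 0 ? args[0] : "https://www.amazon.com/this-page-does-not-exist");
        // Do not follow redirects, so the raw status code of the first response is visible.
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NEVER)
                .build();
        // Illustrative User-Agent values only; "hsc/0.0.0" is a hypothetical HSC token.
        List<String> agents = List.of("curl/8.0", "Mozilla/5.0", "hsc/0.0.0");
        probe(client, uri, agents).forEach((ua, status) -> System.out.println(ua + " -> " + status));
    }
}
```

Because the results depend on Amazon's server-side heuristics, repeated runs (or runs from different networks) may print different codes, which matches the flakiness reported here.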

Perhaps this is similar to the behaviour we see in #219?

I suggest setting the User-Agent header to something HSC-specific (e.g., hsc/version).
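A minimal sketch of what an "hsc/version" User-Agent could look like, assuming the checker uses java.net.HttpURLConnection and that the version is read from the jar manifest's Implementation-Version attribute. The class name HscUserAgent, the "hsc" token, and the "unknown" fallback are hypothetical illustrations, not HSC's actual API.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

final class HscUserAgent {

    // Hypothetical product token; the issue only proposes "something HSC specific (e.g., hsc/version)".
    static final String PRODUCT = "hsc";

    private HscUserAgent() {}

    // Formats the header value, e.g. "hsc/2.0.0"; falls back to "hsc/unknown" for a missing or blank version.
    static String userAgent(String version) {
        String v = (version == null || version.isBlank()) ? "unknown" : version.trim();
        return PRODUCT + "/" + v;
    }

    // Reads Implementation-Version from the jar manifest; returns null if the attribute is absent
    // (for example, when running from a plain classes directory).
    static String versionFromManifest() {
        Package pkg = HscUserAgent.class.getPackage();
        return pkg == null ? null : pkg.getImplementationVersion();
    }

    // Creates (but does not yet connect) an HTTP connection that will send the HSC-specific User-Agent.
    static HttpURLConnection openWithUserAgent(String url, String version) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("User-Agent", userAgent(version));
        return conn;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn = openWithUserAgent("https://www.amazon.com/", versionFromManifest());
        // Prints "hsc/unknown" when no Implementation-Version is available.
        System.out.println(conn.getRequestProperty("User-Agent"));
    }
}
```

A fixed, product-specific value makes requests reproducible across machines, whereas the JDK's default User-Agent varies with the Java version in use, which is one plausible source of local vs. CI differences.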

ascheman commented 7 months ago

For whatever reason, the problem mostly occurs locally (but occasionally also during the GitHub Actions build).