elastisys / compliantkubernetes

Documentation for the Compliant Kubernetes project
https://elastisys.io/compliantkubernetes
Apache License 2.0

Unify linkchecker config files #923

Closed by cristiklein 3 months ago

cristiklein commented 3 months ago

We want to check not only URLs but also anchors, for a better visitor experience. Among other things, we use this to avoid breaking anchors in the ToS. Unfortunately, some pages contain dynamically generated anchors, which fail AnchorCheck. We previously ran linkchecker twice, once without AnchorCheck and once with it, each run with its own ignore list.

This approach led to two issues:

  1. We forgot to update both config files (without AnchorCheck and with AnchorCheck) at the same time, which led to many false positives.
  2. Link checking took twice as long.

To avoid these issues, we decided to run linkchecker only once, with AnchorCheck. This means that URLs to some pages won't be checked at all. We accept this risk: we prefer to slightly under-alert rather than over-alert.
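For readers unfamiliar with linkchecker's configuration format, a single unified config along these lines might look roughly like the sketch below. This is an illustration only, not the actual `linkchecker.conf` from this PR: the ignore pattern is a hypothetical placeholder, and it assumes the usual linkchecker behavior where a plugin is enabled by the presence of its section.

```ini
# Illustrative sketch only; not the repository's actual linkchecker.conf.

[filtering]
# One shared ignore list for the single run.
# The pattern below is a hypothetical placeholder for a page whose
# anchors are generated dynamically and would otherwise fail AnchorCheck.
ignore=
  ^https://example\.com/page-with-dynamic-anchors/

[AnchorCheck]
# Assumption: the presence of this section enables the AnchorCheck plugin.
```

With one file, there is only one ignore list to maintain, at the cost that any URL matching it is skipped for both URL and anchor checks.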

To test this PR, I ran the following commands on my laptop:

python3 -m venv .venv-linkchecker
. .venv-linkchecker/bin/activate
pip3 install -r requirements-linkchecker.txt
linkchecker --config linkchecker.conf https://elastisys.io

And got the following output:

Statistics:
Downloaded: 505.4MB.
Content types: 131 image, 5810 text, 0 video, 0 audio, 26 application, 1 mail and 304 other.
URL lengths: min=14, max=899, avg=127.

That's it. 6272 links in 6320 URLs checked. 0 warnings found. 0 errors found.
Stopped checking at 2024-06-24 10:58:57+002 (35 minutes, 22 seconds)
