digipres / awesome-digital-preservation

Carefully curated list of awesome digital preservation resources.
Creative Commons Zero v1.0 Universal

Upgrade link checker to open an issue if there are broken links #15

Open anjackson opened 5 months ago

anjackson commented 5 months ago

The linting workflow includes a link checker, but it is failing all the time on links that still work, which is not helpful.

 ERROR: 10 dead links found!
[✖] https://twitter.com/anjacks0n/status/809343452995522560 → Status: 400
[✖] https://twitter.com/CriticalSteph/status/809365764549595136 → Status: 400
[✖] https://twitter.com/pnwagner/status/809356219471302656 → Status: 400
[✖] https://twitter.com/d_n_t → Status: 400
[✖] https://twitter.com/C_Fryer/status/809547404366192642 → Status: 400
[✖] https://twitter.com/DavidUnderdown9/lists/digipres1 → Status: 400
[✖] https://twitter.com/An_Old_Hand/lists/digi-pres-rdm → Status: 400
[✖] https://twitter.com/digipresnews/followers → Status: 400
[✖] https://twitter.com/NKrabben/lists → Status: 400
[✖] https://twitter.com/XFR_collective → Status: 400

These links are not really dead. It's just Twitter blocking things.
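One way to tell genuinely dead links apart from hosts that simply refuse automated requests is to post-process the checker's report. The following is only a rough sketch: it assumes the report lines look exactly like the ones quoted above, and `BLOCKING_HOSTS` and `split_results` are hypothetical names made up for illustration, not part of the link checker itself.

```python
import re
from urllib.parse import urlparse

# Hosts assumed to reject automated requests (an assumption for illustration).
BLOCKING_HOSTS = {"twitter.com"}

# Matches report lines such as "[✖] https://example.org/page → Status: 404".
LINE_RE = re.compile(r"^\[✖\]\s+(\S+)\s+→\s+Status:\s+(\d+)$")


def split_results(report):
    """Return (dead, blocked) lists of (url, status) tuples from a report."""
    dead, blocked = [], []
    for line in report.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip summary lines such as "ERROR: 10 dead links found!"
        url, status = match.group(1), int(match.group(2))
        host = urlparse(url).hostname or ""
        if host in BLOCKING_HOSTS:
            blocked.append((url, status))  # probably blocked, not dead
        else:
            dead.append((url, status))
    return dead, blocked
```

With a split like this, only the `dead` list would need to fail the workflow, while `blocked` could be reported separately or ignored.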

anjackson commented 5 months ago

Just tweaking the config seems to be sufficient.

anjackson commented 5 months ago

Hmm, also looking at https://github.com/digipres/policies/issues/10#issuecomment-2051287962 and I think a more sophisticated solution is justified.

The main reason for this is that workflow error notifications only go to whoever touched the workflow last, which is not really appropriate in this case. What we really want is a link checking approach we can use across multiple repos and that will record the results as a GitHub issue.

After spending some time exploring the different GitHub Actions currently available, this seems to be a really good approach: https://github.com/marketplace/actions/lychee-broken-link-checker

It can check HTML as well as Markdown, and is more configurable, e.g. caching and setting retry delays (which have caused problems here, forcing me to switch off checks for some URLs because the current process was retrying too fast and couldn't be slowed down).

The only problem seems to be that, implemented as-is, it will add a new issue every time it runs, even if there's already an open issue. However, it may be possible to combine it with https://github.com/JasonEtco/create-an-issue to find an existing issue and pass steps.[ID].outputs.number on as the issue-number, so that the link report is added to the body of that issue.
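The "reuse an open issue if there is one" idea boils down to a small decision, sketched here in Python rather than as workflow configuration. This is an illustration under assumptions, not how either action is implemented: `plan_issue_update` is a hypothetical helper, and the shape of `open_issues` (dicts with `number` and `title` keys) is assumed.

```python
def plan_issue_update(open_issues, title, report):
    """Decide whether to update an existing issue or create a new one.

    open_issues: list of dicts with "number" and "title" keys, for example
    the result of listing open issues (the shape is assumed for illustration).
    """
    for issue in open_issues:
        if issue["title"] == title:
            # Reuse the open issue: the report replaces or extends its body.
            return {
                "action": "update",
                "issue_number": issue["number"],
                "body": report,
            }
    # No matching open issue, so a new one is needed.
    return {"action": "create", "title": title, "body": report}
```

Matching on an exact title is the simplest possible rule; a label or a search query would work just as well, as long as every run uses the same one.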

anjackson commented 5 months ago

Another transient false positive in today's log. Presumably the DOI resolver didn't like us going too fast.
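If rate limiting really is the cause (that is a guess), slowing down retries should make these transient failures rarer. Below is a minimal sketch of retrying with an increasing delay. `check_with_backoff`, the `TRANSIENT` status set, and the injected `fetch` callable are all assumptions for illustration, not features of any particular link checker.

```python
import time

# Status codes treated as "try again later" (an assumption for illustration).
TRANSIENT = {429, 500, 502, 503, 504}


def check_with_backoff(fetch, url, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) and retry transient failures with a doubling delay.

    fetch: callable taking a URL and returning an HTTP status code.
    Returns the last status code seen.
    """
    status = fetch(url)
    for attempt in range(retries):
        if status not in TRANSIENT:
            break
        # Wait base_delay, then 2x, then 4x, ... before trying again.
        sleep(base_delay * 2 ** attempt)
        status = fetch(url)
    return status
```

Passing `fetch` and `sleep` in as arguments keeps the sketch independent of any HTTP library and makes it easy to try out without real network calls.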

anjackson commented 3 months ago

Another Lychee GitHub Action here: https://github.com/lycheeverse/lychee-action