Closed ewan-escience closed 1 month ago
It seems like the jitterbit/get-changed-files action is no longer maintained. We can replace it with https://github.com/marketplace/actions/changed-files.
I also don't like that the link checker checks all links in the changed files, even those that were not changed in the PR; it's a bit wasteful, but more importantly it sometimes obfuscates actually failing links that are being introduced in that PR, so it makes the check less useful.
I've been fiddling a bit with grep
ing git diff
output to feed only the added lines to lychee
. This works on my laptop; a tryout on the main branch on @ewan-escience's recent commit for the JS/TS chapter that edited a lot of links:
git diff -U0 e98e815f395d4f45514deb110f92022cc514a97f 3cc681da6cbb03fb71caf3a7bf4c08b5f70ac429 | grep -v "+++" | grep "^+" | cut -c 2- | lychee -
This doesn't display filenames anymore. This we can easily fix by using the changed-files
action output and looping over filenames given from that. The action would then become something like:
steps:
- name: Get changed markdown files
id: changed-files
uses: tj-actions/changed-files@v45
with:
# Avoid using single or double quotes for multiline patterns
files: |
**.md
- name: Check all this PR's additions for broken links
if: steps.changed-markdown-files.outputs.any_changed == 'true'
env:
ALL_CHANGED_FILES: ${{ steps.changed-files.outputs.all_changed_files }}
run: |
for file in ${ALL_CHANGED_FILES}; do
echo "checking ${file} for added broken links..."
git diff -U0 ${SHA_OLD} ${SHA_NEW} -- ${file} | grep -v "+++" | grep "^+" | cut -c 2- | lychee -
done
Another issue is that on GitHub actions it's a bit tricky to find the right SHAs to compare against each other, so SHA_OLD
and SHA_NEW
in the above example. I found this a useful article to figure out which one it should be: https://www.kenmuse.com/blog/the-many-shas-of-a-github-pull-request/ It explains well that especially SHA_OLD
is annoying ("The Case of the missing SHA"), because
github.event.pull_request.base.sha
The SHA that represents the HEAD for the Base branch at the time the pull request was created
There's no GitHub provided variable that gives you the HEAD of base at the time that you're running the Action. The HEAD can differ from the one at the time the pull request was created if you for instance do a rebase on main while working on the PR. Following that article, I think git rev-parse ${{ github.sha }}^
should give us the correct SHA_OLD
and SHA_NEW
is just github.event.pull_request.head.sha
.
Ok, I guess I'm just gonna try it out now ;)
I've tried some things here https://github.com/egpbos/guide/pull/1
Interesting observation: if you make an error in the protocol part, say you write httpv instead of http or https, then the link checker actually ignores it with a message like:
[IGNORED] httpv://the-turing-way.netlify.app/index.html | Unsupported: Error creating request client: builder error for url (httpv://the-turing-way.netlify.app/index.html)
This is actually an error we would like to catch, but lychee isn't able. We could filter manually, but that would make this even more complex.
The
Link Checker for Pull requests
workflow emits a lot of warnings (see e.g. https://github.com/NLeSC/guide/actions/runs/11070939423). We should look into upgrading and adapting the workflows.