NLeSC / guide

Software Development Guide
https://guide.esciencecenter.nl
Creative Commons Attribution 4.0 International
47 stars 30 forks source link

Upgrade GitHub workflows #345

Closed ewan-escience closed 1 month ago

ewan-escience commented 1 month ago

The Link Checker for Pull requests workflow emits a lot of warnings (see e.g. https://github.com/NLeSC/guide/actions/runs/11070939423). We should look into upgrading and adapting the workflows.

egpbos commented 1 month ago

It seems like the jitterbit/get-changed-files action is no longer maintained. We can replace it with https://github.com/marketplace/actions/changed-files.

I also don't like that the link checker checks all links in the changed files, even those that were not changed in the PR; it's a bit wasteful, but more importantly it sometimes obfuscates actually failing links that are being introduced in that PR, so it makes the check less useful.

I've been fiddling a bit with greping git diff output to feed only the added lines to lychee. This works on my laptop; a tryout on the main branch on @ewan-escience's recent commit for the JS/TS chapter that edited a lot of links:

git diff -U0 e98e815f395d4f45514deb110f92022cc514a97f 3cc681da6cbb03fb71caf3a7bf4c08b5f70ac429 | grep -v "+++" | grep "^+" | cut -c 2- | lychee -

This doesn't display filenames anymore. This we can easily fix by using the changed-files action output and looping over filenames given from that. The action would then become something like:

    steps:
      - name: Get changed markdown files
        id: changed-files
        uses: tj-actions/changed-files@v45
        with:
          # Avoid using single or double quotes for multiline patterns
          files: |
             **.md

      - name: Check all this PR's additions for broken links
        if: steps.changed-markdown-files.outputs.any_changed == 'true'
        env:
          ALL_CHANGED_FILES: ${{ steps.changed-files.outputs.all_changed_files }}
        run: |
          for file in ${ALL_CHANGED_FILES}; do
            echo "checking ${file} for added broken links..."
            git diff -U0 ${SHA_OLD} ${SHA_NEW} -- ${file} | grep -v "+++" | grep "^+" | cut -c 2- | lychee -
          done

Another issue is that on GitHub actions it's a bit tricky to find the right SHAs to compare against each other, so SHA_OLD and SHA_NEW in the above example. I found this a useful article to figure out which one it should be: https://www.kenmuse.com/blog/the-many-shas-of-a-github-pull-request/ It explains well that especially SHA_OLD is annoying ("The Case of the missing SHA"), because

  • github.event.pull_request.base.sha The SHA that represents the HEAD for the Base branch at the time the pull request was created

There's no GitHub provided variable that gives you the HEAD of base at the time that you're running the Action. The HEAD can differ from the one at the time the pull request was created if you for instance do a rebase on main while working on the PR. Following that article, I think git rev-parse ${{ github.sha }}^ should give us the correct SHA_OLD and SHA_NEW is just github.event.pull_request.head.sha.

egpbos commented 1 month ago

Ok, I guess I'm just gonna try it out now ;)

egpbos commented 1 month ago

I've tried some things here https://github.com/egpbos/guide/pull/1

Interesting observation: if you make an error in the protocol part, say you write httpv instead of http or https, then the link checker actually ignores it with a message like:

[IGNORED] httpv://the-turing-way.netlify.app/index.html | Unsupported: Error creating request client: builder error for url (httpv://the-turing-way.netlify.app/index.html)

This is actually an error we would like to catch, but lychee isn't able. We could filter manually, but that would make this even more complex.