gjtorikian / html-proofer

Test your rendered HTML files to make sure they're accurate.
MIT License

Question - setting up html proofer to skip new pages added to a website which will return a 404 #808

Closed lwasser closed 9 months ago

lwasser commented 10 months ago

hey there 👋 Happy early new year!

i've been trying to understand how to implement html-proofer so it ignores new files (that have new links that are not online yet). these files of course return a 404.

i found this in the readme file, but i don't understand how i'd implement that approach in a github action such as this one, where i'm calling the htmlproofer action:

      - name: Check HTML using htmlproofer
        uses: chabad360/htmlproofer@master
        with:
          directory: "_site"
          arguments: |
            --ignore-urls "https://fonts.googleapis.com,https://fonts.gstatic.com,_site/_posts/README/index.html"
            --ignore-files "/.+\/_posts\/README.md"
            --ignore-status-codes "0,403, 429, 503, 999"

can someone provide me with some guidance so that PRs with new pages don't result in red x's in CI? do i need to create a vanilla ruby script to run in the workflow, or can i somehow use the chabad360 workflow action but add something custom? Many thanks!!

gjtorikian commented 9 months ago

You have a couple of options here:

Would any of these work for you? The GitHub Action example just collects all the new files and passes them into --ignore-files; you can provide your own list or a directory if that's easier.
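The "collect the new files and pass them into --ignore-files" idea could look roughly like the Ruby sketch below. It is only a sketch under stated assumptions, not the README's GitHub Action example: the helper names `new_files_since` and `ignore_files_argument` are hypothetical, `origin/main` is an assumed base branch, and it assumes that the CLI treats slash-delimited values as regular expressions and that each built page keeps the base name of its source file.

```ruby
# Hypothetical helper: list files added on this branch since it diverged
# from base_ref (assumed here to be origin/main).
def new_files_since(base_ref = "origin/main")
  merge_base = `git merge-base #{base_ref} HEAD`.chomp
  # --diff-filter=A keeps only added files; -z separates names with NUL bytes.
  `git diff -z --name-only --diff-filter=A #{merge_base}`.split("\0")
end

# Hypothetical helper: turn source paths into one comma-separated value
# for --ignore-files, with one slash-delimited pattern per new file.
def ignore_files_argument(new_files)
  patterns = new_files.map do |path|
    stem = File.basename(path, File.extname(path)) # "docs/a-b.md" -> "a-b"
    "/#{Regexp.escape(stem)}/"
  end
  patterns.uniq.join(",")
end
```

Calling `puts ignore_files_argument(new_files_since)` in a workflow step would print a value that could then be supplied to --ignore-files. Note that file names containing a comma would break the comma-separated list.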

gjtorikian commented 9 months ago

(Closing this not because I won't keep helping if you have questions, but because I like to keep a clean issue list in my repos.)

lwasser commented 9 months ago

hi 👋 thank you!! i definitely understand issue lists becoming unwieldy! i think i may have poorly described the issue.

Essentially we are creating a new piece of content in the PR so the new link is a new page on the website. please see here: for an example (screenshot below as well). Essentially installable-code.html is a new page in our guidebook that we are adding. so every time we add a new page html proofer can't find it because it isn't online yet.

Screenshot 2024-01-08 at 6 43 52 PM

In the html-proofer readme, i see a section on ignoring new files.

the code is below and i think it's trying to parse through the files but skipping the newly added file (maybe)?:

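require "html-proofer"

# Source directories whose newly added files should be ignored.
directories = ['content']
# Commit where this branch diverged from production.
merge_base = %x(git merge-base origin/production HEAD).chomp
# Files added (A) or copied (C) since then; -z separates names with NUL bytes.
diffable_files = %x(git diff -z --name-only --diff-filter=AC #{merge_base}).split("\0")
# Keep files directly inside the listed directories, plus any Markdown file.
diffable_files = diffable_files.select do |filename|
  next true if directories.include?(File.dirname(filename))

  filename.end_with?(".md")
end.map { |f| Regexp.new(File.basename(f, File.extname(f))) } # base name without extension, as a Regexp

# Links whose URL matches one of those patterns are skipped.
HTMLProofer.check_directory("./output", { ignore_urls: diffable_files }).run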

but

  1. i'm not sure how to implement this in my github action here.
  2. i'm also not sure whether, if i do implement that fix, it will totally skip checking the new page for bad links and such as well.

i hope this makes more sense! Essentially each time i create a new website page, our build breaks because the new page is not yet online and as such it's a broken link according to HTML proofer (i think anyway that is what is happening). many thanks again!!
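On the second question, the mapping step in the README snippet can be tried in isolation. The sketch below is a minimal illustration, assuming that `ignore_urls` is applied to link targets rather than to the files being scanned, so a new page would still have its own outgoing links checked. The helper names `ignore_patterns` and `ignored?` and the `example.org` URLs are hypothetical, and `ignored?` only approximates how a list of patterns would be matched against a URL.

```ruby
# Same mapping as the README snippet: file path -> Regexp built from the
# base name without its extension.
def ignore_patterns(files)
  files.map { |f| Regexp.new(File.basename(f, File.extname(f))) }
end

# Hypothetical check: a URL counts as ignored when any pattern matches it.
def ignored?(url, patterns)
  patterns.any? { |pattern| pattern.match?(url) }
end

patterns = ignore_patterns(["content/installable-code.md"])

# A link pointing at the not-yet-published page matches and would be skipped.
puts ignored?("https://example.org/guide/installable-code.html", patterns) # => true
# A link to any other page does not match and would still be checked.
puts ignored?("https://example.org/guide/intro.html", patterns)            # => false
```

Because the patterns are built from bare base names, a short name such as `index` would also match many unrelated URLs, so it may be worth anchoring the patterns more tightly.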