alexwlchan / alexwlchan.net

Source code and plugins for my website, a static site built with Jekyll
https://alexwlchan.net/
MIT License

Investigate HTML-Proofer's external link checking #763

Open alexwlchan opened 4 months ago

alexwlchan commented 3 months ago

Here's an initial draft of a script for it:

# frozen_string_literal: true

# This script checks whether external links are working.
#
# I don't run this in CI because it's slow

require 'html-proofer'

HTMLProofer.check_directory(
  '_site', {
    checks: ['Links'],
    ignore_files: [
      '_site/400/index.html',
      %r{_site/files/.*},
      # This is because of an overly slow regex in HTML-Proofer.
      # See https://github.com/gjtorikian/html-proofer/issues/816
      '_site/2013/google-maps/index.html'
    ],
    #
    # This User-Agent header allows the URL checker to fetch Twitter
    # pages; the default agent gets a 400 error.
    typhoeus: { headers: { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:105.0) Gecko/20100101 Firefox/105.0' } },
    #
    # See https://github.com/gjtorikian/html-proofer?#configuring-caching
    cache: { timeframe: { external: '30d' }, storage_dir: '.htmlproofer' },
    #
    # As of April 2024, I have 334 links which don't use HTTPS.
    # It might be nice to fix them all and/or whitelist them, but
    # they're all external links -- I don't care that much.
    #
    # For now, skip HTTPS checking.
    enforce_https: false
  }
).run
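
The `ignore_files` list above mixes plain strings and a regex, and the two behave differently. Here is a minimal sketch of that difference, assuming strings are compared against the whole file path and regexes may match anywhere in it. That's my reading of HTML-Proofer's behaviour rather than something I've verified for every version. `ignored?` is a hypothetical helper, not part of the gem, and the sample paths are made up.

```ruby
# frozen_string_literal: true

# Hypothetical helper (not HTML-Proofer API): mimics how I assume each
# ignore_files entry is compared against a file path.
def ignored?(path, patterns)
  patterns.any? do |pattern|
    case pattern
    when String then pattern == path      # assumption: strings must equal the whole path
    when Regexp then pattern.match?(path) # assumption: regexes may match anywhere
    else false
    end
  end
end

# Same entries as in the script above.
IGNORE_FILES = [
  '_site/400/index.html',
  %r{_site/files/.*},
  '_site/2013/google-maps/index.html'
].freeze

# Made-up sample paths, purely for illustration.
%w[
  _site/400/index.html
  _site/files/2020/example.html
  _site/2013/google-maps/index.html
  _site/2013/other-post/index.html
].each do |path|
  status = ignored?(path, IGNORE_FILES) ? 'skipped' : 'checked'
  puts "#{path}: #{status}"
end
```

If that assumption holds, a string entry only skips the one file it names, so ignoring a whole directory needs a regex like the `_site/files/` one.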