Open btylerburton opened 1 year ago
@btylerburton so might this one also be addressed by #4476?
Yes, ideally @hkdctol.
Clicking a point in a location graph navigates to the list of links tested. There's a difference in the links tested between htmlproofer and New Relic; htmlproofer may be traversing more than we need?
Notes on the New Relic link checker:
htmlproofer currently checks:
Update: it looks like the New Relic link checker can identify a variety of types.
Can the link checker alert us to 404s? Can it post to Slack?
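(Separately from whatever New Relic supports: any CI-side checker could post to Slack through an incoming webhook. A minimal sketch, assuming a `SLACK_WEBHOOK_URL` secret is configured; the message text is illustrative.)

```sh
# Minimal sketch: post a failure notice to Slack from a CI job via an
# incoming webhook. Assumes a SLACK_WEBHOOK_URL secret is available;
# the payload follows Slack's incoming-webhook JSON convention.
curl -X POST \
  -H 'Content-Type: application/json' \
  -d '{"text": "Link check failed: 404s detected on resources.data.gov"}' \
  "$SLACK_WEBHOOK_URL"
```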
I just checked resources.data.gov and it shows no 404s, but I know that's not the case, as there are a few I confirmed from this run:
https://github.com/GSA/resources.data.gov/actions/runs/7006861012/job/19059663585
ex.
After upgrading htmlproofer from 3.x to 5.x to potentially address some issues, the resources site produces 284 failures. This includes checks on links, images, scripts, and HTML validation. This is a considerable number of failures, and switching to another utility (see the alternatives linked in the Sketch) won't fix them. Some examples of failures worth mentioning:
Summary of failures using htmlproofer with the following flags: `--ignore-status-codes "301,302,401,403,429" --checks='Links,Images,Scripts,Html' --no-check-external-hash --no-check-internal-hash --no-enforce-https`
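For reproducibility, here's a rough sketch of that invocation (the `jekyll build` step and the `./_site` output path are assumptions about this repo's setup, not the exact CI job):

```sh
# Minimal sketch: build the site, then run htmlproofer 5.x over the
# generated output with the flags summarized above.
bundle exec jekyll build

bundle exec htmlproofer ./_site \
  --checks='Links,Images,Scripts,Html' \
  --ignore-status-codes "301,302,401,403,429" \
  --no-check-external-hash \
  --no-check-internal-hash \
  --no-enforce-https
```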
Pausing work on this until a group discussion on how we want to proceed.
Let's chat about this at sync. It looks like you found some good flags to use. However, I do believe we should track the 4xx series as errors, since those pages are not publicly accessible.
htmlproofer offers a `--only-4xx` flag.
Here are the errors for the four static sites. This is the raw data from the terminal, so if it's best that I format them, let me know. I used these flags for the runs:
--checks='Links,Images,Scripts,Html'
--only-4xx
--no-enforce-https
--allow-missing-href
--ignore-urls '/localhost./'
The error count for these will differ slightly from what I reported before because I'm using different flags. I think the ones I've chosen this time make sense, but I'm okay with changing them to whatever we want.
In my prior role at NIST, we had great success using lychee to check links against a generated version of the site in CI. This workflow builds the site and runs link checking on the generated sources. The workflow is set up to work with Hugo, but other static site generators can be easily configured.
I am considering setting up something like this for fedramp.gov and marketplace.fedramp.gov.
Thanks for the recommendation @david-waltermire! I also found that lychee has a GitHub Action, so it's even easier to road test: https://github.com/lycheeverse/lychee-action
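For a quick local road test before wiring up the action, a minimal sketch of the lychee CLI equivalent (the `./_site` glob and the accepted status codes are assumptions, not settled choices):

```sh
# Minimal sketch: run the lychee CLI against a built site's HTML files.
# --accept treats the listed status codes as successes (assumption: we'd
# mirror the codes we currently ignore in htmlproofer); --exclude skips
# URLs matching the pattern.
lychee --accept '200,301,302,429' \
  --exclude 'localhost' \
  './_site/**/*.html'
```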
This looks promising: https://github.com/marketplace/actions/check-links-with-linkcheck
User Story
In order to ensure the quality of our sites, the data.gov team would like a reliable report on broken links.
Acceptance Criteria
Background
The data.gov team currently uses a broken-link checker for our static sites, but it's unreliable and consistently fails with false positives. The new link checker should ideally be configurable to ignore certain status codes or a list of pages (see the sketch below), and should produce a report that can be "made green" in the near term, so that a failing report can be made to fail the build. As it stands now, the report is always failing, and not for valid reasons, so no triggers can be configured around its status.
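As one illustration of that configurability, a sketch using htmlproofer flags already discussed in this thread (the status codes and URL patterns are placeholders, not final choices):

```sh
# Minimal sketch: ignore specific status codes and specific pages so the
# report can be "made green"; a nonzero exit code would then fail the build.
bundle exec htmlproofer ./_site \
  --ignore-status-codes "301,302,429" \
  --ignore-urls '/localhost./'
```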
Security Considerations (required)
Fixing old links will improve the quality of the site and the user experience, but will likely not address any security concerns related to any domains that have come into the possession of bad actors.
Sketch
Also related: