gbinal opened 2 months ago
The required links scan has been rebuilt to use a puppeteer Page instance and DOM queries instead of regex searching the response body as raw text.
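A minimal sketch of what the DOM-query approach could look like, assuming the scan reduces the page's anchors to an array of href/text pairs before matching (the pattern lists and function names here are illustrative, not the engine's actual code):

```typescript
// Hypothetical model: the real scan collects anchors via a puppeteer
// page.evaluate call; here that output is a plain array so the matching
// logic is self-contained.
type Anchor = { href: string; text: string };

// Assumed substring patterns, modeled on the snapshot fields shown in this
// thread -- not the engine's actual configuration.
const REQUIRED_URL_PATTERNS = ['about', 'fear', 'foia', 'privacy', 'usa.gov'];
const REQUIRED_TEXT_PATTERNS = [
  'no fear act', 'foia', 'privacy policy', 'usa.gov', 'accessibility',
];

function matchPatterns(values: string[], patterns: string[]): string {
  // Return the comma-joined list of patterns found, mirroring the
  // requiredLinksUrl / requiredLinksText field format.
  return patterns
    .filter((p) => values.some((v) => v.toLowerCase().includes(p)))
    .join(',');
}

function scanRequiredLinks(anchors: Anchor[]) {
  return {
    requiredLinksUrl: matchPatterns(anchors.map((a) => a.href), REQUIRED_URL_PATTERNS),
    requiredLinksText: matchPatterns(anchors.map((a) => a.text), REQUIRED_TEXT_PATTERNS),
  };
}
```

Matching against parsed anchors rather than regex over the raw response body avoids false hits in scripts, comments, and attribute values.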
Testing indicates that the change works for the two cases above. It's deployed in this PR and data will be available if we create a new snapshot after tonight's scans run: https://github.com/GSA/site-scanning-engine/pull/325
This is better, but the issue is still occurring in some cases.
Examples:
bea.gov: most recent snapshot has no data for the required links fields, but local scans turn up the following:
"requiredLinksScan": {
  "requiredLinksUrl": "about,fear,foia,privacy,usa.gov",
  "requiredLinksText": "budget and performance,no fear act,foia,usa.gov"
}
developers.login.gov: includes the data below in local scans and the most recent snapshot:
required_links_url: fear,foia,usa.gov
required_links_text: accessibility,no fear act,foia,inspector general,privacy policy,usa.gov
deeoic.dol.gov: the required_links_text field includes "accessibility" in local scans and the most recent snapshot
calm.gsa.gov: try loading this page with Chrome devtools open: the "About" link isn't in the 200 response body. It may be added by client-side scripting after puppeteer has evaluated the page (see below)
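For links injected by client-side scripting, one option is to poll for the element before extracting instead of evaluating the page immediately. This generic helper is only a sketch: in the real scan the predicate would run inside page.evaluate, or puppeteer's built-in page.waitForSelector could be used instead.

```typescript
// Hypothetical helper (not the engine's code): retry a predicate until it
// returns true or a timeout elapses, to give client-side rendering a chance
// to insert the link before the scan reads the DOM.
async function waitForLink(
  check: () => boolean, // e.g. () => !!document.querySelector('a[href*="about"]') inside page.evaluate
  timeoutMs = 5000,
  intervalMs = 100,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (check()) return true;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false; // link never appeared within the timeout
}
```

The timeout keeps a page that never renders the link from stalling the whole scan run.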
The most recent prod scans for bea.gov get an HTTP 403 Forbidden response, which is likely why the required links aren't showing up as expected, unlike when loading that site manually in a browser.
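If that's the cause, a small guard could make the failure mode explicit: skip the required-links scan on non-2xx responses so the fields come back empty rather than misleading. The names below are hypothetical, not the engine's actual API.

```typescript
type RequiredLinksResult = { requiredLinksUrl: string; requiredLinksText: string };

// Hypothetical guard: a 403 (or any non-2xx status) means the scanner never
// saw the real page, so report no required-links data for it at all.
function resultForStatus(
  status: number,
  scan: () => RequiredLinksResult,
): RequiredLinksResult | null {
  if (status < 200 || status >= 300) return null; // e.g. bea.gov's 403 Forbidden
  return scan();
}
```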
fair enough - thank you!!
You've researched every example I have found so far. I need to update our documentation to reflect these lessons learned, and I'll also see if I can find more examples to test. But as best as I can tell, in every case the problem hasn't been on our end.
Note current in-use snippets here...
E.g. on: