Closed timcosgrove closed 6 months ago
I voted this one a large.
https://playwright.dev/docs/test-reporters , will likely want to use multiple for showing results locally / CI / S3 reporting
here's the existing playwright workflow that runs per PR and has an action artifact upload step: https://github.com/department-of-veterans-affairs/next-build/blob/main/.github/workflows/playwright.yml
Some questions about constraints were answered in a scrum meeting; saving the notes here now.
It's about time for an update on this issue. First, one thought on the acceptance criteria:
We discussed scanning an already-built production site, and I liked that idea since we'd be scanning exactly what end users see, whereas building the site separately or scanning a lower environment could produce output that differs from production. So that's the process I'm running with, rather than building out the site before scanning, even though that's not what the original AC says.
Next, I'll go into what I initially tried to get started on this issue. These are paraphrased notes, but I can get specific error messages if helpful.

`p-retry` uses a dynamic import, and I got errors about that import even though the code looks fine to me. I switched to `cross-fetch` instead of the custom fetcher since this is only one request... speaking of which, I swear https://www.va.gov/sitemap.xml used to be multiple pages, but now it is only one. There was also a conflict between a local `test` function and Playwright's default `test` function. Once I switched to the regular test function and didn't include variables in the title, I could finally run tests against the sitemap data.

And while I could go over those failures and try to fix them, I was also thinking about the person using the report this issue would generate. Looking at the default HTML reporter for Playwright didn't seem to help much: you can filter by test name, but the info I saw wasn't very useful compared to a custom reporter built around what the accessibility specialists want to see. I'd prefer writing a custom reporter to make adoption easier rather than using the standard output.
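For what a custom reporter could look like, here's a minimal sketch. The class name, the shape of the summary, and what gets collected are all my assumptions for illustration; Playwright's reporter API does call `onTestEnd` per test and `onEnd` once at the end of the run.

```javascript
// Hypothetical custom Playwright reporter: collect failing pages and
// print a compact summary, instead of the default HTML report.
class A11yReporter {
  constructor() {
    this.violations = []
  }

  // Playwright invokes onTestEnd(test, result) after each test finishes.
  onTestEnd(test, result) {
    if (result.status !== 'passed') {
      this.violations.push({ title: test.title, status: result.status })
    }
  }

  // onEnd runs once after the whole suite; summarize here.
  onEnd() {
    console.log(`${this.violations.length} failing page(s):`)
    for (const v of this.violations) {
      console.log(`  ${v.title} (${v.status})`)
    }
  }
}

module.exports = A11yReporter
```

A real version would pull the axe violation details out of test attachments and group them by violation type, per whatever the accessibility specialists want to see.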
So if Playwright and axe-core aren't what I'm suggesting, what am I suggesting? Enter pa11y-ci: https://github.com/pa11y/pa11y-ci
Medicare was the project where I first saw this used. We tested for accessibility using Cypress and pa11y: Cypress for dynamic user-interaction tests, and pa11y for simple load-the-page-and-test checks.
All the code in the current a11y test can be boiled down to `pa11y-ci --sitemap https://va.gov/sitemap.xml --config ./pa11y-config.json`, which is super simple to parallelize with different sitemap and configuration files. No custom code needed.
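For reference, the `--config` file in a command like that is plain JSON. A minimal sketch, using pa11y-ci's `defaults` shape; the specific values here are assumptions, not tuned for va.gov:

```json
{
  "defaults": {
    "standard": "WCAG2AA",
    "timeout": 30000,
    "concurrency": 4
  }
}
```

Per-shard config files would differ only in settings like viewport, so generating them is trivial.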
The process would work like this: run `pa11y-ci --sitemap=localhost:8001/sitemap-${MATRIX_NUMBER}.xml --config ./pa11y-${MATRIX_NUMBER}.json` on GitHub in parallel. The variable names and strategy can change, but this allows for randomly testing some sections in mobile vs. desktop viewports, as well as potentially testing different browsers. I think that is a better option than 26,000+ URLs × 2 viewports × 3 browsers and all the time that would take. That's a fair bit of information for an update, but I think pa11y-ci
is the better way to go than @axe-core/playwright:
pa11y-ci drives Puppeteer under the hood, so we aren't "writing plays" in Playwright just to load pages; I think pa11y-ci might run faster without the extra overhead; and pa11y-ci is engineered exactly for the use case this ticket describes.
Thoughts @tjheffner and @timcosgrove?
As far as using Playwright goes, this HTML reporter https://rodrigoodhin.gitlab.io/playwright-html/#/1.2.0/screenshots has a lot more information than the standard one. Potentially something to look at for modifying the report, and maybe grouping results by accessibility violation type.
Just making note of this GHA sharding example so I can close a tab: https://playwright.dev/docs/test-sharding#github-actions-example I think this will automagically pick up each segmented test and run them in parallel. They even have a merge-reports command to tie things back together at the end.
I started using the sharding example in a workflow and it seems to be splitting the test runs into segments correctly. However, I ran into an issue with the local packages that might be an esm/cjs thing.
```
Run yarn playwright test --project=a11y --shard=2/5

Error: Cannot find module '/home/runner/work/next-build/next-build/node_modules/proxy-fetcher/dist/index.js'. Please verify that the package.json has a valid "main" entry

   at ../utils/getSitemapLocations.js:1

> 1 | import { getFetcher } from 'proxy-fetcher'
    | ^
  2 | import crossFetch from 'cross-fetch'
  3 |
  4 | // Given an .xml file, extracts every string inside a <loc> element.

   at Object.<anonymous> (/home/runner/work/next-build/next-build/playwright/utils/getSitemapLocations.js:1:1)
   at Object.<anonymous> (/home/runner/work/next-build/next-build/playwright/tests/a11y.spec.js:3:1)
```
So I looked up `yarn link` (https://classic.yarnpkg.com/lang/en/docs/cli/link/) and added that to the workflow:

```yaml
- name: Link local packages
  run: |
    yarn link $GITHUB_WORKSPACE/packages/proxy-fetcher
```
However, when I try to link, I get this error:

```
Run yarn link $GITHUB_WORKSPACE/packages/proxy-fetcher
Usage Error: Invalid destination; Can't link the project to itself
```
I think the `package.json` info for proxy-fetcher can be updated like this:

```json
{
  "dependencies": {
    "proxy-fetcher": "workspace:*"
  },
  "workspaces": [
    "packages/*"
  ]
}
```
But I will work around this for now by simply using cross-fetch so I can test how merging reports goes.
So the sharding example does create five different jobs if I put in five shards; however, the tests currently only run on one shard. I'm not sure why, since this is on the Playwright side.
Looking at the broken links check workflow, I will attempt to use the matrix numbers as env vars in the test instead of looping through... actually, I think I can use `matrix.shardIndex` in the test to indicate the page segment.
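Using the shard index to pick a page segment can be sketched as a pure helper. The name matches the `getPagesSlice` util mentioned later in this thread, but this body is my assumption, not the repo's actual implementation; I'm also assuming the shard index arrives 1-based, matching Playwright's `--shard=2/5` convention:

```javascript
// Hypothetical slice helper: given every sitemap page, return the
// segment this shard should test. shardIndex is assumed 1-based.
function getPagesSlice(pages, shardIndex, totalShards) {
  const perShard = Math.ceil(pages.length / totalShards)
  const start = (shardIndex - 1) * perShard
  return pages.slice(start, start + perShard)
}

// In the test, the shard number would come from the environment, e.g.:
// const slice = getPagesSlice(allPages, Number(process.env.SHARD_INDEX), 5)
```

Every shard gets a disjoint slice, and the union of all slices covers the full page list, so no loop over segments is needed inside the test.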
The broken links workflow also has this portion, which may get around some of those errors you're seeing with the package:

```yaml
# This is necessary in order for fetch to work.
- name: Build proxy-fetcher dist
  run: |
    yarn tsc -b ./packages/proxy-fetcher/tsconfig.json
```
Thanks @tjheffner, that seems to have done the trick with importing proxy-fetcher.
I did get tests to run in parallel by using the shard index and removing the loop. The merge report isn't working so I will look into debugging that now.
Merging the test run output is going okay. I fiddled with a custom test reporter for a little while, but I didn't get much of anything going and TypeScript got the best of me... like saying I was exporting multiple classes where there was only one explicit export. Spooky.
I ended up just writing the violation results to a file at the end of the test vs. dealing with test info, attachments, and the way that would be done in a custom reporter. We'll see how long it takes to scan with 32 runners and what the violation output looks like.
Time for an update on this ticket. I did get runs going with up to 64 different runners. One issue I saw was the last runner erroring with something like `Error: frame.evaluate: Execution context was destroyed, most likely because of a navigation.`
This error can be seen here: https://github.com/department-of-veterans-affairs/next-build/actions/runs/8470932551/job/23209834488#step:11:1911
I also saw the last runner running tests multiple times, which made me think of retries: what if I removed them? That run is going now. It would fit with why I saw URLs being run multiple times instead of once. We'll see.
I added testing of multiple viewports and browsers via env vars that are randomly selected per runner. I think the default way Playwright suggests testing a matrix of different configurations is projects: https://playwright.dev/docs/test-projects . I would have named them Groups myself, since I had a different assumption of what projects were... but anywhoo, the configuration is logged before the test run.
Using projects to run the a11y scan would run all the tests with three different browsers and two or three different viewport sizes: 26,000+ pages × 3 browsers × 3 viewports = 234,000+ total analyze calls. Since the runs take between 15 and 30 minutes with 64 runners, doing that with all pages across configs could take 270+ minutes... if my math works. So that's why I opted to randomize each runner's configuration instead of using the projects configuration or a GitHub workflow matrix over the different options.
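The random per-runner selection could be as small as this sketch; the option lists and the injectable `rand` parameter are my own choices for illustration, not the workflow's actual values:

```javascript
// Hypothetical per-runner configuration picker. In CI, each runner
// would call this once and log the result before the test run.
const BROWSERS = ['chromium', 'firefox', 'webkit']
const VIEWPORTS = [
  { name: 'desktop', width: 1280, height: 720 },
  { name: 'mobile', width: 375, height: 667 },
]

function pickRunnerConfig(rand = Math.random) {
  const browser = BROWSERS[Math.floor(rand() * BROWSERS.length)]
  const viewport = VIEWPORTS[Math.floor(rand() * VIEWPORTS.length)]
  // Log the configuration so each run's shape is traceable afterwards.
  console.log(`runner config: ${browser} @ ${viewport.name}`)
  return { browser, viewport }
}
```

Over enough daily runs, random sampling approximates full matrix coverage without the 9× runtime cost.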
I also had to finesse the merging of JSON reports, and I learned some shell/bash to accomplish this in a nice, succinct manner. The final report is a JSON file that lists any URL with any violations found. That could be processed into something that would be easier to filter using HTML/JS, but that would be in a different ticket, and I think should be done with the accessibility specialist.
I'm still getting the execution context error, but https://github.com/microsoft/playwright/issues/27406#issuecomment-1745273458 points to expecting something to be visible and then continuing. I'm thinking the page context goes away before the `.analyze()` call runs. So maybe I can expect a `<body>` element to exist or something before analyzing... not sure why this only ever happens on the last runner, though.
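Another way to soften that failure mode, instead of (or alongside) waiting on `<body>`, would be to wrap the analyze call and retry only when the destroyed-context error appears. The helper name, retry count, and error matching are all assumptions in this sketch:

```javascript
// Hypothetical wrapper: retry the axe analyze call when Playwright
// reports the execution context being destroyed mid-navigation.
async function analyzeWithRetry(analyzeFn, retries = 2) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await analyzeFn()
    } catch (err) {
      const transient = /Execution context was destroyed/.test(err.message)
      if (!transient || attempt >= retries) throw err
      // Otherwise loop and try again once the page has settled.
    }
  }
}

// Usage in a test might look roughly like:
//   const results = await analyzeWithRetry(() => new AxeBuilder({ page }).analyze())
```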
Okay, hopefully this is the final update for this issue. The scan failures originated from pages redirecting client-side after loading JS. Sometimes the page loads and is scanned with the "loading spinner" rather than the whole page; however, if the JS fully loads before the scan and redirects, Playwright doesn't follow that redirect.
Ideally, any redirects due to unauthenticated traffic would happen at the NGINX layer, which Playwright would follow. That's also a better UX than seeing a spinner and then being redirected. I'm betting team fragmentation and other priorities have prevented moving authentication to the server level... or I'm completely wrong about how the authentication flow works, but I'm assuming the React applets make an API call once loaded.
We also uncovered URLs that might not be appropriate for the sitemap since they are for authenticated routes... so those are two potential tickets to make: some kind of auth check before any assets are sent to the client, and validation that the sitemap contents are correct.
To get past the page context being destroyed, I added a simple `try/catch` where any failed pages are logged to a separate report. After viewing the report a few times, the failed pages are different each run and usually far fewer than what I had in an `excludedPages` array. So the failed pages will likely differ per run rather than being a set list of URLs.
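The try/catch split between scanned and failed pages could be sketched like this; the `scanPage` callback stands in for the real goto-and-analyze code, and the names are mine, not the repo's:

```javascript
// Hypothetical scan loop: pages that throw (e.g. "Execution context
// was destroyed" on a client-side redirect) go into a separate failed
// report instead of aborting the whole run.
async function scanPages(pages, scanPage) {
  const results = []
  const failed = []
  for (const url of pages) {
    try {
      results.push({ url, violations: await scanPage(url) })
    } catch (err) {
      failed.push({ url, error: err.message })
    }
  }
  return { results, failed }
}
```

Keeping the failed list in its own report means flaky pages don't pollute the violation data and can be re-checked separately.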
I did try to convert it to TypeScript, but I got this error:

```
TS2459: Module '../utils/getSitemapLocations' declares 'getSitemapLocations' locally, but it is not exported.
```

It's the esm vs. cjs issue again, I think, since `getSitemapLocations` is exported in `module.exports = { getSitemapLocations, splitPagesIntoBatches, getPagesSlice }`, but that is cjs syntax whereas the Playwright test uses esm... so I will keep it `.js`.
Just had to add a check for when no errors are found, but there was an error on the last runner I hadn't seen before: https://github.com/department-of-veterans-affairs/next-build/actions/runs/8559091874/job/23455130636#step:10:207

```
testing page: http://www.va.gov/disability/view-disability-rating/rating/
error: page.evaluate: Execution context was destroyed, most likely because of a navigation
testing page: http://www.va.gov/my-va/
error: page.goto: Test ended.
Call log:
  - navigating to "http://www.va.gov/my-va/", waiting until "load"
testing page: http://www.va.gov/profile/
error: page.goto: Target page, context or browser has been closed
testing page: http://www.va.gov/view-change-dependents/view/
error: page.goto: Target page, context or browser has been closed
```
We'll see if it happens again.
Well, https://www.va.gov/my-va/ is definitely broken, so I will add back the excluded pages check, since apparently some sitemap pages can be totally broken instead of redirecting. I'm not sure if Playwright can bail and recover from these errors, but this seems different from when the page redirects and the context can still be accessed.
Update: Well, now that page is not 500'ing... hmm. I will not add the exclude then, since that route shouldn't be erroring. I guess the a11y scan can be a proxy for catching these errors in the sitemap... but I have no idea why that route didn't load, and then my page refreshed without me reloading... who knows...
Requirements
We want to run a daily a11y scan on the output of Next Build, so that we can catch and fix any a11y violations.
Background & implementation details
The previous a11y scan run inside Content Build may be a useful place to start for pieces of this: https://github.com/department-of-veterans-affairs/content-build/blob/0131bb3b8f408d2801be9faf7ebadd855636fef5/.github/workflows/a11y.yml