department-of-veterans-affairs / va.gov-cms

Editor-centered management for Veteran-centered content.
https://prod.cms.va.gov
GNU General Public License v2.0
98 stars · 69 forks

Create a11y check workflow for Next Build output #16117

Closed · timcosgrove closed this 6 months ago

timcosgrove commented 11 months ago

### Requirements

We want to run a daily a11y scan on the output of Next Build, so that we can catch and fix any a11y violations.

### Acceptance criteria
- [x] The workflow scans the va.gov sitemap in parallel
- [x] The workflow uploads test reports to GitHub to reside with the workflow run (actions/upload-artifact)
- [ ] Optional: The workflow uploads test reports to VA S3 for ongoing storage as a browsable HTML page
- [ ] Optional: The workflow sends Slack notification for starting, finishing, and failure states

### Background & implementation details

The previous a11y scan run inside Content Build may be a useful place to start for pieces of this: https://github.com/department-of-veterans-affairs/content-build/blob/0131bb3b8f408d2801be9faf7ebadd855636fef5/.github/workflows/a11y.yml

### Implementation tasks
- [ ] Use this if it helps you or feel free to delete.

olivereri commented 11 months ago

I voted this one a large.

tjheffner commented 10 months ago

https://playwright.dev/docs/test-reporters . We'll likely want to use multiple reporters for showing results locally, in CI, and in S3.

Here's the existing Playwright workflow that runs per PR and includes the artifact upload step: https://github.com/department-of-veterans-affairs/next-build/blob/main/.github/workflows/playwright.yml
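
For illustration, wiring up multiple reporters would look something like this in the Playwright config (a sketch; the folder and file paths are made up, not the repo's actual settings):

```js
// playwright.config.js — a sketch of combining reporters for local, CI,
// and S3 use; paths here are illustrative.
import { defineConfig } from '@playwright/test'

export default defineConfig({
  reporter: [
    ['list'], // readable terminal output for local runs
    ['html', { outputFolder: 'playwright-report', open: 'never' }], // artifact for CI upload
    ['json', { outputFile: 'results/a11y-results.json' }], // raw data for S3 or later processing
  ],
})
```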

alexfinnarn commented 7 months ago

Some questions about constraints were answered in a scrum meeting; saving the notes here.

  1. Is there any time limit the workflow needs to keep in mind? Keep it under an hour; use parallelization and look at the broken link report workflow.
  2. Should the sitemap or analytics inform which pages to scan first? Loop through the sitemap for pages to check. No need to pull in analytics or `lastmod` values from the sitemap.
  3. Which reporter should be used for accessibility errors? The HTML reporter, since it will be easiest for accessibility specialists to review.
  4. What about interactivity? Only test on page load; there's no need to test interactions, since the main testing readme has notes on testing accessibility in components.
  5. Screen sizes needed? Desktop, plus a mobile size taken from the styling code.
  6. Cross-browser testing? Chrome, Firefox, and Safari should be tested.

alexfinnarn commented 7 months ago

It's about time for an update on this issue. First, one thought on the acceptance criteria:

We discussed scanning an already-built production site, and I liked that idea since we'd be scanning exactly what end users see, whereas building the site separately or using a lower environment could produce output that differs from production. So that's the process I'm running with, rather than building out the site before scanning, even though that's not what the original AC says.

Next, I'll go over what I initially tried when getting started on this issue. These are paraphrased notes, but I can get specific error messages if helpful.

  1. My initial attempts to run the Playwright tests failed, so I reviewed the documentation and even re-cloned the repo from scratch to see if I was missing anything. Maybe the instructions for Playwright accessibility testing work for others, or maybe they used to work but enough has changed that they no longer do.
  2. Specifically, I kept running into an issue with the p-retry dynamic import. The code looks fine to me, so I'm not sure why I got errors about that import. I switched to cross-fetch instead of the custom fetcher since this is only one request...speaking of which, I swear https://www.va.gov/sitemap.xml used to be multiple pages, but now it is only one.
  3. After getting the sitemap data, I kept running into issues with async/await: the sitemap request wasn't completing before the loop over the list of URLs, so Playwright kept saying no tests were found. If I moved the gathering of sitemap URLs into the test function, everything was fine, but then it's not one test per page (see the sketch after this list). Also, I know Cypress resets a lot of state between tests, so it can be better to check several related things in one test rather than paying the isolation overhead. For just scanning a page after it loads, I lean toward less isolation to speed up the run, and I don't think that's risky.
  4. Even if I could get one test per URL, I kept getting a "can't use ${title} in test" error and something about the configuration file. I never found anywhere that a variable called "title" was specifically used, but I did see how the test name is dynamically generated. I also saw the extended custom test being used instead of Playwright's default test function. Once I switched to the regular test function and stopped including variables in the title, I could finally run tests against the sitemap data.
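
For context, here's roughly what the working "gather URLs inside the test" approach from item 3 looks like (a simplified sketch; the regex extraction stands in for the real sitemap parsing):

```js
// a11y.spec.js — a simplified sketch: fetch the sitemap inside one big
// test, then loop over every page and scan it with axe.
import { test } from '@playwright/test'
import AxeBuilder from '@axe-core/playwright'
import crossFetch from 'cross-fetch'

test('a11y scan of sitemap pages', async ({ page }) => {
  const xml = await (await crossFetch('https://www.va.gov/sitemap.xml')).text()
  // Extract every URL inside a <loc> element.
  const urls = [...xml.matchAll(/<loc>(.*?)<\/loc>/g)].map((m) => m[1])

  for (const url of urls) {
    await page.goto(url)
    const results = await new AxeBuilder({ page }).analyze()
    if (results.violations.length) {
      console.log(url, results.violations.length)
    }
  }
})
```

The test timeout would need to be raised well past the default for a loop like this.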

And while I could go over those failures and try to fix them, I was also thinking about the person using the report this issue would generate. The default HTML reporter for Playwright didn't seem to help much: you can filter by test name, but the info I saw wasn't very useful compared to a custom reporter built around what the accessibility specialists want to see. I'd prefer writing a custom reporter to make adoption easier rather than using the standard output.

So if Playwright and axe-core aren't what I'm suggesting, what am I suggesting? Enter pa11y-ci: https://github.com/pa11y/pa11y-ci . Medicare was the project where I first saw this used: we tested for accessibility using Cypress and pa11y, with Cypress for dynamic user interaction tests and pa11y for simple load-the-page-and-test.

All the code in the current a11y test can be boiled down to `pa11y-ci --sitemap https://va.gov/sitemap.xml --config ./pa11y-config.json`, which is super simple to parallelize with different sitemap and configuration files. No custom code needed.
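
A config for that command could stay tiny; something like this (a sketch; pa11y-ci accepts JSON or JS config files, and these option values are illustrative):

```js
// pa11y-config.js — illustrative defaults for a pa11y-ci run.
module.exports = {
  defaults: {
    standard: 'WCAG2AA', // ruleset to test against
    timeout: 30000, // per-page timeout in ms
    viewport: { width: 1280, height: 720 }, // swap for a mobile size in other shards
  },
}
```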

The process would work like this:

  1. A script is used to analyze the sitemap. I created a simple one locally just to see the distribution of pages and am fiddling with it. This can be useful for deciding how to split up the sitemap data for parallelization and reporting; for instance, maybe people want results from certain sections grouped together.
  2. A script is used to generate a number of sitemap files (see the splitting sketch after this list). It could simply split the 26,000+ URLs into equal groups or group them in a more semantic manner. We could also exclude/limit pages where it's "page 20 of 500" if checking all listing pages gets redundant. To note, I see things in the sitemap for events where no events are listed on va.gov; it is just a blank event listing. So maybe the sitemap generation should be looked into as well.
  3. Use `pa11y-ci --sitemap=localhost:8001/sitemap-${MATRIX_NUMBER}.xml --config ./pa11y-${MATRIX_NUMBER}.json` to run on GitHub in parallel. The variable names and strategy can change, but this allows for randomly testing some sections in mobile vs. desktop viewports as well as potentially testing different browsers. I think that is a better option than (26,000+ URLs × 2 viewports × 3 browsers) and all the time that would take.
  4. Upload the report in whatever format is easiest for the reviewer. I'm thinking a CSV would be most useful, as it can be marked up and manipulated as accessibility issues are logged and fixed. Viewing an HTML file might be nice, but you can't edit it as easily. pa11y does have an HTML reporter they moved into the main project: https://github.com/pa11y/pa11y-reporter-html
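
To make step 2 concrete, splitting could be as simple as this (a sketch; assumes Node 18+ for global fetch, and the shard count and file names are made up):

```js
// split-sitemap.js — split the sitemap's <loc> URLs into N smaller
// sitemap files that parallel pa11y-ci runs can each point at.
import { writeFileSync } from 'node:fs'

const SHARDS = 32
const xml = await (await fetch('https://www.va.gov/sitemap.xml')).text()
const urls = [...xml.matchAll(/<loc>(.*?)<\/loc>/g)].map((m) => m[1])
const size = Math.ceil(urls.length / SHARDS)

for (let i = 0; i < SHARDS; i++) {
  const body = urls
    .slice(i * size, (i + 1) * size)
    .map((u) => `  <url><loc>${u}</loc></url>`)
    .join('\n')
  writeFileSync(
    `sitemap-${i}.xml`,
    `<?xml version="1.0" encoding="UTF-8"?>\n<urlset>\n${body}\n</urlset>\n`
  )
}
```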

That's a fair bit of information for an update, but I think pa11y-ci is a better way to go than @axe-core/playwright: both drive a headless browser under the hood, we aren't "writing plays" with user interactions, pa11y-ci might run faster with less overhead, and pa11y-ci is engineered exactly for the use case this ticket describes.

Thoughts @tjheffner and @timcosgrove?

alexfinnarn commented 7 months ago

As far as using Playwright goes, this HTML reporter has a lot more information than the standard one: https://rodrigoodhin.gitlab.io/playwright-html/#/1.2.0/screenshots . It's potentially something to look at for modifying the report, maybe grouping by accessibility violation type.

alexfinnarn commented 7 months ago

Just making note of this GHA sharding example so I can close a tab: https://playwright.dev/docs/test-sharding#github-actions-example . I think this will automagically pick up each segmented test run and run them in parallel. There's even a merge-reports command to tie things back together at the end.

alexfinnarn commented 7 months ago

I started using the sharding example in a workflow, and it seems to be splitting the test runs into segments correctly. However, I ran into an issue with the local packages that might be an ESM/CJS thing.

```
Run yarn playwright test --project=a11y --shard=2/5

Error: Cannot find module '/home/runner/work/next-build/next-build/node_modules/proxy-fetcher/dist/index.js'. Please verify that the package.json has a valid "main" entry

   at ../utils/getSitemapLocations.js:1

> 1 | import { getFetcher } from 'proxy-fetcher'
    | ^
  2 | import crossFetch from 'cross-fetch'
  3 |
  4 | // Given an .xml file, extracts every string inside a <loc> element.

    at Object.<anonymous> (/home/runner/work/next-build/next-build/playwright/utils/getSitemapLocations.js:1:1)
    at Object.<anonymous> (/home/runner/work/next-build/next-build/playwright/tests/a11y.spec.js:3:1)
```

So I looked up `yarn link` (https://classic.yarnpkg.com/lang/en/docs/cli/link/) and added it to the workflow:

```yaml
- name: Link local packages
  run: |
    yarn link $GITHUB_WORKSPACE/packages/proxy-fetcher
```

However, when I try to link, I get this error:

```
Run yarn link $GITHUB_WORKSPACE/packages/proxy-fetcher
Usage Error: Invalid destination; Can't link the project to itself
```

I think the package.json info for proxy-fetcher can be updated like this:

```json
{
  "dependencies": {
    "proxy-fetcher": "workspace:*"
  },
  "workspaces": [
    "packages/*"
  ]
}
```

But I will work around this for now by simply using cross-fetch so I can test how merging reports goes.

alexfinnarn commented 7 months ago

So the sharding example does create five different jobs if I configure five shards; however, the tests currently run on only one shard, and I'm not sure why this is happening on the Playwright side.

Looking at the broken links check workflow, I will attempt to use the matrix numbers as env vars in the test instead of looping through all URLs...actually, I think I can use `matrix.shardIndex` in the test to indicate the page segment.
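
Roughly what I have in mind (a sketch; the env var names are made up, and the workflow would pass `matrix.shardIndex` through as one of them):

```js
// a11y.spec.js — slice the sitemap by the shard index passed in from the
// workflow matrix, so each runner scans only its segment of pages.
import { test } from '@playwright/test'
import { getSitemapLocations } from '../utils/getSitemapLocations'

const shardIndex = Number(process.env.SHARD_INDEX ?? 1)
const totalShards = Number(process.env.TOTAL_SHARDS ?? 1)

test('a11y scan for this shard', async ({ page }) => {
  const urls = await getSitemapLocations('https://www.va.gov/sitemap.xml')
  const size = Math.ceil(urls.length / totalShards)
  const slice = urls.slice((shardIndex - 1) * size, shardIndex * size)
  // ...then page.goto() each URL in `slice` and run the axe scan.
})
```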

tjheffner commented 7 months ago

The broken links workflow also has this portion, which may get around some of those errors you're seeing with the package:

```yaml
# This is necessary in order for fetch to work.
- name: Build proxy-fetcher dist
  run: |
    yarn tsc -b ./packages/proxy-fetcher/tsconfig.json
```

alexfinnarn commented 7 months ago

Thanks @tjheffner, that seems to have done the trick with importing proxy-fetcher.

I did get tests to run in parallel by using the shard index and removing the loop. The report merge isn't working, so I will look into debugging that now.

alexfinnarn commented 7 months ago

Merging the test run output is going okay. I fiddled with a custom test reporter for a little while, but I didn't get much of anything going, and TypeScript got the best of me...like saying I was exporting multiple classes where there was only one explicit import. Spooky.

I ended up just writing the violation results to a file at the end of the test rather than dealing with test info, attachments, and the way it would be done in a custom reporter. We'll see how long the scan takes with 32 runners and what the violation output looks like.
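
The shape of that is roughly the following (a sketch; the file name and results array are illustrative):

```js
// a11y.spec.js — collect violations while looping over pages, then write
// one JSON file per runner at the end instead of using a custom reporter.
import { test } from '@playwright/test'
import { writeFileSync } from 'node:fs'

const results = []

// ...inside the page loop, after each scan:
//   if (scan.violations.length) results.push({ url, violations: scan.violations })

test.afterAll(() => {
  const shard = process.env.SHARD_INDEX ?? '0'
  writeFileSync(`a11y-results-${shard}.json`, JSON.stringify(results, null, 2))
})
```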

alexfinnarn commented 6 months ago

Time for an update on this ticket. I did get runs going with up to 64 different runners. One issue I saw was the last runner erroring with something like `Error: frame.evaluate: Execution context was destroyed, most likely because of a navigation`. The error can be seen here: https://github.com/department-of-veterans-affairs/next-build/actions/runs/8470932551/job/23209834488#step:11:1911

I also saw the last runner running tests multiple times...which made me think of retries, and what if I removed them? That run is going now and might fix it. It would also explain why I saw URLs being run multiple times instead of once. We'll see.

I added testing of multiple viewports and browsers via env vars that are randomly selected per runner. I think the default way Playwright suggests testing a matrix of different configurations is projects: https://playwright.dev/docs/test-projects . I would have named them Groups myself, since I had a different assumption of what projects were...but anywhoo, the configuration is logged before the test run:

[Screenshot: the browser/viewport configuration logged at the start of a run]

Using projects to run the a11y scan would run all the tests against three different browsers and two or three different viewport sizes: 26,000+ pages × 3 browsers × 3 viewports = 234,000+ total analyze() calls. Since the runs take between 15 and 30 minutes with 64 runners, doing that with all pages across all configs could take 270 minutes...if my math works. So that's why I opted to randomize each runner's configuration instead of using the projects configuration or a GitHub workflow matrix over the different options.
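
The per-runner randomization itself is only a few lines; something like this (a sketch, with illustrative option lists):

```js
// Pick one browser and one viewport at random for this runner, so the
// fleet covers the config matrix over time without multiplying total runs.
const pick = (options) => options[Math.floor(Math.random() * options.length)]

const browserName = pick(['chromium', 'firefox', 'webkit'])
const viewport = pick([
  { width: 1920, height: 1080 }, // desktop
  { width: 375, height: 667 }, // mobile
])
console.log(`runner config: ${browserName} @ ${viewport.width}x${viewport.height}`)
```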

I also had to finesse the merging of the JSON reports, and I learned some shell/bash to accomplish it in a nice, succinct manner. The final report is a JSON file listing any URL with violations found. That could be processed into something easier to filter using HTML/JS, but that would be a different ticket and, I think, should be done with the accessibility specialists.
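
The merge was done in shell in the workflow, but the idea is just concatenating the per-shard files; in Node it would look roughly like this (file names are illustrative):

```js
// merge-reports.js — combine per-shard JSON reports into one final report.
import { readdirSync, readFileSync, writeFileSync } from 'node:fs'

const merged = readdirSync('.')
  .filter((name) => /^a11y-results-\d+\.json$/.test(name))
  .flatMap((name) => JSON.parse(readFileSync(name, 'utf8')))

writeFileSync('a11y-results.json', JSON.stringify(merged, null, 2))
```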

alexfinnarn commented 6 months ago

I'm still getting the execution context error, but https://github.com/microsoft/playwright/issues/27406#issuecomment-1745273458 points to expecting something to be visible and then continuing. I'm thinking the page context goes away before the `.analyze()` call runs. So maybe I can expect a `<body>` element to be visible before analyzing...not sure why this only ever happens on the last runner, though.
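
Roughly what I'd try, per that comment (a sketch of the inside of the page loop, assuming `expect` and `AxeBuilder` are already imported):

```js
// Wait for the page to actually render something before handing it to
// axe, so a client-side redirect doesn't destroy the execution context
// mid-scan.
await page.goto(url, { waitUntil: 'load' })
await expect(page.locator('body')).toBeVisible()
const scan = await new AxeBuilder({ page }).analyze()
```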

alexfinnarn commented 6 months ago

Okay, hopefully this is the final update for this issue. The failing scans originated from pages redirecting client-side after loading JS. Sometimes the page loads and gets scanned while it still shows the "loading spinner" rather than the full page. And if the JS fully loads and redirects before the scan, Playwright doesn't follow that redirect.

Ideally, any redirects for unauthenticated traffic would happen at the NGINX layer, which Playwright would follow. That's also better UX than seeing a spinner and then being redirected. I'm betting team fragmentation and other priorities have prevented moving authentication to the server level...or I'm completely wrong about how the authentication flow works, but I'm assuming the React applets make an API call once loaded.

We also uncovered URLs that might not be appropriate for the sitemap, since they are authenticated routes...so there are two potential tickets to make: one for some kind of auth check before any assets are sent to the client, and one to validate the sitemap's contents.

To get past the page context being destroyed, I added a simple try/catch where any failed pages are logged to a separate report (see the sketch below). After viewing the report a few times, the failed pages differ between runs and are usually far fewer than what I had in an `excludedPages` array. So the failed pages will likely be different each time rather than a set list of URLs.
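
The try/catch looks roughly like this (a sketch; `slice`, `results`, and `failedPages` are illustrative names for the page list and the two reports):

```js
// Scan each page; pages that error are logged to a separate failed-pages
// report instead of killing the whole run.
const failedPages = []

for (const url of slice) {
  try {
    await page.goto(url)
    const scan = await new AxeBuilder({ page }).analyze()
    if (scan.violations.length) results.push({ url, violations: scan.violations })
  } catch (error) {
    console.log(`testing page: ${url}\nerror: ${error.message}`)
    failedPages.push({ url, error: error.message })
  }
}
```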

I did try to convert it to TypeScript, but I got this error:

```
TS2459: Module '../utils/getSitemapLocations' declares getSitemapLocations locally, but it is not exported.
```

It's the ESM vs. CJS issue again, I think: `getSitemapLocations` is exported in `module.exports = { getSitemapLocations, splitPagesIntoBatches, getPagesSlice }`, but that is CJS syntax, whereas the Playwright test uses ESM...so I will keep it .js.
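
For reference, the would-be fix is just swapping the export syntax (not applied, per the above; parameter names here are guesses):

```js
// utils/getSitemapLocations.js — ESM named exports in place of the CJS
// module.exports object; bodies elided, these are the existing utils.
export function getSitemapLocations(sitemapUrl) { /* ...existing code... */ }
export function splitPagesIntoBatches(pages, batchCount) { /* ...existing code... */ }
export function getPagesSlice(pages, shardIndex, totalShards) { /* ...existing code... */ }
```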

alexfinnarn commented 6 months ago

Just had to add a check for when no errors are found, but then there was an error on the last runner that I hadn't seen before: https://github.com/department-of-veterans-affairs/next-build/actions/runs/8559091874/job/23455130636#step:10:207

```
testing page: http://www.va.gov/disability/view-disability-rating/rating/
error: page.evaluate: Execution context was destroyed, most likely because of a navigation
testing page: http://www.va.gov/my-va/
error: page.goto: Test ended.
Call log:
  - navigating to "http://www.va.gov/my-va/", waiting until "load"

testing page: http://www.va.gov/profile/
error: page.goto: Target page, context or browser has been closed
testing page: http://www.va.gov/view-change-dependents/view/
error: page.goto: Target page, context or browser has been closed
```

We'll see if it happens again.

alexfinnarn commented 6 months ago

Well, https://www.va.gov/my-va/ is definitely broken, so I will add back the excluded-pages check, since some sitemap pages apparently can be totally broken instead of redirecting. I'm not sure if Playwright can bail out and recover from these errors, but it seems different from when the page redirects and the context can still be accessed.

Update: Well, now that page is not 500'ing...hmm. Well, this sucks. I will not add the exclude then, since that route shouldn't be erroring. I guess the a11y scan can be a proxy for catching these errors in the sitemap...but I have no idea why that route didn't load, or why my page refreshed without me reloading it...who knows...