department-of-veterans-affairs / va.gov-team

Public resources for building on and in support of VA.gov. Visit complete Knowledge Hub:
https://depo-platform-documentation.scrollhelp.site/index.html

Create a proof of concept of a GHA workflow to test new/changed Cypress tests for flakiness #46339

Open JoeTice opened 2 years ago

JoeTice commented 2 years ago

Issue Description

Given the information researched in Explore a GHA workflow to vet new/changed Cypress tests for flakiness, create a proof of concept of the proposed solution.

Tasks

Acceptance Criteria

holdenhinkle commented 2 years ago

PR - https://github.com/department-of-veterans-affairs/vets-website/pull/22109

We created a new BigQuery table called vets_website_e2e_allow_list and populated it with all Cypress .spec files.

We created a new vets-website GitHub workflow, .github/workflows/e2e-stress-test.yml, that currently:

If we pass the optional environment variable SHOULD_STRESS_TEST=TRUE into run-cypress-tests.js, it runs the tests n times. We raised the timeouts in GitHub Actions to 20 hours and set the job to run the Cypress tests 40 times. We're testing this here - https://github.com/department-of-veterans-affairs/vets-website/actions/runs/3002697203
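A minimal sketch of how the run count could be derived from that variable. The helper names `getRunCount` and `runTimes` and the `stressRuns` parameter are hypothetical; the actual logic in run-cypress-tests.js may differ.

```javascript
// Hypothetical helper: decide how many times to run the Cypress tests.
// SHOULD_STRESS_TEST is the variable described above; stressRuns is an
// assumed parameter that defaults to the 40 runs mentioned above.
function getRunCount(env, stressRuns = 40) {
  const shouldStressTest =
    String(env.SHOULD_STRESS_TEST || '').toUpperCase() === 'TRUE';
  return shouldStressTest ? stressRuns : 1;
}

// Run a task a fixed number of times. The count is resolved once, before
// the loop starts, so the loop bound cannot change while it is running.
function runTimes(count, task) {
  const results = [];
  for (let i = 0; i < count; i += 1) {
    results.push(task(i));
  }
  return results;
}
```

For example, `runTimes(getRunCount(process.env), runCypress)` would call a (hypothetical) `runCypress` function once in a normal run and 40 times when SHOULD_STRESS_TEST=TRUE is set.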

Todo:

CBonade commented 2 years ago

Yesterday we discovered a bug in the work we had done the day before. The ternary operator we were using to determine the number of runs in the loop was not working. We hadn't noticed earlier because none of the runs finished; they all timed out. In reality, the loop was running infinitely. We found a workaround and have corrected the issue.

Also yesterday, we started on the functionality for determining which tests passed and which failed, in preparation for updating their enabled/disabled status in BigQuery. We created a new script in the dashboard data repo and, using the artifacts generated from the vets-website run (whether it's CI or the new workflow), were able to produce one combined results JSON file. We then worked on a solution to parse these results, but spent a good bit of time stuck because 12 of the 304 tests were not accounted for in our parsed results. Later, Holden figured out the problem and we ended up with a much more reliable solution.
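As a rough illustration of the parsing step, here is a sketch that groups combined results by spec and separates specs that passed every run from those that failed at least once. The record shape `{ spec, passed }` and the function name `summarizeSpecs` are assumptions for illustration; the real artifact format is not described in this thread.

```javascript
// Assumed shape of one entry in the combined results file (hypothetical):
// { spec: 'path/to/example.cypress.spec.js', passed: true }
function summarizeSpecs(combinedResults) {
  const bySpec = new Map();
  for (const { spec, passed } of combinedResults) {
    const entry = bySpec.get(spec) || { runs: 0, failures: 0 };
    entry.runs += 1;
    if (!passed) entry.failures += 1;
    bySpec.set(spec, entry);
  }

  const allowed = [];
  const disallowed = [];
  for (const [spec, { failures }] of bySpec) {
    // A spec that failed in any run is treated as flaky.
    (failures === 0 ? allowed : disallowed).push(spec);
  }
  return { allowed, disallowed, specCount: bySpec.size };
}
```

Comparing `specCount` against the number of specs expected is one way to notice unaccounted-for tests early, like the 12 that were missing from the parsed results.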

Our to-do list looks the same as the day prior's update, as the items we ended up addressing yesterday were unforeseen.

holdenhinkle commented 2 years ago

Curt and I have it working, sans notifications.

On Monday we plan on tackling some/all of the remaining items:

Then there's the idea of listing flaky tests on a page in the platform docs: for each spec, list the spec name, the titles of the tests that failed in the spec, and how long the spec has been disabled by the Allow List.

holdenhinkle commented 2 years ago

CI run - https://github.com/department-of-veterans-affairs/vets-website/actions/runs/3025081596

:-)

holdenhinkle commented 2 years ago

We'd like to add a field to the product directory regarding flaky tests. Talk to Joe and Peter about it.

holdenhinkle commented 2 years ago

One outstanding question: what should we do if the long-running test (the test that takes significantly longer than the rest) needs to be stress-tested in CI? It could take a very long time to run. One option: if that test is detected, we run it in multiple Cypress instances. Currently, we run tests that need to be stress-tested in a single Cypress instance.

pjhill commented 2 years ago

TBD --

  1. How many times should a test be executed to vet it?
  2. What do we do about particularly long tests? Should we have multiple runners executing the runs? 50 / 10 runners means that each runner only needs to run the test 5 times, so they will finish more quickly.
  3. How can we track changes to the disallow list?
    • When was this test spec added to the disallow list?
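The arithmetic in item 2 can be sketched as follows, assuming a hypothetical helper `splitRuns` that divides the total number of runs as evenly as possible across runners. This only illustrates the split, not how the workflow would actually schedule runners.

```javascript
// Hypothetical helper: split totalRuns across runnerCount runners as
// evenly as possible. Any remainder is spread over the first runners.
function splitRuns(totalRuns, runnerCount) {
  const base = Math.floor(totalRuns / runnerCount);
  const remainder = totalRuns % runnerCount;
  return Array.from(
    { length: runnerCount },
    (_, i) => base + (i < remainder ? 1 : 0),
  );
}
```

With 50 runs and 10 runners, `splitRuns(50, 10)` gives each runner 5 runs, matching the example above. When the numbers don't divide evenly, some runners take one extra run.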

pjhill commented 2 years ago

holdenhinkle commented 2 years ago

I created a support request with operations re the VA_VSP_BOT_GITHUB_TOKEN token - https://dsva.slack.com/archives/CBU0KDSB1/p1663599892158159

holdenhinkle commented 2 years ago

RFC - https://vfs.atlassian.net/wiki/spaces/TTT/pages/2370797631/RFC+E2E+Stress+Test+and+Allow+List

holdenhinkle commented 2 years ago

va.gov-team issues are created now

holdenhinkle commented 2 years ago

We'll do manual testing on this feature tomorrow.

Notes from Peter:

Peter Hill 11:25 AM
Made test case stubs for the following scenarios:

  • VFS Team Creates an E2E Test Spec for a New Product
  • VFS Team Adds an E2E Test Spec to an Existing Product
  • VFS Team Adds a Test to an Existing Test Spec
  • VFS Team Removes a Test from an Existing Test Spec
  • VFS Team Removes a Test Spec for an Existing Product
  • VFS Team Changes a Test in an Existing Spec

pjhill commented 2 years ago

Test Flakiness test project is here.

pjhill commented 2 years ago

To do

holdenhinkle commented 2 years ago

I've rerun the E2E Stress Test workflow multiple times (looping tests 2x) and haven't seen that BigQuery error again - https://github.com/department-of-veterans-affairs/vets-website/actions/runs/3091114869

Here's the PR that 'fixed' it - https://github.com/department-of-veterans-affairs/qa-standards-dashboard-data/pull/177

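The reason for the failures we were seeing: passing an async function as a callback to #forEach in JavaScript doesn't work as expected. forEach ignores the promises returned by its callback, so the surrounding code continues before the async work has finished.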
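A small sketch of the problem and two common fixes. The `save` function is a hypothetical stand-in for an async operation such as a database write; it is not the code from the PR above.

```javascript
// Hypothetical stand-in for an async operation such as a database write.
const save = (log, item) =>
  new Promise((resolve) => {
    setTimeout(() => {
      log.push(item);
      resolve();
    }, 10);
  });

// Problem: forEach ignores the promises returned by the callback, so this
// function resolves before any save has finished.
async function saveAllWithForEach(items, log) {
  items.forEach(async (item) => {
    await save(log, item);
  });
}

// Fix 1: for...of awaits each save in order.
async function saveAllSequentially(items, log) {
  for (const item of items) {
    await save(log, item);
  }
}

// Fix 2: Promise.all starts all saves and waits for every one to finish.
async function saveAllConcurrently(items, log) {
  await Promise.all(items.map((item) => save(log, item)));
}
```

Awaiting `saveAllWithForEach` returns while the log is still empty, whereas both fixes return only after every item has been saved. The sequential version is the safer choice when the operations must not overlap.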

holdenhinkle commented 2 years ago

List of manual testing cases - https://docs.google.com/spreadsheets/d/1unNnzRcbY1AkMAZLuiO46KeiIJdBngGO-Vdcb9z0ZMU/edit#gid=0

holdenhinkle commented 2 years ago

Just pushed a commit to log everything for manual testing - https://github.com/department-of-veterans-affairs/vets-website/pull/22109/commits/fdbaa48dbdc0781fe1dc3ee00800979842c91959

Run - https://github.com/department-of-veterans-affairs/vets-website/actions/runs/3093299194

holdenhinkle commented 2 years ago

Manual testing is complete. Phew!

holdenhinkle commented 2 years ago

PR submitted - https://github.com/department-of-veterans-affairs/vets-website/pull/22109