JoeTice commented 2 years ago

Issue Description

Given the information researched in Explore a GHA workflow to vet new/changed Cypress tests for flakiness, create a proof of concept of the proposed solution.

Tasks

[x] Use the proposed solution in this ticket to create a proof of concept
[x] Test the proof of concept for desired behaviors
[x] Report results of POC
[x] Discuss the results and POC with the team
[x] Create script that will run our test suite X times and accept an optional param of a single spec to run.
[x] Create a BQ table to store our spec list with disallow flags
[x] Create a workflow to run the full suite using this script on a schedule, adjust workflow timeout to allow for it to run as necessary
[x] Update BQ table every time this script runs with the appropriate disallow flags
[x] Alter test selection to check against the disallow list
[x] Create differ to check PRs for changes to banned specs, trigger re-check on that spec only using existing script
[x] Create mechanism to notify teams that a test has been quarantined and needs fixing.
[x] Perform manual acceptance testing

Acceptance Criteria

[x] Proof of concept E2E flaky test detection and isolation is available for demo and discussion with stakeholders

holdenhinkle commented 2 years ago

PR - https://github.com/department-of-veterans-affairs/vets-website/pull/22109

We created a new BigQuery table called vets_website_e2e_allow_list and populated it with all Cypress .spec files.

We created a new vets-website GitHub workflow, .github/workflows/e2e-stress-test.yml, that currently:

Grabs the e2e allow list from BigQuery
Passes the allow list into Test Selection
Filters out any tests on the Allow List where allow=FALSE
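A minimal sketch of the filtering step above, for discussion only. The row shape (`spec_path`, `allow`) and the function name are assumptions, not the actual table schema or the code in the PR.

```javascript
// Hypothetical row shape: { spec_path: string, allow: boolean }.
function filterAllowedSpecs(selectedSpecs, allowListRows) {
  // Collect every spec explicitly flagged allow=FALSE.
  const disallowed = new Set(
    allowListRows
      .filter((row) => row.allow === false)
      .map((row) => row.spec_path),
  );
  // Assumption: specs that are missing from the allow list still run.
  return selectedSpecs.filter((spec) => !disallowed.has(spec));
}

const exampleRows = [
  { spec_path: 'src/a.cypress.spec.js', allow: true },
  { spec_path: 'src/b.cypress.spec.js', allow: false },
];

console.log(
  filterAllowedSpecs(
    ['src/a.cypress.spec.js', 'src/b.cypress.spec.js', 'src/c.cypress.spec.js'],
    exampleRows,
  ),
);
```

Whether a spec that is absent from the table should run is a policy choice; this sketch lets it through.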

If we pass an optional env variable - SHOULD_STRESS_TEST=TRUE - into run-cypress-tests.js it runs the tests n times. We updated the timeouts in GitHub Actions to 20 hours and set the job to run the Cypress tests 40 times. We're testing this here - https://github.com/department-of-veterans-affairs/vets-website/actions/runs/3002697203
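Roughly how the run count could be derived from SHOULD_STRESS_TEST, as a sketch. `getRunCount`, `runRepeatedly`, and `runOnce` are hypothetical names and this is not the code in run-cypress-tests.js; the default of 40 is taken from the comment above.

```javascript
// 40 mirrors the number of runs mentioned in this comment.
const DEFAULT_STRESS_RUNS = 40;

// Returns the stress-test run count when SHOULD_STRESS_TEST is TRUE, else 1.
function getRunCount(env, stressRuns = DEFAULT_STRESS_RUNS) {
  return String(env.SHOULD_STRESS_TEST).toUpperCase() === 'TRUE' ? stressRuns : 1;
}

// runOnce is a stand-in for whatever actually launches Cypress.
// The loop bound is fixed before the loop starts, so it always terminates.
function runRepeatedly(runCount, runOnce) {
  const results = [];
  for (let i = 0; i < runCount; i += 1) {
    results.push(runOnce(i));
  }
  return results;
}

const runCount = getRunCount(process.env);
console.log(`Running the suite ${runCount} time(s)`);
```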

Todo:

Update allow list when tests fail
Create a check in the CI workflow that runs new/updated Cypress tests through the new Stress Test workflow. If the check fails, the new/updated tests cannot be merged. If the check passes, the new/updated tests can be merged and the tests are removed from the E2E Allow List (BigQuery Table).
In a given PR, run the allowed tests 1x and run the new/updated tests through the Stress Test workflow.
For each new E2E failure, create a GitHub Issue ticket with the GitHub label that belongs to the team that owns the failing test, and label it something like `flaky_e2e_test`.
Consider additional notification options, such as posting failure data to #vfs-all-teams.
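The per-PR split described in this list (allowed tests 1x, new/updated tests through the stress test) could look something like the sketch below. The function name and inputs are hypothetical, and it assumes the changed-file list and the disallowed specs are already available as arrays of spec paths.

```javascript
// Splits a PR's specs into those to stress-test and those to run once.
function planPrRuns(allSpecs, changedFiles, disallowedSpecs) {
  const changed = new Set(changedFiles);
  const disallowed = new Set(disallowedSpecs);
  return {
    // New or updated specs are stress-tested, even if currently disallowed,
    // so that a fix to a quarantined spec gets re-checked.
    stressTest: allSpecs.filter((spec) => changed.has(spec)),
    // Unchanged specs run once, and only if they are not disallowed.
    runOnce: allSpecs.filter((spec) => !changed.has(spec) && !disallowed.has(spec)),
  };
}

console.log(
  planPrRuns(
    ['a.cypress.spec.js', 'b.cypress.spec.js', 'c.cypress.spec.js'],
    ['b.cypress.spec.js', 'README.md'],
    ['b.cypress.spec.js', 'c.cypress.spec.js'],
  ),
);
```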

CBonade commented 2 years ago

Yesterday we discovered a bug in the previous day's work. The ternary operator we used to determine the number of runs for the loop was not working. We hadn't caught it earlier because none of the runs finished; they all timed out. In reality, the tests were running in an infinite loop. We found a workaround and have corrected the issue.

Also yesterday, we started on the functionality for determining which tests passed and which failed, in preparation for updating their enabled/disabled status in BigQuery. We created a new script in the dashboard data repo and, using the artifacts generated from the vets-website run (whether CI or the new workflow), were able to produce one combined results JSON file. We then worked on parsing these results, but spent a good bit of time stuck because 12 of the 304 tests were not accounted for in our parsed results. Later, Holden figured out the hiccup and we ended up with a much more reliable solution.
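For illustration, a sketch of how combined results might be reduced to per-spec allow flags, plus a completeness check of the kind that would surface unaccounted-for specs. The result shape (`{ spec, passed }` per run) and both function names are assumptions; the real artifacts and the script in the dashboard data repo may look quite different.

```javascript
// runs: array of runs, each an array of { spec: string, passed: boolean }.
function summarizeRuns(runs) {
  const summary = new Map();
  for (const run of runs) {
    for (const { spec, passed } of run) {
      const entry = summary.get(spec) || { spec, runs: 0, failures: 0 };
      entry.runs += 1;
      if (!passed) entry.failures += 1;
      summary.set(spec, entry);
    }
  }
  // Assumption: a spec stays allowed only if it never failed.
  return [...summary.values()].map((entry) => ({
    ...entry,
    allow: entry.failures === 0,
  }));
}

// Lists expected specs that never appeared in the parsed results.
function findMissingSpecs(expectedSpecs, summaryRows) {
  const seen = new Set(summaryRows.map((row) => row.spec));
  return expectedSpecs.filter((spec) => !seen.has(spec));
}

const exampleSummary = summarizeRuns([
  [{ spec: 'a', passed: true }, { spec: 'b', passed: true }],
  [{ spec: 'a', passed: true }, { spec: 'b', passed: false }],
]);
console.log(exampleSummary);
console.log(findMissingSpecs(['a', 'b', 'c'], exampleSummary));
```

Comparing the parsed spec count against the expected list on every run makes a gap like the 12 missing tests visible immediately.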

Our to-do list looks the same as the day prior's update, as the items we ended up addressing yesterday were unforeseen.

holdenhinkle commented 2 years ago

Curt and I have it working, sans notifications.

On Monday we plan on tackling some/all of the remaining items:

Failure notifications
Printing a summary of the tests that are skipped due to the E2E Allow List, because it might not be obvious to engineers why tests they haven't explicitly skipped don't run
Refactor/clean up

Then there's the idea of listing flaky tests on a page in the platform docs; list the spec name, the titles of the tests that failed in the spec, and how long it has been disabled by the Allow List.

holdenhinkle commented 2 years ago

CI run - https://github.com/department-of-veterans-affairs/vets-website/actions/runs/3025081596

:-)

holdenhinkle commented 2 years ago

We'd like to add a field to the product directory regarding flaky tests. Talk to Joe and Peter about it.

holdenhinkle commented 2 years ago

One outstanding question: what do we do if the long-running test (the test that takes significantly longer than the rest) needs to be stress-tested in CI? It could take a very long time to run. If that test is detected, we could run it in multiple Cypress instances. Currently, we run tests that need to be stress-tested in 1 Cypress instance.

pjhill commented 2 years ago

TBD --

How many times should a test be executed to vet it?
What do we do about particularly long tests? Should we have multiple runners executing the runs? 50 runs across 10 runners means that each runner only needs to run the test 5 times, so they will finish more quickly.
How can we track changes to the disallow list?
- When was this test spec added to the disallow list?
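The runner arithmetic in the second question can be sketched as below. `splitRuns` is a hypothetical helper, and how the counts would actually be handed to runners is left open.

```javascript
// Distributes totalRuns as evenly as possible across runnerCount runners.
function splitRuns(totalRuns, runnerCount) {
  const base = Math.floor(totalRuns / runnerCount);
  const remainder = totalRuns % runnerCount;
  // The first `remainder` runners each take one extra run.
  return Array.from({ length: runnerCount }, (_, i) => base + (i < remainder ? 1 : 0));
}

console.log(splitRuns(50, 10)); // ten runners with 5 runs each
console.log(splitRuns(7, 3)); // [3, 2, 2]
```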

pjhill commented 2 years ago

GH creds need write permission to va.gov-team added.

holdenhinkle commented 2 years ago

I created a support request with operations re: VA_VSP_BOT_GITHUB_TOKEN - https://dsva.slack.com/archives/CBU0KDSB1/p1663599892158159

holdenhinkle commented 2 years ago

RFC - https://vfs.atlassian.net/wiki/spaces/TTT/pages/2370797631/RFC+E2E+Stress+Test+and+Allow+List

holdenhinkle commented 2 years ago

va.gov-team issues are created now
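For reference, a hedged sketch of the kind of request body such an issue could use with GitHub's REST "create an issue" endpoint (POST /repos/{owner}/{repo}/issues). Only the `title`, `body`, and `labels` fields are part of that API; the function name, title wording, and team label value are illustrative, and the `flaky_e2e_test` label comes from the earlier todo list. This is not the code that was merged.

```javascript
// Builds the JSON body for a "create an issue" request; no network call is made here.
function buildFlakyTestIssue({ spec, failedTests, teamLabel }) {
  return {
    title: `Flaky E2E test: ${spec}`,
    body: [
      `The spec \`${spec}\` failed during the E2E stress test.`,
      '',
      'Failed tests:',
      ...failedTests.map((name) => `- ${name}`),
    ].join('\n'),
    // One label for the owning team, one marking the issue as a flaky test.
    labels: [teamLabel, 'flaky_e2e_test'],
  };
}

console.log(
  buildFlakyTestIssue({
    spec: 'src/example.cypress.spec.js', // hypothetical spec path
    failedTests: ['submits the form'],
    teamLabel: 'example-team', // hypothetical team label
  }),
);
```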

holdenhinkle commented 2 years ago

We'll do manual testing on this feature tomorrow.

Notes from Peter:

Peter Hill 11:25 AM
Made test case stubs for the following scenarios --
- VFS Team Creates an E2E Test Spec for a New Product
- VFS Team Adds an E2E Test Spec to an Existing Product
- VFS Team Adds a Test to an Existing Test Spec
- VFS Team Removes a Test from an Existing Test Spec
- VFS Team Removes a Test Spec for an Existing Product
- VFS Team Changes a Test in an Existing Spec

pjhill commented 2 years ago

Test Flakiness test project is here.

pjhill commented 2 years ago

To do

[ ] Automatically generate a summary of disallowed tests for publication to Platform Website
[x] Manual testing

holdenhinkle commented 2 years ago

I've rerun the E2E Stress Test workflow multiple times (looping tests 2x) and haven't seen that BigQuery error again - https://github.com/department-of-veterans-affairs/vets-website/actions/runs/3091114869

Here's the PR that 'fixed' it - https://github.com/department-of-veterans-affairs/qa-standards-dashboard-data/pull/177

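The reason for the failures we were seeing: passing an async function as a callback to #forEach in JavaScript doesn't work as expected. forEach ignores the promises the callback returns, so the code that follows runs before the async work has finished.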
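A small, self-contained illustration of that pitfall and two common alternatives. `saveRow` is a hypothetical stand-in for an async write such as a BigQuery update; this is not the code from the linked PR.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical async write: records the row after a short delay.
async function saveRow(row, saved) {
  await sleep(5);
  saved.push(row);
}

async function withForEach(rows) {
  const saved = [];
  // forEach discards the promise each callback returns, so nothing waits here.
  rows.forEach(async (row) => {
    await saveRow(row, saved);
  });
  return saved.length; // measured before any write has completed
}

async function withForOf(rows) {
  const saved = [];
  // Each write is awaited before the next one starts (sequential).
  for (const row of rows) {
    await saveRow(row, saved);
  }
  return saved.length;
}

async function withPromiseAll(rows) {
  const saved = [];
  // All writes start together and are awaited as a group (concurrent).
  await Promise.all(rows.map((row) => saveRow(row, saved)));
  return saved.length;
}

withForEach(['a', 'b', 'c']).then((count) => console.log('forEach saw', count));
withForOf(['a', 'b', 'c']).then((count) => console.log('for...of saw', count));
withPromiseAll(['a', 'b', 'c']).then((count) => console.log('Promise.all saw', count));
```

for...of keeps the writes in order; Promise.all is faster when the writes are independent of each other.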

holdenhinkle commented 2 years ago

List of manual testing cases - https://docs.google.com/spreadsheets/d/1unNnzRcbY1AkMAZLuiO46KeiIJdBngGO-Vdcb9z0ZMU/edit#gid=0

holdenhinkle commented 2 years ago

Just pushed a commit to log everything for manual testing - https://github.com/department-of-veterans-affairs/vets-website/pull/22109/commits/fdbaa48dbdc0781fe1dc3ee00800979842c91959

Run - https://github.com/department-of-veterans-affairs/vets-website/actions/runs/3093299194

holdenhinkle commented 2 years ago

Manual testing is complete. Phew!

holdenhinkle commented 2 years ago

PR submitted - https://github.com/department-of-veterans-affairs/vets-website/pull/22109

department-of-veterans-affairs / va.gov-team

Create a proof of concept of a GHA workflow to test new/changed Cypress tests for flakiness #46339
