cisagov / ScubaGear

Automation to assess the state of your M365 tenant against CISA's baselines
https://www.cisa.gov/resources-tools/services/secure-cloud-business-applications-scuba-project
Creative Commons Zero v1.0 Universal

Concurrent runs of conflicting test jobs may fail when running nightly functional tests workflow #1278

Closed. schrolla closed this 5 days ago

schrolla commented 2 months ago

🐛 Summary

The current nightly functional testing workflow uses a matrix to build a set of concurrent functional test jobs across products, variants, and test tenants. However, if the parameters include test plans for the same product against the same test tenant and those jobs run concurrently, the tests may fail incorrectly, because both jobs make conflicting changes to the tenant configuration at the same time. The result is spurious failures on one or more of the jobs.

To reproduce

Steps to reproduce the behavior:

  1. Add a test parameter entry for test tenant 3 for the standard variant, and another for the G5 variant to the nightly build parameters.
  2. Run the Nightly Functional Tests action workflow and wait for results. Inspect results for possible failed jobs.
  3. If failures in the same tenant are found, re-run each individually and inspect results again. Expectation is that the individually run jobs now succeed.

Expected behavior

Test jobs with a potential to conflict (same product/test tenant) will not be run concurrently.
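
To make "potential to conflict" concrete, here is a minimal sketch that groups test parameter entries by (product, tenant) and reports any group with more than one entry. The field names (`product`, `tenant`, `variant`) and the sample values are assumptions made for illustration, not the actual format of the nightly build parameters.

```python
from collections import defaultdict


def find_conflicts(entries):
    """Return {(product, tenant): [variants]} for keys that appear more than once."""
    groups = defaultdict(list)
    for entry in entries:
        # Two jobs can conflict when they target the same product in the same tenant.
        groups[(entry["product"], entry["tenant"])].append(entry["variant"])
    return {key: variants for key, variants in groups.items() if len(variants) > 1}


# Hypothetical parameter entries mirroring the reproduction steps.
params = [
    {"product": "aad", "tenant": "tenant3", "variant": "standard"},
    {"product": "aad", "tenant": "tenant3", "variant": "g5"},
    {"product": "aad", "tenant": "tenant6", "variant": "standard"},
]

print(find_conflicts(params))
# → {('aad', 'tenant3'): ['standard', 'g5']}
```

Any key in the returned mapping marks a pair (or more) of jobs that should not run at the same time.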

Any helpful log output or screenshots

Example showing failed test runs in both test tenant 3 and test tenant 6 when the standard and G5 variant test jobs are run concurrently in the same tenant.

[Screenshot 2024-08-19 at 3 53 19 PM]

schrolla commented 2 months ago

A short-term workaround may be to change the nightly functional test parameters so that conflicting test plans are not run. Long term, the workflow would benefit from more selective concurrent execution.

gdasher commented 2 months ago

Relevant: https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/control-the-concurrency-of-workflows-and-jobs

schrolla commented 2 months ago

We've started looking down this path, but the current matrix setup, while it generally supports this type of concurrency control, likely requires some serious restructuring to define the right keys at the right level and actually prevent concurrent runs against the same tenant. TBD
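
As a rough illustration of what "the right keys" might look like, the sketch below treats (product, tenant) as the conflict key and splits the parameter entries into sequential waves, so that no wave contains two entries with the same key. This is only a model of the scheduling logic, not a GitHub Actions configuration; the field names and sample values are assumptions. The same key is what a concurrency `group` would presumably be built from.

```python
from collections import defaultdict


def plan_waves(entries):
    """Split entries into waves; entries sharing (product, tenant) land in different waves."""
    seen = defaultdict(int)  # how many entries with each key have been placed so far
    waves = []
    for entry in entries:
        key = (entry["product"], entry["tenant"])
        index = seen[key]  # the n-th entry for a key goes into the n-th wave
        seen[key] += 1
        if index == len(waves):
            waves.append([])
        waves[index].append(entry)
    return waves


# Hypothetical parameter entries: two variants share tenant3, one uses tenant6.
params = [
    {"product": "aad", "tenant": "tenant3", "variant": "standard"},
    {"product": "aad", "tenant": "tenant3", "variant": "g5"},
    {"product": "aad", "tenant": "tenant6", "variant": "standard"},
]

for number, wave in enumerate(plan_waves(params), start=1):
    print(number, [(e["tenant"], e["variant"]) for e in wave])
# → 1 [('tenant3', 'standard'), ('tenant6', 'standard')]
# → 2 [('tenant3', 'g5')]
```

Entries within a wave can run concurrently; the waves themselves run one after another.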