This PR implements new test check workflows, incorporates Composite Actions for efficient workflow coding, and sets up caching for Playwright binaries, extending the existing caching mechanism for Puppeteer.
Key Updates:
New test check workflows for schema validation and expected output, enhancing PR change detection.
Integration of Composite Actions for reusable workflow segments.
Playwright bitmap references added for smoke tests, and inclusion of test/__fixtures__ for report.json snapshots.
Workflow runs are currently triggered only by workflow_dispatch; enabling pull_request triggers requires a small code change.
Potentially closes #1533.
Details
Composite Actions
Located in .github/actions/[action]/action.yml, these actions simplify workflow setup, execution, and validation. Future plans include converting existing workflows to utilize these actions.
Workflow Testing
The workflows run the backstop test command and validate the resulting report.json against predefined fixtures. These checks catch cases such as an accidentally disabled part of the test command or a renamed report property.
General Workflow Steps
Execute an npm test.
Pre-filter report.json properties for shape consistency.
Compare the filtered report.json with the corresponding fixture (./test/__fixtures__/[npm-script-name].json).
Summarize results with a Pass/Fail determination.
Smoke and Integration Test Specifics
Smoke Tests: Address discrepancies between local and GitHub runs by filtering report.json before comparison.
Integration Tests: Focus on the final report.json generated by backstop test, using a Bash script to select the latest report.
Workflow Files
integration-test-check.yml: Runs integration tests and assesses report.json.
sanity-test-checks.yml: Executes and compares sanity-test and sanity-test-playwright.
smoke-test-checks.yml: Handles both smoke-test and smoke-test-playwright.
Docker-related workflows follow a similar structure for sanity and smoke tests.
Playwright Binaries Caching
Improves efficiency by caching Playwright installations on GitHub Actions, using OS and version for cache identification.
Conclusion
These updates aim to streamline testing processes and improve reliability. Feedback on the inclusion or modification of these features is welcome.
Cheers!
Notes and Further Details
Composite Actions are repeatable pieces of code that can take inputs and produce outputs. They live in .github/actions/[action]/action.yml, and are a great way to keep setup, execution, and validation patterns DRY.
I haven't changed any existing workflows to use composite actions yet, but am happy to do so.
New workflows are set to run only by workflow_dispatch for now. They can be enabled for pull_request if desired, but need a code change. Let me know if interested and I'll add a commit :)
What are they Testing?
Perhaps a portion of the backstop test command was accidentally commented out. For example:
Running npm run sanity-test does not catch this in command output:
COMMAND | Executing core for "test"
COMMAND | Resolved already:test
COMMAND | Command "test" successfully executed in [0.001s]
However, by expecting a test/configs/backstop_data/bitmaps_test/[TIMESTAMP]/report.json we can catch a failed test run by diff (explained in detail later):
Another example, maybe someone changes a report property name:
Both are forced examples, but provide a glimpse into what's possible.
Smoke Test Caveat
I've seen a few smoke tests pass on GitHub but fail locally. For now, test comparisons first filter the report.json objects, deleting properties we know will have different shapes (or will not exist at all on a passing run):
jq 'walk(if type == "object" then with_entries(.value |= if type == "object" or type == "array" then . else "" end) else . end)
  | del(.tests[].pair.diff, .tests[].pair.diffImage)' \
  test/__fixtures__/smoke-test.json
Line breaks added for readability
diffImage doesn't exist on passing tests, so it's removed before analyzing report.json. As you previously mentioned, smoke tests are somewhat unreliable. "misMatchThreshold": 0.1 could also be bumped a bit to be more forgiving.
We can take a look at polishing smoke tests at some point, but this gets the job done! Below is a snapshot of the failing diff before filtering with jq.
Integration Caveat
The integration-test script generates two reports: one when running backstop reference, and the other after backstop test. We apply a fancy bash one-liner to find the most recently modified directory, and only diff the final report.
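As a rough illustration of picking the newest timestamp directory, here is a minimal sketch. It is not necessarily the one-liner used in the workflow. The helper name latest_report is hypothetical, and the default base path is assumed from the report path mentioned earlier in this description.

```shell
# Sketch: print the path of report.json inside the most recently modified
# directory under a base path. latest_report is a hypothetical helper name;
# the default base path is an assumption taken from the PR text.
latest_report() {
  base="${1:-test/configs/backstop_data/bitmaps_test}"
  # ls -td lists the matching directories newest-first; head keeps the newest.
  latest="$(ls -td -- "$base"/*/ 2>/dev/null | head -n 1)"
  # Fail if no timestamp directory exists yet.
  [ -n "$latest" ] || return 1
  # Drop the trailing slash added by the glob before appending the file name.
  printf '%s\n' "${latest%/}/report.json"
}
```

Calling latest_report with no argument would resolve against the default path; passing a directory overrides it.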
report.json Filtration Details
First and foremost, during a test check, .tests[].pair object values are set to empty strings. Some values will never be 1:1, due to system runtime differences, browser changes over time, etc. Only the data shape is tested in these new workflows.
jq is used to traverse the report.json object and set every property value that is not an array or object to an empty string (""). This also applies to nested properties within those objects and arrays.
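To make the idea concrete, here is a small sketch that wraps the same walk filter in a helper, so that two reports with different values but identical keys reduce to the same output. The name shape_of is hypothetical, and the sketch assumes a jq version that ships walk as a builtin (1.6 or newer, as far as I know).

```shell
# Sketch: reduce a JSON report to its "shape" by blanking every scalar
# property value. shape_of is a hypothetical helper name; assumes jq with
# the walk builtin is installed.
shape_of() {
  jq 'walk(
    if type == "object"
    then with_entries(.value |= if type == "object" or type == "array" then . else "" end)
    else .
    end
  )' "$1"
}
```

For example, shape_of test/__fixtures__/sanity-test.json would print the fixture with all scalar property values replaced by "", which can then be compared against the shape of a fresh report.json.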
This affords a way to test the general "shape" of data we expect backstop test to produce, comparing it with corresponding JSON files in test/__fixtures__/.
That ends up looking like this, which is the shape tested in integration and sanity "check" workflows introduced in this PR:
Happy to discuss in detail :)
Workflows
integration-test-check.yml
This runs npm run integration-test, then tests the resulting report.json, which is produced by the last step in the project's integration test: backstop test.
The GitHub workflow results in a pass/fail based on shape alone. The unfiltered A/B fixture/CI diff is included in the workflow's summary for further analysis.
Note: All workflow summaries contain the unfiltered diff under the "Unfiltered Diff" heading. There will always be timestamp directory-name differences in the "test" property, which further illustrates why property/value filtering is needed.
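The pass/fail step could look roughly like the sketch below. It assumes both files have already been filtered down to their shape; check_shape is a hypothetical helper name. GITHUB_STEP_SUMMARY is the file GitHub Actions provides for job summaries, and the sketch falls back to /dev/null when it is unset (for example on a local run).

```shell
# Sketch: compare two already-filtered JSON files and record Pass/Fail.
# check_shape is a hypothetical helper name.
check_shape() {
  fixture="$1"
  report="$2"
  summary="${GITHUB_STEP_SUMMARY:-/dev/null}"
  # diff exits 0 only when the two files are identical.
  if diff -u "$fixture" "$report" >/dev/null; then
    echo "Pass: report shape matches fixture" >> "$summary"
    return 0
  fi
  # On a mismatch, write the verdict and the diff itself to the summary.
  {
    echo "Fail: report shape differs from fixture"
    diff -u "$fixture" "$report" || true
  } >> "$summary"
  return 1
}
```

Returning a non-zero status on mismatch lets the calling workflow step fail without any extra logic.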
sanity-test-checks.yml
Runs both sanity-test and sanity-test-playwright then compares the corresponding fixture and report.json.
smoke-test-checks.yml
Runs both smoke-test and smoke-test-playwright then compares the corresponding fixture and report.json.
docker-sanity-test-checks.yml and docker-smoke-test-checks.yml
Same, but via Docker.
Playwright Binaries Caching
Playwright takes a long time to install on every run, so I found a way to cache the binaries in GitHub Actions, using OS and version as the "Caches" name (and lookup key).
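A minimal sketch of how such a key might be composed is shown here. Both helper names and the "playwright" segment in the key are hypothetical, and the sketch assumes the version string arrives in the form "Version 1.40.0", which I believe is what npx playwright --version prints.

```shell
# Sketch: build a cache key from the runner OS and the Playwright version.
# Both helper names and the "playwright" key segment are hypothetical.

# Strip a leading "Version " from output such as "Version 1.40.0".
playwright_version() {
  printf '%s\n' "${1#Version }"
}

# Join OS and version into one key, for example "Linux-playwright-1.40.0".
playwright_cache_key() {
  printf '%s-playwright-%s\n' "$1" "$2"
}
```

In a workflow step this might be called as playwright_cache_key "$RUNNER_OS" "$(playwright_version "$(npx playwright --version)")", with the result used as the key of the cache step. Because the version is part of the key, a Playwright upgrade would produce a new cache entry instead of reusing stale binaries.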