garris / BackstopJS

Catch CSS curve balls.
http://backstopjs.org
MIT License

feat(ci): CI/CD workflows and composite actions to check test results (#1533) #1537

Closed dgrebb closed 5 months ago

dgrebb commented 5 months ago

Overview

This PR implements new test check workflows, incorporates Composite Actions for efficient workflow coding, and sets up caching for Playwright binaries, extending the existing caching mechanism for Puppeteer.

Key Updates:

- Test Checks
- Docker Test Checks

Potentially closes #1533.

Details

Composite Actions

Located in .github/actions/[action]/action.yml, these actions simplify workflow setup, execution, and validation. Future plans include converting existing workflows to utilize these actions.

Workflow Testing

The workflows focus on executing the backstop test command and validating report.json against predefined fixtures. The tests catch accidentally disabled code paths (e.g. commented-out sections) and detect renamed report properties.

General Workflow Steps

  1. Execute an npm test.
  2. Compare report.json with corresponding fixture (./test/__fixtures__/[npm-script-name].json).
  3. Pre-filter report.json properties for shape consistency.
  4. Summarize results with a Pass/Fail determination.

Smoke and Integration Test Specifics

Workflow Files

Playwright Binaries Caching

Improves efficiency by caching Playwright installations on GitHub Actions, using OS and version for cache identification.

Conclusion

These updates aim to streamline testing processes and improve reliability. Feedback on the inclusion or modification of these features is welcome.

Cheers!


Notes and Further Details

Composite Actions are repeatable pieces of code that can take inputs and produce outputs. They live in .github/actions/[action]/action.yml, and are a great way to keep setup, execution, and validation patterns DRY.
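For illustration, a composite action skeleton looks like the following. The action name, inputs, and steps here are hypothetical examples of the pattern, not the actions added in this PR:

```yaml
# .github/actions/setup-backstop/action.yml (hypothetical example)
name: Setup BackstopJS
description: Install dependencies before running test checks
inputs:
  node-version:
    description: Node.js version to install
    default: '20'
runs:
  using: composite
  steps:
    - uses: actions/setup-node@v4
      with:
        node-version: ${{ inputs.node-version }}
    - run: npm ci
      shell: bash # run steps in composite actions must declare a shell
```

A workflow then consumes it with `uses: ./.github/actions/setup-backstop`, keeping the setup pattern DRY across workflows.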

I haven't changed any existing workflows to use composite actions yet, but am happy to do so.

New workflows are set to run only via workflow_dispatch for now. They can be enabled for pull_request if desired, but that requires a small code change. Let me know if interested and I'll add a commit :)
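The trigger change is a one-liner in each workflow file, sketched here:

```yaml
on:
  workflow_dispatch:   # manual runs only (the current setting)
  # pull_request:      # add this line to run the checks on every PR
```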

What are they Testing?

Perhaps a portion of the backstop test command was accidentally commented out. For example:

module.exports = {
  execute: function (config) {
    const executeCommand = require('./index');
    // if (shouldRunDocker(config)) {
    //   return runDocker(config, 'test')
    //     .finally(() => {
    //       if (config.openReport && config.report && config.report.indexOf('browser') > -1) {
    //         executeCommand('_openReport', config);
    //       }
    //     });
    // } else {
    //   return createBitmaps(config, false).then(function () {
    //     // return executeCommand('_report', config);
    //   });
    // }
  }
};

Running npm run sanity-test does not catch this in command output:

COMMAND | Executing core for "test"
COMMAND | Resolved already:test
COMMAND | Command "test" successfully executed in [0.001s]

However, by expecting a test/configs/backstop_data/bitmaps_test/[TIMESTAMP]/report.json, we can catch the failed test run by diffing it against a fixture (explained in detail later).

Another example: someone might change a report property name, and the shape diff catches the rename.

Both are forced examples, but provide a glimpse into what's possible.

Smoke Test Caveat

I've seen a few smoke tests pass on GitHub but fail locally. For now, test comparisons first filter the report.json objects, deleting properties we know will have different shapes (or not exist at all in a pass):

jq 'walk(if type == "object" then with_entries(.value |= if type == "object" or type == "array" then . else "" end) else . end)
  | del(.tests[].pair.diff, .tests[].pair.diffImage)' \
  test/__fixtures__/smoke-test.json

Line breaks added for readability

diffImage doesn't exist on passing tests, so it's removed before analyzing report.json. As you previously mentioned, smoke tests are somewhat unreliable. "misMatchThreshold": 0.1 could also be bumped a bit, to be more forgiving.

We can take a look at polishing smoke tests at some point, but this gets the job done! The workflow summary includes a snapshot of the failing diff before filtering with jq.

Integration Caveat

The integration-test script generates two reports. One when running backstop reference, and the other after backstop test. We apply a fancy bash one-liner to find the most recently modified directory, and only diff the final report.

report.json Filtration Details

First and foremost, during a test check, .tests[].pair object values are set to empty strings. Some values will never match 1:1, due to system runtime differences, browser changes over time, etc. Only the data shape is tested in these new workflows.

jq is used to traverse the report.json object and set any property value that is not an array or object to an empty string (""); the same rule is applied recursively to nested properties within those arrays and objects.

This affords a way to test the general "shape" of data we expect backstop test to produce, comparing it with corresponding JSON files in test/__fixtures__/.

That ends up looking like this, which is the shape tested in integration and sanity "check" workflows introduced in this PR:

{
  "testSuite": "",
  "tests": [
    {
      "pair": {
        "reference": "",
        "test": "",
        "selector": "",
        "fileName": "",
        "label": "",
        "requireSameDimensions": "",
        "misMatchThreshold": "",
        "url": "",
        "referenceUrl": "",
        "expect": "",
        "viewportLabel": "",
        "diff": {
          "isSameDimensions": "",
          "dimensionDifference": {
            "width": "",
            "height": ""
          },
          "rawMisMatchPercentage": "",
          "misMatchPercentage": "",
          "analysisTime": ""
        },
        "diffImage": ""
      },
      "status": ""
    },
    {
      "pair": {
        "reference": "",
        "test": "",
        "selector": "",
        "fileName": "",
        "label": "",
        "requireSameDimensions": "",
        "misMatchThreshold": "",
        "url": "",
        "referenceUrl": "",
        "expect": "",
        "viewportLabel": "",
        "diff": {
          "isSameDimensions": "",
          "dimensionDifference": {
            "width": "",
            "height": ""
          },
          "rawMisMatchPercentage": "",
          "misMatchPercentage": "",
          "analysisTime": ""
        },
        "diffImage": ""
      },
      "status": ""
    }
  ],
  "id": ""
}

Happy to discuss in detail :)

Workflows

integration-test-check.yml

This runs npm run integration-test, then checks the resultant report.json produced by the final step of the project's integration test: backstop test.

The GitHub workflow results in a pass/fail based on shape alone. The unfiltered A/B fixture/CI diff is included in the workflow's summary for further analysis.

> [!NOTE]
> All workflow summaries contain the unfiltered diff, under the "Unfiltered Diff" heading. There will always be timestamp directory-name differences in the "test" property, which further illustrates why property/value filtering is needed.
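Publishing the diff uses GitHub's built-in step summary file; the step below is a sketch with illustrative file names, not the exact step from this PR:

```yaml
- name: Publish unfiltered diff
  if: always()   # publish even when the shape check fails
  shell: bash
  run: |
    echo '## Unfiltered Diff' >> "$GITHUB_STEP_SUMMARY"
    # `|| true` keeps this step green; pass/fail is decided by the filtered diff
    diff fixture.json report.json >> "$GITHUB_STEP_SUMMARY" || true
```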

sanity-test-checks.yml

Runs both sanity-test and sanity-test-playwright, then compares each resulting report.json with its corresponding fixture.

smoke-test-checks.yml

Runs both smoke-test and smoke-test-playwright, then compares each resulting report.json with its corresponding fixture.

docker-sanity-test-checks.yml and docker-smoke-test-checks.yml

Same, but via Docker.

Playwright Binaries Caching

Playwright takes a long time to install on every run, so I found a way to cache the binaries in GitHub Actions, using the OS and Playwright version as the cache name (and lookup key):

The caches are located here in the GitHub UI: https://github.com/dgrebb/BackstopJS/actions/caches
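With actions/cache, the key can combine runner.os with the installed Playwright version. A sketch of the idea (the PLAYWRIGHT_VERSION variable and cache path are assumptions, not necessarily what this PR uses):

```yaml
- name: Cache Playwright binaries
  uses: actions/cache@v4
  with:
    path: ~/.cache/ms-playwright   # Playwright's default browser dir on Linux
    key: ${{ runner.os }}-playwright-${{ env.PLAYWRIGHT_VERSION }}
```

On a cache hit, the browser download step can be skipped entirely, which is where the time savings come from.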