garris / BackstopJS

Catch CSS curve balls.
http://backstopjs.org
MIT License

feat(ci): CI/CD workflows and composite actions to check test results (#1533) #1537

Closed dgrebb closed 5 months ago

dgrebb commented 5 months ago

Overview

This PR implements new test check workflows, incorporates Composite Actions for efficient workflow coding, and sets up caching for Playwright binaries, extending the existing caching mechanism for Puppeteer.

Key Updates:

- Test Checks
- Docker Test Checks

Potentially closes #1533.

Details

Composite Actions

Located in .github/actions/[action]/action.yml, these actions simplify workflow setup, execution, and validation. Future plans include converting existing workflows to utilize these actions.

Workflow Testing

The workflows focus on executing the backstop test command and validating report.json against predefined fixtures. The tests catch accidentally disabled code paths (e.g. commented-out sections) and detect renamed report properties.

General Workflow Steps

  1. Execute an npm test.
  2. Compare report.json with corresponding fixture (./test/__fixtures__/[npm-script-name].json).
  3. Pre-filter report.json properties for shape consistency.
  4. Summarize results with a Pass/Fail determination.

Smoke and Integration Test Specifics

Workflow Files

Playwright Binaries Caching

Improves efficiency by caching Playwright installations on GitHub Actions, using OS and version for cache identification.

Conclusion

These updates aim to streamline testing processes and improve reliability. Feedback on the inclusion or modification of these features is welcome.

Cheers!


Notes and Further Details

Composite Actions are repeatable pieces of code that can take inputs and produce outputs. They live in .github/actions/[action]/action.yml, and are a great way to keep setup, execution, and validation patterns DRY.
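For illustration, a composite action skeleton looks like the following. The action name, inputs, and steps here are hypothetical examples of the pattern, not the actions added in this PR:

```yaml
# .github/actions/setup-backstop/action.yml (hypothetical example)
name: Setup BackstopJS
description: Install dependencies before running test checks
inputs:
  node-version:
    description: Node.js version to install
    default: '20'
runs:
  using: composite
  steps:
    - uses: actions/setup-node@v4
      with:
        node-version: ${{ inputs.node-version }}
    - run: npm ci
      shell: bash # run steps in composite actions must declare a shell
```

A workflow then consumes it with `uses: ./.github/actions/setup-backstop`, keeping the setup pattern DRY across workflows.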

I haven't changed any existing workflows to use composite actions yet, but am happy to do so.

New workflows are set to run only via workflow_dispatch for now. They can be enabled for pull_request if desired, but that requires a small code change. Let me know if interested and I'll add a commit :)
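The trigger change is a one-liner in each workflow file, sketched here:

```yaml
on:
  workflow_dispatch:   # manual runs only (the current setting)
  # pull_request:      # add this line to run the checks on every PR
```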

What are they Testing?

Perhaps a portion of the backstop test command was accidentally commented out. For example:

module.exports = {
  execute: function (config) {
    const executeCommand = require('./index');
    // if (shouldRunDocker(config)) {
    //   return runDocker(config, 'test')
    //     .finally(() => {
    //       if (config.openReport && config.report && config.report.indexOf('browser') > -1) {
    //         executeCommand('_openReport', config);
    //       }
    //     });
    // } else {
    //   return createBitmaps(config, false).then(function () {
    //     // return executeCommand('_report', config);
    //   });
    // }
  }
};

Running npm run sanity-test does not catch this in command output:

COMMAND | Executing core for "test"
COMMAND | Resolved already:test
COMMAND | Command "test" successfully executed in [0.001s]

However, by expecting a test/configs/backstop_data/bitmaps_test/[TIMESTAMP]/report.json, we can catch the failed test run by diffing it against a fixture (explained in detail later).

Another example: someone might change a report property name, and the shape diff catches the rename.

Both are forced examples, but provide a glimpse into what's possible.

Smoke Test Caveat

I've seen a few smoke tests pass on GitHub but fail locally. For now, test comparisons first filter the report.json objects, deleting properties we know will have different shapes (or not exist at all in a pass):

jq 'walk(if type == "object" then with_entries(.value |= if type == "object" or type == "array" then . else "" end) else . end)
  | del(.tests[].pair.diff, .tests[].pair.diffImage)' \
  test/__fixtures__/smoke-test.json

Line breaks added for readability

diffImage doesn't exist on passing tests, so it's removed before analyzing report.json. As you previously mentioned, smoke tests are somewhat unreliable. "misMatchThreshold": 0.1 could also be bumped a bit, to be more forgiving.

We can take a look at polishing smoke tests at some point, but this gets the job done! The workflow summary includes a snapshot of the failing diff before filtering with jq.

Integration Caveat

The integration-test script generates two reports. One when running backstop reference, and the other after backstop test. We apply a fancy bash one-liner to find the most recently modified directory, and only diff the final report.

report.json Filtration Details

First and foremost, during a test check, .tests[].pair object values are set to empty strings. Some values will never match 1:1, due to system runtime differences, browser changes over time, etc. Only the data shape is tested in these new workflows.

jq is used to traverse the report.json object and set any property value that is not an array or object to an empty string (""); the same rule is applied recursively to nested properties within those arrays and objects.

This affords a way to test the general "shape" of data we expect backstop test to produce, comparing it with corresponding JSON files in test/__fixtures__/.

That ends up looking like this, which is the shape tested in integration and sanity "check" workflows introduced in this PR:

{
  "testSuite": "",
  "tests": [
    {
      "pair": {
        "reference": "",
        "test": "",
        "selector": "",
        "fileName": "",
        "label": "",
        "requireSameDimensions": "",
        "misMatchThreshold": "",
        "url": "",
        "referenceUrl": "",
        "expect": "",
        "viewportLabel": "",
        "diff": {
          "isSameDimensions": "",
          "dimensionDifference": {
            "width": "",
            "height": ""
          },
          "rawMisMatchPercentage": "",
          "misMatchPercentage": "",
          "analysisTime": ""
        },
        "diffImage": ""
      },
      "status": ""
    },
    {
      "pair": {
        "reference": "",
        "test": "",
        "selector": "",
        "fileName": "",
        "label": "",
        "requireSameDimensions": "",
        "misMatchThreshold": "",
        "url": "",
        "referenceUrl": "",
        "expect": "",
        "viewportLabel": "",
        "diff": {
          "isSameDimensions": "",
          "dimensionDifference": {
            "width": "",
            "height": ""
          },
          "rawMisMatchPercentage": "",
          "misMatchPercentage": "",
          "analysisTime": ""
        },
        "diffImage": ""
      },
      "status": ""
    }
  ],
  "id": ""
}

Happy to discuss in detail :)

Workflows

integration-test-check.yml

This runs npm run integration-test, then checks the resultant report.json produced by the final step of the project's integration test: backstop test.

The GitHub workflow results in a pass/fail based on shape alone. The unfiltered A/B fixture/CI diff is included in the workflow's summary for further analysis.

> [!NOTE]
> All workflow summaries contain the unfiltered diff, under the "Unfiltered Diff" heading. There will always be timestamp directory-name differences in the "test" property, which further illustrates why property/value filtering is needed.
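Publishing the diff uses GitHub's built-in step summary file; the step below is a sketch with illustrative file names, not the exact step from this PR:

```yaml
- name: Publish unfiltered diff
  if: always()   # publish even when the shape check fails
  shell: bash
  run: |
    echo '## Unfiltered Diff' >> "$GITHUB_STEP_SUMMARY"
    # `|| true` keeps this step green; pass/fail is decided by the filtered diff
    diff fixture.json report.json >> "$GITHUB_STEP_SUMMARY" || true
```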

sanity-test-checks.yml

Runs both sanity-test and sanity-test-playwright, then compares each resulting report.json with its corresponding fixture.

smoke-test-checks.yml

Runs both smoke-test and smoke-test-playwright, then compares each resulting report.json with its corresponding fixture.

docker-sanity-test-checks.yml and docker-smoke-test-checks.yml

Same, but via Docker.

Playwright Binaries Caching

Playwright takes a long time to install on every run, so I found a way to cache the binaries in GitHub Actions, using the OS and Playwright version as the cache name (and lookup key):

The caches are located here in the GitHub UI: https://github.com/dgrebb/BackstopJS/actions/caches
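With actions/cache, the key can combine runner.os with the installed Playwright version. A sketch of the idea (the PLAYWRIGHT_VERSION variable and cache path are assumptions, not necessarily what this PR uses):

```yaml
- name: Cache Playwright binaries
  uses: actions/cache@v4
  with:
    path: ~/.cache/ms-playwright   # Playwright's default browser dir on Linux
    key: ${{ runner.os }}-playwright-${{ env.PLAYWRIGHT_VERSION }}
```

On a cache hit, the browser download step can be skipped entirely, which is where the time savings come from.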