Automattic / a8c-ci-toolkit-buildkite-plugin

A caching plugin that can be invoked from your build script.
21 stars 5 forks source link

Fix `annotate_test_failures` for flaky retries which ultimately succeed #46

Closed AliSoftware closed 1 year ago

AliSoftware commented 1 year ago

What?

Fixes annotate_test_failures command, to ignore cases of flaky retries which ultimately succeeded.

This bug was the cause for some CI builds to still get an annotation mentioning test failures… even when the corresponding CI step ended up green (See this example build):

test-failure-annotation-but-step-green

(cc @mokagio & @crazytonyli as I talked about this bug very recently with both of you, in P2s and PR comments respectively)

Why?

When a flaky test failed but Xcode is configured to auto-retry tests multiple times on failure, all the intermediate failures are recorded in the report.junit alongside the final success (if any). For example this is an extract of such .junit report:

    <testcase classname='WordPressTest.TenorDataSourceTests' name='testDataSourceReceivesRequestedCount'>
      <failure message='Asynchronous wait failed: Exceeded timeout of 2 seconds, with unfulfilled expectations: &quot;Waiting&quot;.'>&lt;unknown&gt;:0</failure>
    </testcase>
    <testcase classname='WordPressTest.TenorDataSourceTests' name='testDataSourceReceivesRequestedCount'>
      <failure message='XCTAssertEqual failed: (&quot;Optional(0)&quot;) is not equal to (&quot;Optional(3)&quot;)'>WordPress/WordPressTest/MediaPicker/Tenor/TenorDataSouceTests.swift:37</failure>
    </testcase>
    <testcase classname='WordPressTest.TenorDataSourceTests' name='testDataSourceReceivesRequestedCount' time='1.042'/>

The previous version of our script were not smart enough to detect that, and simply extracted all testcase nodes that had a failure subnode, and built the annotation with the list of found failures from that. Which is why it erroneously included the flaky failures

How?

To solve this, during the iteration on the testcase[failure] node candidates, we are now finding all the sibling nodes to that testcase which happens to have the same classname and name attributes, i.e. nodes reporting all the assertion failures for the same test. This list will thus include the current candidate being iterated on, but also potentially all other retries of the same test.

Once we got that list of nodes, we check if the last one of those nodes ends up being a failure or success XML node:

Testing

WPiOS Demo build with ❌ failed + ⚠️ flaky tests

image

WPiOS Demo build with ⚠️ only flaky tests

image

WPiOS Demo build with ❌ only failed tests

image
AliSoftware commented 1 year ago

@mokagio I spent some time (most of my afternoon, tbh 😅 ) testing things around and improved the changes significantly:

The improvements I made include:

@mokagio Given all the changes I've made after all that intense testing and tweaking, I'm interested in your re-review! (Note: I've updated the testing instructions in the PR description)

mokagio commented 1 year ago

While commenting about reporting the return code from the test task rather than the annotation command, I realized that there is nothing platform-specific that forces us to run the annotation logic in the macOS agent.

We could move it to a cheaper agent with Ruby support, but:

  1. The build time overhead of the annotation is negligible compared to the test run time
  2. Having tests and annotation together is handy if we retry the tests. I'm not sure how it would work if we split them... Does Buildkite handle a retry step re-triggering steps that are downstream from it? 🤔 No sure... and given point 1, don't think it's worth finding out