dart-lang / dart_ci

Tools used by Dart's continuous integration (CI) testing that aren't needed by Dart SDK contributors. Mirrored from dart.googlesource.com/dart_ci. Do not land pull requests on GitHub.
BSD 3-Clause "New" or "Revised" License

Meta flakiness: automatic reapprovals #150

Open dcharkes opened 1 year ago

dcharkes commented 1 year ago

I spend 0.5-1 hour per gardening shift clicking through tests that are meta-flaking.

Most of them have:

  1. Already an issue
  2. A log that is textually similar to the previous time something failed

Maybe we could add something like this: if (1) the log is textually similar to the last known failure, and (2) the GitHub issue mentioned in the approval of that failure is still open, then we (a) automatically approve with the same GitHub issue and (b) leave a comment on the GitHub issue with a link to the log.
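The proposal above could be sketched roughly as follows. This is only an illustration, not how the dart_ci tooling works: the `Approval` record, the `auto_reapprove` function, and the 0.9 similarity threshold are all hypothetical, and "textually similar" is approximated here with a line-based `difflib.SequenceMatcher` ratio from the Python standard library.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Approval:
    """Hypothetical record of the last approved failure of a test."""
    log: str          # log text of the previously approved failure
    issue_url: str    # GitHub issue named in that approval
    issue_open: bool  # whether that issue is still open


def log_similarity(old_log: str, new_log: str) -> float:
    """Return a similarity ratio in [0, 1], comparing the logs line by line."""
    return SequenceMatcher(None, old_log.splitlines(), new_log.splitlines()).ratio()


def auto_reapprove(last: Optional[Approval], new_log: str,
                   threshold: float = 0.9) -> Optional[str]:
    """Return the issue URL to reuse for the approval, or None to leave
    the failure for a gardener. The threshold is an assumed value."""
    if last is None or not last.issue_open:
        return None  # condition (2) fails: no earlier approval or issue closed
    if log_similarity(last.log, new_log) < threshold:
        return None  # condition (1) fails: log differs too much
    return last.issue_url
```

When `auto_reapprove` returns a URL, the caller would approve the failure with that issue and post the new log link as a comment on it, which covers steps (a) and (b).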

I don't know about the rest of the gardeners in the VM team, but it would save me at least 2 hours a month.

whesse commented 1 year ago

Yes, I would like to automate at least one way this happens.

If a flaky test resumes the failure mode it had before flaking, it should not be reported as a new failure. Does this cover the cases you mention above, or are they some other situation?

Other situations could be:

  1. The test fails stably (deflaking does not work) for just one commit.
  2. The test fails stably (deflaking does not detect it as flaky), but switches between failing and passing on different commits, i.e. the test alternates between periods of failing and passing. If it is marked flaky, it will stay flaky in this case.

The systems that decide whether to report a failure and approve it have no access to the logs, so it would be a big refactoring of the system to look at the logs, the log history, and GitHub. I'd like to see what we could do without those steps. I think we can tackle most of the problem knowing just the result status, the past history, and the flakiness history, which is what we do have.
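As a rough illustration of deciding from result status and history alone, the rule "a flaky test that resumes its earlier failure mode is not a new failure" might look like the sketch below. The function name, its parameters, and the plain-string result values are assumptions made for this example, not the actual dart_ci data model.

```python
from typing import Optional


def is_new_failure(result: str,
                   expected: str,
                   previous: Optional[str],
                   result_before_flaking: Optional[str]) -> bool:
    """Decide whether `result` should be reported as a new failure.

    `previous` is the last recorded non-flaky result, and
    `result_before_flaking` is the result the test had before it
    started flaking (None if it has no flakiness history).
    """
    if result == expected:
        return False  # not a failure at all
    if result == previous:
        return False  # same failure as before, already reported
    if result == result_before_flaking:
        return False  # test resumed the failure mode it had before flaking
    return True
```

No logs or GitHub lookups are needed here; only the three kinds of information mentioned above are used.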

dcharkes commented 1 year ago

> If a flaky test resumes the failure mode it had before flaking, it should not be reported as a new failure. Does this cover the cases you mention above, or are they some other situation?

No, it would not. Look at these histories:

meta flaking: pass -> runtime error -> pass -> runtime error

meta flaking: timeout -> pass -> flaky -> timeout

whesse commented 1 year ago

The first two cases would be covered if they were marked flaky: the switches between results are frequent enough that once this test is marked flaky, it will remain marked flaky. This is where giving gardeners a way to mark a test flaky will help a lot.

The second two cases seem to be occurring because we do not deflake a change from error->pass. When a CL fixes a lot of tests, we don't want to spend a lot of time deflaking that fix, in case it is spurious. We may be able to do something special in this case, by seeing if the test has a recent flakiness history with a flaky pass, and deflaking the pass in that case.

Not deflaking a change from error->pass may also be contributing to the first two.
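The special case suggested above, deflaking an error->pass change only when the test has recently flaked with a passing outcome, could be sketched like this. The function name and the representation of the flakiness history as a list of recently observed flaky outcomes are assumptions for illustration only.

```python
from typing import Sequence


def should_deflake_pass(previous_result: str,
                        new_result: str,
                        recent_flaky_outcomes: Sequence[str]) -> bool:
    """Return True if a failure -> pass transition should be rerun.

    `recent_flaky_outcomes` is a hypothetical list of the outcomes seen
    while the test was recently recorded as flaky, e.g. ["timeout", "pass"].
    """
    if new_result != "pass" or previous_result == "pass":
        return False  # only failure -> pass transitions are considered here
    # Rerun only if a pass has recently shown up as a flaky outcome, so a
    # CL that fixes many tests does not trigger a rerun of all of them.
    return "pass" in recent_flaky_outcomes
```

With this rule, a test with no flakiness history that goes from error to pass is accepted without reruns, which keeps the current behavior for CLs that fix many tests at once.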