dart-lang / dart_ci

Tools used by Dart's continuous integration (CI) testing that aren't needed by Dart SDK contributors. Mirrored from dart.googlesource.com/dart_ci. Do not land pull requests on Github.
BSD 3-Clause "New" or "Revised" License

Handle flaky test returning to previously approved stable failure #106

Open athomas opened 3 years ago

athomas commented 3 years ago

Today, a test that fails consistently but occasionally times out causes redness when it leaves the flaky state.

Example:
1) Test A normally produces a RuntimeError.
2) Because of test framework issues, it sometimes times out.
3) Sometimes the deflaking produces consistent timeouts, and the test causes redness directly.
4) Sometimes the deflaking detects the test as flaky, and it is marked flaky.
5) After 100 runs without a timeout, it is marked as RuntimeError again, which causes redness.
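The life cycle in the example can be modeled roughly as follows. This is only a sketch for discussion, not the actual dart_ci logic: the class `FlakyTracker`, its `record` method, and the `"flaky"` / `"unflaked:..."` return strings are all hypothetical names. Only the 100-run threshold comes from the description above.

```python
from dataclasses import dataclass

# From the description: 100 runs without a timeout clear the flaky mark.
FLAKY_CLEAR_RUNS = 100


@dataclass
class FlakyTracker:
    """Hypothetical model of one test's flaky state on one configuration."""

    flaky: bool = False
    quiet_runs: int = 0  # consecutive non-timeout runs seen while marked flaky

    def record(self, outcomes):
        """Record one build; `outcomes` holds the first run plus any deflaking reruns.

        Returns the result that would be reported for the build.
        """
        if len(set(outcomes)) > 1:
            # Step 4: reruns disagree, so the test is marked flaky.
            self.flaky = True
            self.quiet_runs = 0
            return "flaky"

        result = outcomes[0]
        if not self.flaky:
            # Steps 1 and 3: a consistent result (even a timeout) is reported as-is.
            return result

        # While marked flaky, count runs without a timeout.
        self.quiet_runs = 0 if result == "Timeout" else self.quiet_runs + 1
        if self.quiet_runs >= FLAKY_CLEAR_RUNS:
            # Step 5: the flaky mark is dropped and the steady-state result
            # shows up as a change, which is what causes redness today.
            self.flaky = False
            self.quiet_runs = 0
            return "unflaked:" + result
        return "flaky"
```

In this model, a build with outcomes `["Timeout", "RuntimeError"]` marks the test flaky, and the hundredth consecutive `["RuntimeError"]` build after that reports the test as RuntimeError again.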

The proposal is to handle the redness caused by 3) and 5) somehow. 3) is likely temporary and persists for only a single build. 5) is permanent, because RuntimeError is the steady state. So perhaps, if a test goes from a flaky or timeout state back to a previously approved result, the previous approval should be re-applied.
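A minimal sketch of the proposed rule, assuming the approval system can look up the previous result and the set of results that were approved earlier for the test. The function `causes_redness`, the `TRANSIENT_STATES` set, and the result strings are hypothetical and are not taken from the dart_ci code.

```python
# States the proposal treats as transient (assumption: named "flaky" and "Timeout").
TRANSIENT_STATES = {"flaky", "Timeout"}


def causes_redness(previous, new, previously_approved):
    """Return True if the change from `previous` to `new` should turn the build red.

    `previously_approved` is the set of results that were approved for this
    test on this configuration at some earlier point.
    """
    if new == previous:
        # No change in result, nothing new to approve.
        return False
    if previous in TRANSIENT_STATES and new in previously_approved:
        # Proposed rule: returning from flaky/timeout to an already approved
        # result re-applies the earlier approval instead of causing redness.
        return False
    # Any other change still needs a fresh approval.
    return True
```

With this rule, `causes_redness("flaky", "RuntimeError", {"RuntimeError"})` is false, so case 5) no longer causes redness, and neither does the build after the consistent timeouts in 3). The onset of consistent timeouts in 3) still does, because `"Timeout"` was never approved; whether that should be suppressed as well is left open here.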

Real-world examples:
https://dart-ci.firebaseapp.com/#showLatestFailures=false&test=co19_2/Language/Expressions/Constants/integer_size_t04&configurations=dart2js-hostasserts-linux-ia32-d8
https://dart-ci.firebaseapp.com/#showLatestFailures=false&test=corelib_2/integer_parsed_mul_div_vm_test&configurations=dart2js-hostasserts-linux-ia32-d8

Consistent timeouts in compilation: https://dart-ci.appspot.com/log/dart2js-strong-hostasserts-linux-ia32-d8/dart2js-hostasserts-linux-ia32-d8/12360/corelib_2/integer_parsed_mul_div_vm_test

Consistent RuntimeError (the test tries to use int64 semantics on the web): https://dart-ci.appspot.com/log/dart2js-strong-hostasserts-linux-ia32-d8/dart2js-hostasserts-linux-ia32-d8/12361/corelib_2/integer_parsed_mul_div_vm_test