dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License

Batch of unrelated tests started flaking on March 7 #48530

Open sstrickl opened 2 years ago

sstrickl commented 2 years ago

There are new test failures on [pkg:js] Disallow all operators in JS classes.

The tests

co19_2/LibTest/io/RandomAccessFile/writeByte_A01_t01 RuntimeError (expected Pass)
pkg/dds/test/dap/integration/debug_attach_test RuntimeError (expected Pass)
vm/dart_2/isolates/reload_many_isolates_live_and_die_test Timeout (expected Pass)

are failing on configurations

dartk-linux-release-simarm
dartkp-linux-release-x64
unittest-asserts-release-mac-arm64

The main thing all three tests here have in common is that they have been flipping back and forth between passing, being flaky, and turning red (either with a RuntimeError or, in the case of pkg/dds/..., timing out) since March 7. Approving as a batch for now, but the fact that they all show flakiness over approximately the same range is concerning.

sstrickl commented 2 years ago

Note that the fuzz tester has also been consistently red since then, though the first two failures in the sequence are due to RISCV build failures, so it's unclear whether either would have been red without them. Since the build failure was fixed, almost all fuzz testing shards are reporting at least one divergence, instead of a single shard now and then, which is concerning.

sstrickl commented 2 years ago

After a quick skim, it looks like all the divergences seen on the fuzzer are also RISCV related, so probably unrelated to the flakiness above on non-RISCV bots.

whesse commented 2 years ago

This is a bug in the builder recipes on the CI: all previously detected flakiness was dropped, so all flakiness is being reported as if it were new.

The fix has landed, and I will restore all the previous flakiness records next. Thanks for noticing! This is what our CI console would look like if we only had deflaking on each run, with no tracking of known flaky tests. It is surprisingly not that bad - many builders remained totally green.
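
To illustrate the effect described above, here is a minimal sketch of how a results console might classify tests with and without a record of known flaky tests. It is not the actual builder recipe code: the `Result` type, the `classify` and `report` functions, the status strings, and the test names are all hypothetical, and the rule that "disagreeing reruns mean flaky" is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """Outcomes of one test: the first run plus any deflaking reruns (hypothetical)."""
    test: str
    outcomes: tuple
    expected: str = "Pass"


def classify(result, known_flaky):
    """Return how a hypothetical console would report one test."""
    unique = set(result.outcomes)
    if unique == {result.expected}:
        return "pass"
    if result.test in known_flaky:
        # Already tracked as flaky, so it does not turn the builder red.
        return "known flaky"
    if len(unique) > 1:
        # Reruns disagree and there is no prior record: reported as new.
        return "new flake"
    return "failure"


def report(results, known_flaky):
    """Classify every result against the given set of known flaky tests."""
    return {r.test: classify(r, known_flaky) for r in results}


# Hypothetical results, not taken from the real CI.
results = [
    Result("suite/test_a", ("RuntimeError", "Pass")),
    Result("suite/test_b", ("Pass",)),
    Result("suite/test_c", ("Timeout", "Timeout")),
]

with_records = report(results, known_flaky={"suite/test_a"})
records_dropped = report(results, known_flaky=set())

print(with_records)
print(records_dropped)
```

With the records in place, `suite/test_a` is reported as known flaky; with an empty record set, the same outcomes are reported as a new flake, which matches the symptom in this issue. Tests whose runs all pass are unaffected either way, consistent with many builders staying green.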