Open yjbanov opened 4 years ago
which we have to do, otherwise tests fail to run at all when compiled using DDC
This is definitely not the case in general - I don't know if its worth investigating or not why you are experiencing that issue? It seems orthogonal in any case but it should speed up your test runs a fair bit to have better concurrency.
@yjbanov are you able to provide a consistent repro for this? I tried doing a basic repro and couldn't get it to happen.
If you patch in this PR and remove the skip
in paginated_data_table_test.dart
, that should trigger it on Cirrus very quickly. To reproduce this locally, I also had to artificially lower the timeout value in heartbeat
.
This should do it:
cd packages/flutter
flutter -v test --platform=chrome \
test/widgets/semantics_test.dart \
test/rendering/editable_gesture_test.dart \
test/material/page_transitions_theme_test.dart \
test/widgets/framework_test.dart \
test/physics/utils_test.dart \
test/material/snack_bar_test.dart \
test/material/paginated_data_table_test.dart \
test/services/message_codecs_vm_test.dart \
test/rendering/flex_test.dart \
test/rendering/dynamic_intrinsics_test.dart \
test/widgets/inherited_dependencies_test.dart \
test/material/flexible_space_bar_test.dart \
test/material/date_picker_test.dart \
test/widgets/keep_alive_test.dart \
test/harness/trivial_test.dart \
test/widgets/state_setting_in_scrollables_test.dart \
test/widgets/title_test.dart \
test/widgets/set_state_1_test.dart \
test/widgets/reassemble_test.dart \
test/material/tooltip_test.dart
I should mention that back when we had concurrency of more than 1 we were also getting the same timeouts. They were just happening way more often.
Is this still an issue? I would be curious to see a consistent reproduction case.
Yes. Recently, this issue caused a P0 to be misdiagnosed: https://github.com/flutter/flutter/issues/90170. Can the string "compiling" be changed to "waiting to start"? I think that string comes from the days when most tests were compiled on-the-fly. Flutter always uses pre-compiled tests, so at no point do we compile anything.
This continues to be a problem. Below are recent examples of failing tests that are hard to diagnose because the error is not clear:
@yjbanov - an infrastructure level timeout is unrelated to an invoker/heartbeat related timeout. The 3 links look like a different issue - can you file a new one? Have you ever been able to reproduce the problem locally? We've also had browser reliability issues in other places, and it's always been tricky to pin down the cause. I'm currently looking at one potential cause of flakiness in the channel between the browser host and the iframe running a test: https://github.com/dart-lang/test/pull/2061
After some more discussion I think our remaining issue is that something in the browser platform stack is flaky. #2061 is intended to make a part of that chain a bit simpler. To get more information about whether that implementation is causing the problem though we'll need to implement a way to get console logs communicated back to the user - this could happen first in the Flutter implementation of the web platform, or in this package, it's likely to have some copy/paste in either direction the first time it get's implemented. Using more shared code for browser platforms would be nice - but we haven't investigated what that would take.
I also opened #2063 to make the "compiling" part less confusing by just using "loading" unconditionally.
Is there an update on #2061? I'm currently treating this issue as a blocker for https://github.com/flutter/flutter/issues/129696 and https://github.com/flutter/flutter/issues/129695.
https://github.com/dart-lang/test/pull/2061#issuecomment-1641134897
Edit: I filed https://github.com/dart-lang/test/issues/2065 with a longer description of how we can roll this out. I don't expect a super quick turnaround.
Thanks for the update!
Has there been any progress on improving the timeout reporting? Is there anything we can try on the Flutter side?
Progress on the specific changes we have planned so far is tracked in #2065 - flutter is not yet using the new communication protocol. It looks like the current package roll is 5 days stale https://github.com/flutter/flutter/pull/131442 - once flutter does a pub package roll which includes https://pub.dev/packages/test/versions/1.24.5 then I will open a PR to change flutter's copy of postMessageChannel
. If that PR does not help with flakiness I will spend some time working on improving logging so we have a better idea where we are getting stuck.
improving the timeout reporting?
There are no other changes planned around the reporting of this timeout. The next pub package roll will change the confusing "compiling" to "loading", and otherwise the reporting is as complete and accurate as possible given the lack of visibility into the part of the process which is getting stuck.
Thanks! I'll do a package roll when I get the chance (probably next week, unless someone else beats me to it).
This has rolled into flutter and I updated the flutter platform to use the new pattern. It doesn't appear to have resolved flakes on forge, so I'm still digging to see if I can figure out where it is getting stuck.
Any useful data from forge?
Any useful data from forge?
Synced in a separate thread. For anyone else following along - I'm stuck on this. I was able to reproduce the behavior internally on chrome version 115.0.5790.110, but I cannot reproduce it on chrome version 116.0.5845.96
I was able to see chrome logs, including console output, and printing inside a Timer.periodic
, along with some other printf debugging to see how much progress is made, seems to show that sometimes the JS would stop running with no indication of a reason. It could happen within synchronous blocks of code, it's unlikely we had a bug with an infinite loop.
I am very interested in seeing any new cases on the new chrome version.
We rolled to Chrome 117 and there are still reports of timeouts.
Is it possible to optionally remove the 12-minute timeout? We already have a global timeout of 30 minutes, so this extra timeout isn't buying us much. It will also help us eliminate the possibility that the 12-minute timeout is too aggressive.
We rolled to Chrome 117 and there are still reports of timeouts.
Do you have any logs from these runs?
Is it possible to optionally remove the 12-minute timeout?
Which timeout is this? I don't recall any 12-minute timeouts in the test runner
In this test log, notice that invoker.dart reports timeouts for
flex_test.dart
anddynamic_intrinsics_test.dart
. However, neither of these two tests time out. If you run them in isolation, they will finish quickly and successfully.The test that's actually timing out is
paginated_data_table_test.dart
. You can see it logged many times in the log. Notably, the log does not contain a message about this test timing out at all.My theory is that the implementation of
heartbeat
is flawed, or the test loading system is, in that when you reduce concurrency to 1 (which we have to do, otherwise tests fail to run at all when compiled using DDC) it still attempts to load multiple tests but they are bottlenecked by the lack of concurrency. The heartbeat watcher starts, but the tests are waiting for other tests to finish. In the example test run above,flex_test.dart
starts the heartbeat, but it's stuck waiting forpaginated_data_table_test.dart
to finish, which counts againstflex_test.dart
's timeout quota.