osa1 opened this issue 1 year ago (Open)
@osa1 thank you! I added some missed `asyncStart/End()` calls. Not all async tests are supposed to have `asyncStart/End()`. Tests like
```dart
main() async {
  await test1();
  ...
  await test2();
}
```
don't need `asyncStart/End()`. But I found a lot of tests that have excessive `async`.
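For contrast, here is a minimal sketch (not taken from the repository) of the kind of test that does need the guards: one whose checks run in a callback rather than behind an `await`. The import path, the `Timer`, and the check are illustrative, assuming co19's `Utils/expect.dart` provides `asyncStart()`/`asyncEnd()` and `Expect`.

```dart
// Sketch only: the result arrives in a callback, so without asyncStart/asyncEnd
// the browser test controller could declare success before the callback runs.
import 'dart:async';
import '../../../Utils/expect.dart'; // illustrative path to the co19 helpers

main() {
  asyncStart(); // an asynchronous check is now pending
  Timer(Duration(milliseconds: 1), () {
    Expect.isTrue(true); // the real assertion would go here
    asyncEnd(); // only now is the test considered done
  });
}
```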
@sgrekhov I've noticed that these tests are flaky for dart2js and DDC when running on Firefox and Safari even though they all use the TouchEvent constructor which is not present in those browsers.
LibTest/html/Element/onTouchCancel_A01_t01
LibTest/html/Element/onTouchEnd_A01_t01
LibTest/html/Element/onTouchEnter_A01_t01
LibTest/html/Element/onTouchLeave_A01_t01
LibTest/html/Element/onTouchMove_A01_t01
LibTest/html/Element/onTouchStart_A01_t01
LibTest/html/Element/touchCancelEvent_A01_t01
LibTest/html/Element/touchEndEvent_A01_t01
LibTest/html/Element/touchEnterEvent_A01_t01
LibTest/html/Element/touchLeaveEvent_A01_t01
LibTest/html/Element/touchMoveEvent_A01_t01
LibTest/html/Element/touchStartEvent_A01_t01
LibTest/html/IFrameElement/onTouchCancel_A01_t01
LibTest/html/IFrameElement/onTouchEnd_A01_t01
LibTest/html/IFrameElement/onTouchEnter_A01_t01
LibTest/html/IFrameElement/onTouchLeave_A01_t01
LibTest/html/IFrameElement/onTouchMove_A01_t01
LibTest/html/IFrameElement/onTouchStart_A01_t01
They should consistently fail but the fact that they can sometimes pass leads me to believe that they might complete without running the code or recognizing the failure. It makes me question the reliability of the asyncStart/End guards. Can you think of other explanations?
Since the API simply doesn't exist, I'm going to start skipping these tests on those browsers, but I wanted to call out this observation.
@nshahan where do you see these flaky results? According to https://dart-current-results.web.app/#/filter=co19/LibTest/html/Element/,co19/LibTest/html/IFrameElement/ all of these tests are not flaky but fail with a `TouchEvent is not defined` error.
Maybe it makes sense to change the expected result in this case? If we have no event to test, then the test should succeed. It's not an error if we have nothing to test. I'm going to update the tests this way.
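A minimal sketch of what such an update could look like, assuming `dart:html`'s `TouchEvent.supported` getter is the right feature check (the actual change may guard the tests differently):

```dart
// Sketch only: if the browser has no TouchEvent support there is nothing to
// test, so return early and let the test pass instead of failing with
// "TouchEvent is not defined".
import 'dart:html';

main() {
  if (!TouchEvent.supported) {
    return; // no TouchEvent in this browser; nothing to test
  }
  // ... original test body using the TouchEvent constructor goes here ...
}
```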
> I've noticed that these tests are flaky for dart2js and DDC when running on Firefox and Safari even though they all use the TouchEvent constructor which is not present in those browsers.
How did you determine they are flaky?
Did they actually flip between `Pass` and `Fail`/`RuntimeError`?
> They should consistently fail but the fact that they can sometimes pass leads me to believe that they might complete without running the code or recognizing the failure. It makes me question the reliability of the asyncStart/End guards. Can you think of other explanations?
The go/dart-current-results page seems to indicate they consistently pass.
When SQL querying the test results from the `dart-ci.results.results_60d` table for the `co19/LibTest/html/Element/touch%` name, it seems that they are flaking between `Timeout` and `RuntimeError`.
=> @nshahan Can it be that they are actually flakily timing out instead of failing?
/cc @athomas
The `asyncStart` & `asyncEnd` mechanism should be working correctly. It's a collaboration between the test itself, the test controller (running in the browser), and `tools/test.py`: once an `asyncStart` has been issued, the test is considered not done until an error or an `asyncEnd`.
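A hypothetical illustration (not the actual controller or co19 code) of that contract: each `asyncStart` adds a pending unit of work, and the harness only treats the test as done once every start has been matched by an `asyncEnd` or an error has been reported.

```dart
// Hypothetical sketch of the asyncStart/asyncEnd bookkeeping; the real logic
// lives in the co19 utilities, the browser test controller, and tools/test.py.
import 'dart:async';

class AsyncGuard {
  int _pending = 0;
  final Completer<void> _done = Completer<void>();

  /// Future the harness can wait on before declaring the test finished.
  Future<void> get done => _done.future;

  void asyncStart() {
    _pending++; // another piece of asynchronous work is outstanding
  }

  void asyncEnd() {
    if (_pending == 0) throw StateError('asyncEnd() without asyncStart()');
    _pending--;
    if (_pending == 0 && !_done.isCompleted) {
      _done.complete(); // all outstanding async work has finished
    }
  }

  void error(Object e) {
    if (!_done.isCompleted) _done.completeError(e); // fail the test immediately
  }
}
```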
> How did you determine they are flaky?
I keep seeing them turn our bots red from time to time when they change from flaky -> `RuntimeError`.
> Did they actually flip between Pass and Fail/RuntimeError?
Oh, good intuition! They are never actually passing; sorry about that assumption. Sometimes they get the runtime error I would expect, and sometimes they time out.
You can see an example of oscillating between flaky and RuntimeError here: https://dart-ci.web.app/#showLatestFailures=false&configurations=ddc-linux-firefox&test=co19/LibTest/html/Element/touchEndEvent_A01_t01
@nshahan @sgrekhov The `asyncStart` / `asyncEnd` machinery works perfectly fine here. The fact that the tests fail on some browsers should also be perfectly fine (so we shouldn't change the tests because of it). I think the root cause may be something else entirely: for example, it can mean that the test runner starts too many browsers (it runs N tests in parallel) and that one test may consume a lot of CPU or RAM and cause other browser tabs to crash, ...
If there is nothing to test in some browsers, why should the tests fail? And the change (already committed, but I can revert it if needed) should fix the flakiness. IMHO it's fine.
> If there is nothing to test in some browsers, why should the tests fail? And the change (already committed, but I can revert it if needed) should fix the flakiness. IMHO it's fine.
In general our tests should test the behavior we want/expect. So if an implementation doesn't adhere to this behavior/spec, the test should fail.
This way we can go to our test results database and check: are touch events, for example, working on all browsers? We'll see all the browsers where they work and those where they don't work (instead of seeing all touch event tests passing, giving us the illusion of this working everywhere, even though on some browsers the API throws).
There's a gray area when APIs are optional (e.g. not in the official W3C spec, ...). Since it's web-specific, I'll leave this to the web team: @nshahan @sigmundch.
We have several tests where a specific behavior is expected based on the number semantics. They would make the distinction that some configurations have web numbers and others have native numbers, and the expectations differ. So perhaps we need to make more than one distinction:
In the former case we just need a run-time test that determines which kind of configuration we're currently running, or we may have several tests that are only executed with specific configurations (using status files, no less ;-). See the sketch after the next paragraph for one way such a run-time check could look.
If we have a situation like the latter then we could have a test that expects the known outcome (e.g., it could expect that there's a timeout, or the browser crashes with a bus error, or whatever), but the test runner doesn't (perhaps can't) support that. So even though item 2 looks like a special case of item 1, we may not be able to test it in the usual way.
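For the "former" case, a hedged sketch of such a run-time check: on the web compilers all numbers are JS doubles, so the integer literal `0` and the double literal `0.0` are identical, while on the VM they are distinct objects. The constant name here is made up for illustration.

```dart
// Sketch only: detect web (JS) number semantics versus native (VM) semantics
// at run time. On dart2js/DDC, ints are represented as JS doubles, so
// identical(0, 0.0) is true; on the VM it is false.
const bool hasWebNumbers = identical(0, 0.0); // hypothetical helper constant

void main() {
  if (hasWebNumbers) {
    // expectations for web number semantics (e.g. 1.0 is an int here)
  } else {
    // expectations for native number semantics
  }
}
```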
Next, even if we can arrange for every test to succeed on every configuration (that is, it works, or we expect that it crashes and it actually crashes, etc.), the information about known deficiencies of specific platforms/tools is still implicit.
It seems likely that we'd want some kind of provenance for this. E.g., it could be useful to be able to query things like "which features are known to be broken with this and that specific browser", in particular if the status changes and a bunch of tests need to be updated. A bit like the approval associations where a test failure should allow us to look up the issue where that failure is being addressed.
In any case, I'm not convinced that we should consider a test failure to be a normal behavior that nobody should ever bother to look into.
While debugging https://github.com/dart-lang/sdk/issues/54140, @mkustermann realized that some of the async tests in co19 are missing `asyncStart` and `asyncEnd`:

When testing in the browser, without `asyncStart` and `asyncEnd`, the test driver does not wait for the test to finish and declares a success even when the test actually fails.

I quickly searched for all the files with an async `main` and an `asyncStart`, and when I compared the outputs I see that there are dozens of files where `main` is async but the file doesn't have `asyncStart`. For example:

I used `ag -Q 'main() async' -c` to find tests with an async `main` and `ag -Q 'asyncStart' -c` to find the tests with `asyncStart`. (I think it might make sense to search for `await` instead of `main() async`.)