dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev

Modified `tests/language_2/variance/variance_*_test.dart` tests caused new failures #39084

Open mkustermann opened 5 years ago

mkustermann commented 5 years ago

After ef742aa6f0b71827b2293c5eeff0cdb57f753965 landed, there were new failures on VM configurations. For example:

/===========================================================================================\
| language_2/variance/variance_in_subtyping_test failed again (RuntimeError, expected Pass) |
\===========================================================================================/

--- Command "vm" (took 02.000721s):
DART_CONFIGURATION=DebugX64 out/DebugX64/dart --enable-experiment=variance --ignore-unrecognized-flags --packages=/b/s/w/ir/.packages /b/s/w/ir/tests/language_2/variance/variance_in_subtyping_test.dart

exit code:
255

stderr:
Unhandled exception:
Expect.type(Instance of 'Contravariant<Middle>' is Contravariant<Lower>) fails on Instance of 'Contravariant<Middle>'
#0      Expect._fail (package:expect/expect.dart:665:5)
#1      Expect.type (package:expect/expect.dart:624:5)
#2      main (file:///b/s/w/ir/tests/language_2/variance/variance_in_subtyping_test.dart:64:10)
#3      _startIsolate.<anonymous closure> (dart:isolate-patch/isolate_patch.dart:305:19)
#4      _RawReceivePortImpl._handleMessage (dart:isolate-patch/isolate_patch.dart:172:12)

--- Re-run this test:
python tools/test.py -n dartk-linux-debug-x64 language_2/variance/variance_in_subtyping_test

/==============================================================================================\
| language_2/variance/variance_inout_subtyping_test failed again (RuntimeError, expected Pass) |
\==============================================================================================/

--- Command "vm" (took 02.000593s):
DART_CONFIGURATION=DebugX64 out/DebugX64/dart --enable-experiment=variance --ignore-unrecognized-flags --packages=/b/s/w/ir/.packages /b/s/w/ir/tests/language_2/variance/variance_inout_subtyping_test.dart

exit code:
255

stderr:
Unhandled exception:
Expect.type(Instance of 'Invariant<Middle>' is! Invariant<Upper>) fails on Instance of 'Invariant<Middle>'
#0      Expect._fail (package:expect/expect.dart:665:5)
#1      Expect.notType (package:expect/expect.dart:632:5)
#2      main (file:///b/s/w/ir/tests/language_2/variance/variance_inout_subtyping_test.dart:104:10)
#3      _startIsolate.<anonymous closure> (dart:isolate-patch/isolate_patch.dart:305:19)
#4      _RawReceivePortImpl._handleMessage (dart:isolate-patch/isolate_patch.dart:172:12)

--- Re-run this test:
python tools/test.py -n dartk-linux-debug-x64 language_2/variance/variance_inout_subtyping_test

I suppose these tests exercise new language features.
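
For context, these tests appear to exercise declaration-site variance behind the `variance` experiment flag seen in the commands above. The following is a minimal sketch of the kind of subtype checks involved; the class shapes are inferred from the failure messages, and the actual test files are not reproduced here.

```dart
// Sketch only: run with an SDK that supports the `variance` experiment, e.g.
//   dart --enable-experiment=variance variance_sketch.dart
// Class shapes are assumptions based on the failure messages above.

class Upper {}
class Middle extends Upper {}
class Lower extends Middle {}

class Contravariant<in T> {}
class Invariant<inout T> {}

void main() {
  // With declaration-site contravariance, Lower <: Middle implies
  // Contravariant<Middle> <: Contravariant<Lower>, so this should print true.
  // A backend without runtime variance support falls back to covariant
  // subtyping and prints false, matching the first RuntimeError above.
  print(Contravariant<Middle>() is Contravariant<Lower>);

  // With invariance, Invariant<Middle> is not a subtype of Invariant<Upper>
  // even though Middle <: Upper, so this should print false; a covariant
  // fallback prints true, matching the second failure.
  print(Invariant<Middle>() is Invariant<Upper>);
}
```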

Tentatively assigning to @crelier. Feel free to close this one if there's a more general bug for it.

eernstg commented 5 years ago

The features tested in `tests/language_2/variance` are being implemented as part of an internship project (@leafpetersen may have further comments on this). Variance is rather likely to become a regular Dart feature later on, but right now it is not one. I believe the right thing to do here would be to mark these tests as Skip in the relevant status files. @kallentu, this entry in `tests/language_2/language_2_dartdevc.status` should be a usable pattern to follow:

nnbd/*: Skip
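
If that pattern were followed here, the entry would presumably look something like the line below, added to the status file for the failing VM configuration (exactly which status file that is, is an assumption on my part):

variance/*: Skip
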
leafpetersen commented 5 years ago

We're avoiding skipping these, since they should eventually pass. The failures should have been pre-approved; I'll re-assign this to @kallentu to take a look.

leafpetersen commented 5 years ago

@mkustermann what failures were you seeing? Looking at the waterfall, I see all bots either pre-approved (green on landing) or post-approved (green on next CL).

mkustermann commented 5 years ago

The following three failures are due to such tests. I approved them:

leafpetersen commented 5 years ago

Something is odd here. This is the build in which the new tests from @kallentu landed:

https://ci.chromium.org/p/dart/builders/ci.sandbox/vm-dartkb-linux-release-x64/5537

That build is green: there are no new unapproved failing tests (there are some new approved failures, as expected).

This is the next build:

https://ci.chromium.org/p/dart/builders/ci.sandbox/vm-dartkb-linux-release-x64/5538

That build has new failures in the tests that were modified by @kallentu.

@athomas what's happening here? As best I can tell, the tests were landed in an approved state, the bots went green, and then in a subsequent build they went red.

athomas commented 5 years ago

This looks like some sort of race condition around the approvals.

The tests began failing in build 5537, but were recognized as approved:
https://logs.chromium.org/logs/dart/buildbucket/cr-buildbucket.appspot.com/8898766744132285184/+/steps/test_results/0/logs/tests_that_began_failing/0

On the subsequent build (5538), they failed in the same way, but for some reason were no longer approved:
https://logs.chromium.org/logs/dart/buildbucket/cr-buildbucket.appspot.com/8898762137430915232/+/steps/test_results/0/logs/unapproved_failing_tests/0

@sortie any theories on how that could happen?

sortie commented 5 years ago

Hi, the approvals used for each build are stored in cloud storage. We can diagnose what happened using these commands:

gsutil.py cp gs://dart-test-results/builders/vm-dartkb-linux-release-x64/5537/approved_results.json 5537.json
gsutil.py cp gs://dart-test-results/builders/vm-dartkb-linux-release-x64/5538/approved_results.json 5538.json
diff -u <(jq . < 5537.json) <(jq . < 5538.json)

Its output contains fragments like this:

@@ -1054303,11 +1054327,11 @@
   "configuration": "dartkb-mixed-linux-release-x64",
   "suite": "language_2",
   "test_name": "variance/variance_inout_subtyping_test",
-  "result": "RuntimeError",
+  "result": "Pass",
   "expected": "Pass",
-  "matches": false,
-  "approver": "kallentu",
-  "approved_at": "2019-10-23T20:51:57.154995Z",
+  "matches": true,
+  "approver": "ajcbik",
+  "approved_at": "2019-10-24T00:12:13.719685Z",
   "preapprovals": {}
 }
 {
@@ -1069563,6 +1069587,18 @@
   "preapprovals": {}
 }
 {
+  "name": "samples/ffi/samples_test",
+  "configuration": "dartkb-mixed-linux-release-x64",
+  "suite": "samples",
+  "test_name": "ffi/samples_test",
+  "result": "Pass",
+  "expected": "Pass",
+  "matches": true,
+  "approver": "automatic-approval",
+  "approved_at": "2019-10-24T00:14:54.002287Z",
+  "preapprovals": {}
+}
+{
   "name": "samples/sample_extension/test/sample_extension_app_snapshot_test",
   "configuration": "dartkb-mixed-linux-release-x64",
   "suite": "samples",

Combining this information with that of the two builds, I see this timeline:

The issue here is that there's a race condition between publishing the applied pre-approvals and publishing the actual results. That means there's a window where the approvals don't correspond to the latest results, and approving in that window causes the new approvals to be reverted because the new test results are not in yet. The window spans all the time where deflaking and other results processing is happening; in this case, the window was 13 minutes.

I suggest we move publishing the applied pre-approvals to the step where we approve new passing tests. Note: we still want to apply the pre-approvals where they currently are; we just don't want to publish them at that time. That ensures the new results are always published first. There is still a bit of a race condition between the new results becoming available and the approvals being updated, but that is the usual situation when a new failing test is landed without approvals, so that case should be well handled.
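
For illustration only, here is a rough sketch of that proposed ordering in Dart; the step names are invented placeholders and do not correspond to the actual results-processing infrastructure code.

```dart
// Hypothetical sketch of the proposed step ordering; the step names are
// placeholders, not real infrastructure APIs.

Future<void> step(String name) async {
  print('step: $name');
}

Future<void> processBuildResults() async {
  await step('apply pre-approvals');           // unchanged: still applied here
  await step('deflake and process new results');
  await step('publish new results');           // results go out first...
  await step('approve new passing tests');
  await step('publish applied pre-approvals'); // ...then the approvals, so an
  // approval made while results are still being processed no longer gets
  // reverted by results that have not been published yet.
}

void main() async {
  await processBuildResults();
}
```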