dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License
10.04k stars 1.55k forks source link

`dart test --coverage`: getSourceReport: (-32000) Service connection disposed #56254

Closed rrousselGit closed 2 weeks ago

rrousselGit commented 1 month ago

Hello! I am facing an unexpected getSourceReport: (-32000) Service connection disposed error when running dart test --coverage=folder.

Unfortunately there's not much extra information. The full dart test log I have is:

✅ test/firebase_admin_app_test.dart: FirebaseAdminApp initializeApp() creates a new FirebaseAdminApp
✅ test/firebase_admin_app_test.dart: FirebaseAdminApp useEmulator() sets the apiHost to the emulator
getSourceReport: (-32000) Service connection disposed

This is an unexpected error. Please file an issue at http://github.com/dart-lang/test
with the stack trace and instructions for reproducing the error.
Unhandled exception:
getSourceReport: (-32000) Service connection disposed

Here is a Github Action log with the issue: https://github.com/invertase/dart_firebase_admin/actions/runs/9906073779/job/27366938724

cc @DanTup

dart-github-bot commented 1 month ago

Summary: The user is encountering a getSourceReport: (-32000) Service connection disposed error when running dart test --coverage which prevents coverage reports from being generated. The error occurs after successful test execution and appears to be related to a service connection issue.

DanTup commented 1 month ago

Looking at the full error, I guess this should be filed at http://github.com/dart-lang/test (although I also wonder if the issue could be in https://github.com/dart-lang/coverage).

Assuming it's going through the code here, it seems like the VM is set to pause isolates on exit:

https://github.com/dart-lang/coverage/blob/044b7a27aecc183f204b25dd74c23d911343f9fb/lib/src/run_and_collect.dart#L18

Then it keeps polling until all isolates are paused, collects coverage, then resumes the isolates. This seems fairly sound to me, though I wonder if the VM crashes if it might result in this (for example, could it run out of memory collecting coverage, and would it fail in this way?).

Perhaps @natebosch or @jakemac53 are more familiar with how this works and know where best to file this and/or how to collect more info that might help identify the issue.

natebosch commented 1 month ago

It looks like this exception is coming from package:vm_service. @bkonyi might be able to give some help on the cause of the exception.

I don't have context for whether package:coverage can ignore errors from getSourceReport. It looks like SentinelException is expected and causes us to skip a given isolate, but I don't know whether it is safe to continue after other exceptions like this one. @liamappelbe might be able to help with that question.

This doesn't look like something that is specific to package:test.

DanTup commented 1 month ago

It looks like this exception is coming from package:vm_service. @bkonyi might be able to give some help on the cause of the exception.

The exception should only occur when the VM Service/DDS is shutdown/disconnected. Since the issue was reported against master it's possible it was affected by the recentfix to https://github.com/dart-lang/sdk/issues/55559. It's not clear to me though whether the VM Service is terminating unexpectedly (if it crashes, would there be any output from dart test to indicate this?) of if test/coverage is trying to fetch coverage info later than it should. Nor do I know the best way to collect additional info to help track this down.

bkonyi commented 1 month ago

As Danny stated, that looks related to the fix for #55559. The service connection is being destroyed due to a crash, the VM shutting down, or a network issue while the getSourceReport RPC is outstanding.

liamappelbe commented 1 month ago

dart test --coverage runs several commands under the hood. If there is a VM crash, my guess is we're not seeing the stack trace because package:test is running the test in a subprocess. In that case, the way to debug this is to run those commands manually so that any VM crashes are dumped to the terminal, or for the commands to be run in gdb. Something like this:

dart run --pause-isolates-on-exit --disable-service-auth-codes --enable-vm-service=8181 test

# In a separate terminal
dart pub global run coverage:collect_coverage --wait-paused --uri=http://127.0.0.1:8181/ -o coverage/coverage.json --resume-isolates

It would also be good to figure out why package:test isn't forwarding the crash logs to the terminal and fix that if possible.

DanTup commented 1 month ago

I forked the repo and could reproduce the issue, so I tried splitting the commands in various ways, most recently like:

set -e -m

dart pub global activate coverage

firebase emulators:start --project dart-firebase-admin --only firestore,auth &

dart run --pause-isolates-on-exit --disable-service-auth-codes --enable-vm-service=8181 test &

jobs -l

dart pub global run coverage:collect_coverage --wait-paused --uri=http://127.0.0.1:8181/ -o coverage/coverage.json --resume-isolates --scope-output=foo &

jobs -l

fg %2

Unfortunately I haven't been able to reproduce the issue with the commands separate. However, it seems there are many more tests running now (the original cashes only show a few tests), which does make me think that something is crashing (and therefore all of the tests are not being run), but the crash is not reported correctly.

I'm not sure how to repro though - I don't know if the failure only occurs when run through firebase emulators:exec of it it's just coincidence that it only failed in my first run that way).

DanTup commented 1 month ago

I did notice if I just put exit(1) in a test then dart test never reports that the process exited a) without completing or b) with an exit code.. Not sure if that's a bug? (maybe if it was reporting non-zero exit codes, it might at least be clearer if there is a crash here?)

natebosch commented 1 month ago

Not sure if that's a bug? (maybe if it was reporting non-zero exit codes, it might at least be clearer if there is a crash here?)

It's undesirable, but it's also not something we can do anything about from within Dart code in the test runner. https://github.com/dart-lang/test/issues/2248#issuecomment-2216278541

@bkonyi - do you have any thoughts on changing the behavior of the exit() API for the dart test command?

I did notice if I just put exit(1) in a test then dart test never reports that the process exited a) without completing or b) with an exit code..

I think you need exit(0) for this. If you exit(1) then it should have the failing exit code.

DanTup commented 1 month ago

Oops, I think I for confused along the way. I wanted to see if the crash would be noticed, but ofc pkg:test is running inside the same process, so the VM terminating with an exit code is not something test can detect (and if dart test is running pkg:test directly in the same VM and not spawning another process, I guess the same applies there).

I'm not sure why I can't repro the error without the firebase emulators:exec.. I don't know if it's causing something to crash. I did try running a script that just throws to ensure that it is passing stderr and exit codes on correctly (so firebase is not swallowing anything) and that seems to be the case:

✔  firestore: Firestore Emulator UI websocket is running on 9150.
i  Running script: dart run bin/throw.dart
Unhandled exception:
Oops
#0      main (file:///home/runner/work/dart_firebase_admin/dart_firebase_admin/packages/dart_firebase_admin/bin/throw.dart:2:3)
#1      _delayEntrypointInvocation.<anonymous closure> (dart:isolate-patch/isolate_patch.dart:297:19)
#2      _RawReceivePort._handleMessage (dart:isolate-patch/isolate_patch.dart:184:12)
⚠  Script exited unsuccessfully (code 255)
i  emulators: Shutting down emulators.
i  firestore: Stopping Firestore Emulator
i  auth: Stopping Authentication Emulator
i  hub: Stopping emulator hub
i  logging: Stopping Logging Emulator
Error: Script "dart run bin/throw.dart" exited with code 255
Error: Process completed with exit code 255.

Edit: I was wrong, I did manage to reproduce without dart test being inside the firebase command.

The output was no different though:

i  firestore: Firestore Emulator logging to firestore-debug.log
✔  firestore: Firestore Emulator UI websocket is running on 9150.
✅ test/firebase_admin_app_test.dart: FirebaseAdminApp initializeApp() creates a new FirebaseAdminApp
✅ test/firebase_admin_app_test.dart: FirebaseAdminApp useEmulator() sets the apiHost to the emulator
getSourceReport: (-32000) Service connection disposed
This is an unexpected error. Please file an issue at http://github.com/dart-lang/test
with the stack trace and instructions for reproducing the error.
Unhandled exception:
getSourceReport: (-32000) Service connection disposed
⚠  It appears you are running in a CI environment. You can avoid downloading the Emulator UI repeatedly by caching the /home/runner/.cache/firebase/emulators directory.
i  ui: downloading ui-v1.12.1.zip...

Based on there only being two test results, it does seem like some kind of crash that dart test isn't reporting, but I don't think I was accurately translating this into the commands that dart test runs, because I wasn't able to repro with the separate commands linked to above.

DanTup commented 1 month ago

I added some debug logging to a fork of pkg:test and also this anywhere isolates were spawned (+ main isolate):

isolate.setErrorsFatal(false);
isolate.errors.listen((e) => print('MAIN: $e'));

And now the "crash" no longer occurs and everything completes, but at the start of the output I see this:

✔  firestore: Firestore Emulator UI websocket is running on 9150.
Danny's custom version of test
Could not start the VM service:
DartDevelopmentServiceException: Failed to start Dart Development Service
running root group
running child group FirebaseAdminApp...

It's not clear to me if this error is the cause (or a symptom of the same issue). It's also not clear to me why this error didn't seem to be printed when coverage isn't enabled, given coverage (as far as I can tell) doesn't change anything until the end of the suite when it tries to collect (as far as I can tell, it's collecting coverage at the end of each suite? maybe because it's per-isolate?)

natebosch commented 1 month ago

given coverage (as far as I can tell) doesn't change anything until the end of the suite when it tries to collect (as far as I can tell, it's collecting coverage at the end of each suite? maybe because it's per-isolate?)

The runner does collect coverage per-isolate at the end of each suite.

It also does connect the VM service before the suite runs whenever coverage is enabled. https://github.com/dart-lang/test/blob/2c2ac49de0e9d6ea9744c6e7990d5730806410c0/pkgs/test_core/lib/src/runner/vm/platform.dart#L123

When coverage is enabled, config.debug is true.

natebosch commented 3 weeks ago

This is starting to impact Dart team owned repos. https://github.com/dart-lang/yaml_edit/actions/runs/10232023033/job/28368166876#step:5:24

@bkonyi @liamappelbe - were there any other changes to the VM service or coverage collection recently?

liamappelbe commented 3 weeks ago

There haven't been any significant changes to coverage. How flaky is this? Can we git bisect the VM?

DanTup commented 3 weeks ago

FWIW I can repro this locally (from a terminal with dart test --platform vm --coverage=./coverage) using the yaml_edit repo noted above. When I hit it, I see the same "Failed to start Dart Development Service" error (although I'm not sure whether that's the cause):

image

I tried tracing through the code trying to understand where the VM is being kept alive (the only --pause-isolates-on-start I could see was in bin/test_with_coverage.dart but I added a print() in there and don't see it in the output, so I don't know if it's using that). I'm also unfamiliar with the pkg:test code so I couldn't tell what code runs in what process (for example is pkg:test_core's Engine.run() running in the process that is being collected, or is it spawning another process to run the tests? If it's the same process, it's odd for the VM Service to go away while code is still being executed here, but if it's another process I'm struggling to see where it's launched).

DanTup commented 3 weeks ago

Also FWIW, I only seem to be able to repro when there is more than one suite. If I put any suite name on the end from that repo, it always works. If I run without a suite name, even after deleting all but two suites (eg. windows and wrap), then the failure occurs. This may just be coincidence because it's a timing issue, but maybe useful info 🤷‍♂️

liamappelbe commented 3 weeks ago

@bkonyi I did a git bisect, and it looks like the culprit is https://dart-review.googlesource.com/c/sdk/+/375000. The bug repro's reliably when running dart test --platform vm --coverage=./coverage in yaml_edit.

bkonyi commented 3 weeks ago

Thanks for bisecting @liamappelbe! I'll take a look today.

bkonyi commented 2 weeks ago

There was a race condition where multiple calls to Service.controlWebServer in quick succession would result in the underlying VM service URI being returned while DDS was still starting. We'd establish connections to the VM service, only to have them closed once DDS invoked _yieldControlToDDS which causes the VM service to close all direct web socket connections, which is why we were seeing the "Service connection disposed" error.

Fix up for review: https://dart-review.googlesource.com/c/sdk/+/379700