dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License
10.23k stars 1.58k forks source link

`_NativeSocket.startConnect` is possibly throwing `SocketException`s that cannot be caught with `try` #56566

Closed andrewkolos closed 2 weeks ago

andrewkolos commented 2 months ago

Origin: https://github.com/flutter/flutter/issues/153064#issuecomment-2305662791.

On Flutter 3.24.1, I'm seeing many reports of this crash:

SocketException: SocketException: Connection attempt cancelled, host: localhost, port: 51225
at _NativeSocket.startConnect(socket_patch.dart:721)
at _RawSocket.startConnect(socket_patch.dart:1920)
at RawSocket.startConnect(socket_patch.dart:27)
at Socket._startConnect(socket_patch.dart:2144)
at Socket.startConnect(socket.dart:825)
at _ConnectionTarget.connect(http_impl.dart:2491)
at _HttpClient._getConnection.connect(http_impl.dart:2930)
at _HttpClient._getConnection(http_impl.dart:2935)
at _HttpClient._openUrl(http_impl.dart:2790)
at _HttpClient.getUrl(http_impl.dart:2632)
at ChromeConnection.getUrl(webkit_inspection_protocol.dart:108)
at ChromeConnection.getTabs(webkit_inspection_protocol.dart:53)
at ChromeConnection.getTab(webkit_inspection_protocol.dart:90)
+ at Chromium.close(chrome.dart:512)
at ChromiumDevice.stopApp(web_device.dart:164)
at FlutterDevice.exitApps(resident_runner.dart:377)
at ResidentWebRunner.exitApp(resident_web_runner.dart:729)
at ResidentRunner.exit(resident_runner.dart:1312)
at <asynchronous gap>(async)
at AppDomain.stop.<anonymous closure>(daemon.dart:942)
at <asynchronous gap>(async)
at Domain.handleCommand.<anonymous closure>(daemon.dart:297)
at <asynchronous gap>(async)

The interesting bit of the stack trace starts with Chromium.close(chrome.dart:512) (highlighted above).

If we look at this call site on the 3.24.1, we see this (link):

      try {
        tab = await chromeConnection.getTab(
            (_) => true, retryFor: const Duration(seconds: 1));
      } on SocketException {
        // Chrome is not responding to the debug protocol and probably has
        // already been closed.
      }

Here we see that we wrap the (awaited) ChromeConnection.getTab call with try and catch and discard any SocketException. My question is then, why are we still seeing this crash? Looking through the code of package:webkit_inspection_protocol that we see in this stack trace, I don't see any potential for unawaited futures that are throwing this exception asynchronously. This leads me to suspect that this is an issue in the Dart http library. Maybe this exception is being thrown asynchronously (but we are stilling seeing a stack trace because Error.throwWithStackTrace is being utilized).

My sincerest apologies in advance if this this turns out to not be a bug in the Dart sdk.

dart-github-bot commented 2 months ago

Summary: The _NativeSocket.startConnect method in the Dart SDK is potentially throwing SocketExceptions that cannot be caught with a try block. This issue is causing crashes in Flutter 3.24.1, particularly when closing a Chromium instance. The stack trace suggests that the exception is being thrown asynchronously, but the code appears to handle SocketExceptions. The issue might be related to the Dart http library or an unhandled asynchronous exception.

a-siva commented 2 months ago

//cc @brianquinlan

brianquinlan commented 2 months ago

The exception is coming from _NativeSocket.tryConnectToResolvedAddresses.

Probably the task is cancelled from a timeout in _ConnectionTarget.connect ...

...or not since the HttpClient is configured without a timeout

ConnectionTask.cancel() is also called by _ConnectionTarget.close()

FWIW, chromeConnection.close (which calls HttpClient.close(force: true), does not seem to be called with any pending connections.

aam commented 2 months ago

Note that unhandled exceptions will go to the zone where the future that throws async exception was created in. (https://dart.dev/libraries/async/zones#handling-uncaught-errors) In this case it seems to be ChromeConnection created in launch method. ChromeConnection itself creates HttpClient instance, which if throws asynchronously is expected to be handled by the the zone encompassed ChromeConnection creation. What is that zone? Does it have exception handler? If it does not, then the async exception thrown by HttpClient will terminate the app.

brianquinlan commented 2 months ago

The HttpClient constructor doesn't create any Futures.

brianquinlan commented 2 months ago

I can see two ways that this exception could occur:

  1. _ConnectionTarget.close() is called.

This call chain looks like:

HttpClient._closeConnections() HttpClient.close()

  1. _HttpClientConnection.connect

This can only be invoked when connectionTime != null, which I don't think is the case.

So (1) seems like the most plausible.

brianquinlan commented 2 months ago

It is not obvious to me whether or not ChromeConnection.close can be called while there are requests in flight.

@andrewkolos

For example, could ChromiumDevice.stopApp() be called will while there are requests in flight?

andrewkolos commented 2 months ago

It is not obvious to me whether or not ChromeConnection.close can be called while there are requests in flight.

@andrewkolos

For example, could ChromiumDevice.stopApp() be called will while there are requests in flight?

I wonder if https://github.com/flutter/flutter/issues/154203 (edit[^1]) and https://github.com/flutter/flutter/issues/154207 suggest that this is the case.

[^1]: never mind, this is happening because we are sending off a get request after the connection has been closed

andrewkolos commented 2 months ago

I've found (an inorganic) way to repro this by forcibly shutting down Chrome before the flutter tool gets to this code: https://github.com/flutter/flutter/issues/153064#issuecomment-2314115350. Notably, this repro goes away with the use of asyncGuard.

brianquinlan commented 1 month ago

Using asyncGuard seems like a good way to deal with this class of errors. Were you able to determine whether HttpClient.close() could be called with there are requests outstanding?

andrewkolos commented 1 month ago

Were you able to determine whether HttpClient.close() could be called with there are requests outstanding?

I don't believe this is the case, though I cannot prove this. The only possible call of HttpClient.close is https://github.com/flutter/flutter/blob/cb28dec614e481c5ebde52b545118202200c5d4c/packages/flutter_tools/lib/src/web/chrome.dart#L525 (the call to ChromeConnection.close after we've sent off the command to close Chrome).

The root cause of this crash was that flutter was trying to close Chrome twice, concurrently (hence why Chrome was already closed when we sent off the second command to close it). Knowing this, I repro'd the issue locally using https://github.com/flutter/flutter/issues/153064#issuecomment-2314115350, this time instrumenting the code with prints to see if see if we call ChromeConnection.close before sending off the second close command. However, this appeared not to be the case—both close commands were sent off before the tool ever called ChromeConnection.close. The tool crashed before ChromeConnection.close was ever invoked.

FWIW the crash has gone away since asyncGuard was implemented, so I wouldn't be terribly upset if this was closed as can't repro/won't fix.

a-siva commented 2 weeks ago

Closing this issue.