dart-lang / shelf

Web server middleware for Dart
https://pub.dev/packages/shelf
BSD 3-Clause "New" or "Revised" License
926 stars 125 forks source link

Suggestion: always run serve_requests in `runZoneGuarded` #202

Open annagrin opened 3 years ago

annagrin commented 3 years ago

serveRequests does not always wrap listening to http requests in a runZoneGuarded wrap:

https://github.com/dart-lang/shelf/blob/6e38575985a8094f4ac172228a6711196d9ae91e/lib/shelf_io.dart#L73 https://github.com/dart-lang/shelf/blob/6e38575985a8094f4ac172228a6711196d9ae91e/lib/src/util.dart#L16

This leads to hard to debug async exceptions later if something up its call chain adds its own runZoneGuarded wrapper - in that case exceptions stop being caught and ignored inside serveRequests and fall to the next root zone exception handler, which unexpectedly changes to logic of the calling code. Note that adding such top level handler will change behavior of all serveRequests calls, including the ones coming from imported libraries, which makes fixing those new issues particularly complicated.

Can we always run code inside serveRequests in a runZoneGuarded, and provide it with an error handler instead for the cases where error behavior needs to be controlled by the user?

kevmoo commented 3 years ago

I think I buy that. Thoughts @natebosch @jakemac53 @lrhn ?

kevmoo commented 3 years ago

@natebosch looked at pulling out the error handling to do something different during the null safety push and we never got anywhere.

Might be worth revisiting, though...

jakemac53 commented 3 years ago

Do we know what the rationale for the existing behavior was? It does seem problematic to me, but it also seems like somebody went to a fair bit of effort to make it behave that way?

kevmoo commented 3 years ago

@nex3 had good reasons, I'm sure – perhaps she can shed some light?

natebosch commented 3 years ago

Yeah the current behavior is magic and confusing - the hidden interactions with error zones does cause problems.

https://github.com/dart-lang/shelf/issues/2#issuecomment-742081585

My preference is to remove anything related to error zones in this package, but we'd need to be careful rolling it out.

kevmoo commented 3 years ago

The OTHER idea.

Chatting with a community member about this.

Deprecate shelf_io.dart and create another, more opinionated package to handle the "wire up the server" bits.

annagrin commented 3 years ago

I am good with anything that does not change behavior if a top level handler is added, we would need to update a bunch of existing libraries in that case though. My current case - ddr has a top level handler that has been added to collect crash metrics, but it runs a bunch of http servers via libraries such as dds, dwds, devtools, build daemon etc.

nex3 commented 3 years ago

If I recall correctly, the idea behind the current error handling was to be a backstop in the case where the user hadn't thought about error handling at all. Crashing the server on an error is about the worst possible behavior, so the current behavior says "if the server would otherwise crash, log an error instead, otherwise leave it up to the user". The expectation was that most servers would do more thorough error-handling, either by adding middleware that captured errors or by having some app-wide error handler (that is to say, running the server in a pre-existing error zone).

I think the root issue here may be that it wasn't clear enough that users should set up their own explicit error-handling. You could solve this in a non-breaking way by adding a message to the default error log saying "set up better error handling", and possibly also providing a built-in error-handling middleware.

annagrin commented 3 years ago

Unfortunately it introduced the opposite behavior from the desired one in this case - if all serveRequests calls in all the libraries are used unwrapped and then someone adds a top level wrapper, server starts effectively crashing, because the top level zone was just added as a way to catch anything, print in nicely to the screen and exit. So it looks like the only safe way to use this API in a library is by wrapping it in runZoneGuarded with error handling, because there is always a chance for that top wrapper addition in the calling code later.

So it looks to me that the current API is incomplete, because it does not require error handling.

Maybe we can introduce another one that does not change behavior depending on calling code?

We can make it very similar to serveRequests but add an error handler that is passed to the zone that serveRequests unconditionally runs in. We can set the default error handler to do what it currently does - print 'Async error' and exit.

Can even eventually mark the old one as deprecated if needed.

annagrin commented 3 years ago

@nex3

You could solve this in a non-breaking way by adding a message to the default error log saying "set up better error handling", and possibly also providing a built-in error-handling middleware.

This would be great, but in this case we need to inform the user where the error came from. The errors are asynchronous and look like "SocketException: OSError: broken pipe" with an unrelated stack at best.

Maybe we can remember the stack at the call to serveRequests and add it to the exception to propagate to the lower runZoneGuarded?

There is also a question how to handle those errors coming from other libraries, where running an API that starts a server in runZoneGuarded is too late to properly recover from broken socket errors.

nex3 commented 3 years ago

This would be great, but in this case we need to inform the user where the error came from. The errors are asynchronous and look like "SocketException: OSError: broken pipe" with an unrelated stack at best.

Are you using stack chains in your global error handler? That can go a long way towards helping to clarify the real source of an error.

The trick here is that you want to balance three things:

  1. Having graceful error-handling behavior by default if the user doesn't configure anything special.
  2. Allowing the user to configure their own totally custom error handling.
  3. Minimizing nonlocal behavior where a global error handler affects the way a server's errors are handled.
annagrin commented 3 years ago

Are you using stack chains in your global error handler?

Thanks, I wasn't, added them to wrap serveRequests call.

Agree with all 3 items!

Btw, what would be the best default error handling in the case of serveRequests? Does always ignoring them work for both the server and the client? We have hard to reproduce socket exceptions happening intermittently, and inducing them manually in the socket code and always ignoring them in the runZonedGuarded wrapper sometimes leaves the server unresponsive.

nex3 commented 3 years ago

I'd say ignoring errors is almost always the wrong behavior. I'd say either always logging, or always throwing a nicely-formatted error is probably the best option.

annagrin commented 3 years ago

Makes sense, I meant logging and ignoring in this case. I wonder if there is any way to filter errors that are likely to render the server unusable in which case we probably should throw.

annagrin commented 3 years ago

Linked all cases that can potentially contribute to ddr crashes.

annagrin commented 3 years ago

Fix I used in dwds and webdev:

/// Handles [requests] using [handler].
/// 
/// Captures all sync and async stack error traces and passes
/// them to the [onError] handler.
void serveHttpRequests(Stream<HttpRequest> requests, Handler handler,
    void Function(Object, StackTrace) onError) {
  return Chain.capture(() {
    serveRequests(requests, handler);
  }, onError: onError);
}
jakemac53 commented 3 years ago

Should we just add a boolean parameter here that forces it it handle and log all async errors instead of only doing so if there is no existing custom error handling zone? forceHandleAndLogUncaughtErrors or maybe something a bit more concise?

annagrin commented 3 years ago

@jakemac53 how about the boolean flag forceHandleAndLogUncaughtErrors and a custom error handler, onError(e,s)?

kevmoo commented 3 years ago

Sketch out a PR and we can review?

On Mon, Oct 4, 2021 at 4:30 PM Anna Gringauze @.***> wrote:

@jakemac53 https://github.com/jakemac53 how about the boolean flag forceHandleAndLogUncaughtErrors and a custom error handler, onError(e,s)?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/dart-lang/shelf/issues/202#issuecomment-933933123, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAEFCVAKWPHQ7E2XD4D373UFI2IVANCNFSM5E6HYHSA .

jakemac53 commented 3 years ago

@jakemac53 how about the boolean flag forceHandleAndLogUncaughtErrors and a custom error handler, onError(e,s)?

Or possibly we could just add the onError handler parameter, and if that is passed then we assume you want to handle all errors?

annagrin commented 3 years ago

even better! I will submit a PR

annagrin commented 3 years ago

Sent a sketch. Let me know what you think!