CrabDude / trycatch

An asynchronous domain-based exception handler with long stack traces for node.js
MIT License
246 stars 17 forks source link

Add a general technical explanation of how trycatch is solving error handling #30

Closed bzuillsmith closed 10 years ago

bzuillsmith commented 11 years ago

This is both a question and a issue for adding the answer to the documentation. Forgive me if this is an innapropriate place for discussion, but I don't know where else to ask.

The documentation on domains says not to ignore errors caught by a domain, but gracefully restart or shutdown the process after finishing up other requests (in the case of a server). Of the discussions I've seen out there on Node.js error handling, many of them point to this module as a 'solution'. Does trycatch solve this problem completely? Regardless of the answer, could somone add the answer to the docs and any other related technical (but general) information on what trycatch is actually doing to handle errors under it's hood?

CrabDude commented 11 years ago

Short answer, yes.

The warning to not ignore errors in the documentation was placed there because of me. The short explanation is that when you throw, you prevent the current call stack from unwinding. Because node.js core shares the same execution thread as application code, and thus the same call stack, any uncaught thrown exception will prevent a core stack from unwinding and thus creates an unknown state (you don't know what didn't get executed) and thus requires a restart.

What trycatch does is shim all the core modules so that it can wrap all their callbacks in try/catch blocks thus insulating the core call stack from exceptions in userland (non-core code). This is exacerbated a bit because the core error handling is extremely inconsistent, and basically only passes errors to callbacks because you could catch them even if wanted to, not because it makes any distinction between exceptions and callback errors. This is a huge problem, because without a functioning try/catch (one that works asynchronously, without a severe performance hit, and allows typed catching), any exception can be a potential DoS liability.

Therefore, all exceptions must be removed or handled to prevent this DoS liability. There are several facets to this:

  1. Catch all errors thrown from core except TRESUR errors.
  2. Prevent userland exceptions from requiring a restart.
  3. Never ever ever throw errors, unless you want it to require a restart of your process (and thus a DoS liability).
  4. Catch and pass all errors to the callback.

trycatch solves all of these when used. Additionally, each trycatch invocation creates a domain, to additionally close sockets and other resources that may have been created along the way.

bzuillsmith commented 11 years ago

Cool. Thanks, that cleared a lot up for me. So do I understand this process correctly in this hypothetical situation:

So if I understand correctly, trycatch will catch the error in some manner (using a try/catch or domain maybe?) and after the error handler function, the node process will move on to the next callback in the event queue with no other problems assuming I cleaned up anything I needed to in the error callback? Nothing else really needs to be done at that point?

It probably isn't the best way to use trycatch, but just getting the understanding down.

CrabDude commented 11 years ago

Yes, that's precisely how trycatch is meant to be used.

It's also very useful on a per-request basis, which is how we use it at LinkedIn. By default it wraps all callbacks (e.g., process.nextTick callbacks) in a try/catch, but also uses a domain to ensure resource cleanup.

However, it doesn't use domains for catching errors, only for resource cleanup and because EventEmitter passes errors to the active domain (instead of throwing them). If an error is thrown, uncaught, and bubbles to the active domain, trycatch rethrows because such error cannot safely be caught due to the torn call stack (core call stack not unwound).

In other words, trycatch explicitly does not catch any errors that cannot safely be caught, allowing you to use process.on('uncaughtException') for the purpose it was intended (to delay process.exit to handle your exit gracefully).

bzuillsmith commented 11 years ago

Do you (at LinkedIn wrap a request just because you are in a better place to send a 500 or handle the error in a more reasonable way at that point? Or were there other considerations put into that decision I'm not seeing? I'm using Express and I'm not sure how to wrap requests with a try/catch using it, but it seems the logical place for handling the request well. Got any suggestions or references I can look at.

Thanks for this info. This is super helpful and explains a lot. Hope to get it into some documentation!

CrabDude commented 10 years ago

We wrap requests for 2 reasons:

  1. To send a 500
  2. To avoid node.js' crash on error design DoS liability

It's trivial to accomplish:

app.use(function(req, res, next) {
  trycatch(next, function(err) {
    response.writeHead(500);
    res.end(err.stack); // debugging
  });
});