RFC: adding built-in error handling support to zones

This is a request for comments, especially from the Node community (/cc @bmeck @trevnorris; please bring in others). Our original plan of leaving error handling out of the base zone primitive, and letting it be handled entirely by host environments or future spec extensions, is starting to show some cracks. This is mainly because it impacts the recommendations around how to "wrap" functions for zones. We would like the base zones proposal to have a strong recommendation: "to properly propagate zones while doing async work, do X."

The problem

Currently our recommendation for how to propagate zones is "use Zone.current.wrap", but @mhevery has shown me how that doesn't quite work in certain scenarios once error handling is introduced. In most cases of user-mode queuing, you want wrapped functions to send any errors to the relevant zone. However, in some cases, mainly when building control flow abstractions like promise libraries, you want to handle errors yourself.

So, if we later introduced error handling, some fraction of the uses of Zone.current.wrap would be wrong.

Similar problems apply to zone.run, since one of the important uses of run for library authors is to manually do wrapping (i.e., store the callback and current zone, then do storedZone.run(storedCallback), instead of storing the wrapped callback).
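
The manual-wrapping pattern mentioned above can be sketched roughly as follows. MiniZone is a hypothetical, minimal stand-in for the proposed Zone API (only fork and run, no error handling), and enqueue/flush are made-up names for a user-mode queue; none of this is the actual proposal text.

```javascript
// Hypothetical stand-in for the proposed Zone API (assumption, not the spec).
class MiniZone {
  constructor(name) { this.name = name; }
  fork({ name }) { return new MiniZone(name); }
  run(f) {
    const oldZone = MiniZone.current;
    MiniZone.current = this;
    try { return f(); } finally { MiniZone.current = oldZone; }
  }
}
MiniZone.current = new MiniZone('root');

// A user-mode queue that stores the callback and the zone separately,
// instead of storing a wrapped callback.
const queue = [];
function enqueue(callback) {
  queue.push({ storedZone: MiniZone.current, storedCallback: callback });
}
function flush() {
  while (queue.length > 0) {
    const { storedZone, storedCallback } = queue.shift();
    storedZone.run(storedCallback); // re-enter the zone captured at enqueue time
  }
}

const seen = [];
const requestZone = MiniZone.current.fork({ name: 'request' });
requestZone.run(() => enqueue(() => seen.push(MiniZone.current.name)));
flush(); // called from the root zone, but the callback observes 'request'
```

Because flush calls run directly, any exception from the stored callback propagates to flush's caller; whether that is what the library wants is exactly the question this RFC raises.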

The solution

The way to fix this is to make the two use cases explicit. Instead of wrap and run, we have wrap, wrapGuarded, run, and runGuarded, where the non-guarded variants are used when the library explicitly wants to handle thrown exceptions: like for implementing promises or observables or similar, where you transform thrown exceptions into a different form.

However, it's pretty pointless to introduce these two functions if the zones proposal doesn't actually have error handling built-in. So this takes us down the path of introducing error handling into the base zone primitive, instead of saving it for a future extension.

In other words, keeping things in a future extension is fine, unless that requires you to change your code now. If you require people to change their code now, you might as well give them the benefits (zone error handling) that you're asking them to pay for.

Details of error handling proposal

The zone fork options get a handleError option, which takes a thrown error object:

const z = Zone.current.fork({
  name: 'http request zone',
  handleError(e) {
    sendAnalytics(e);
    return true; // error handled. Or, should we just make you rethrow to indicate not-handled?
  }
});

This is pretty simple. The trick is then figuring out when and how we should route errors to the error handler. To discuss that, we need to talk about the “guarded” functions introduced above.

Details of {wrap,run}{Guarded}

Currently, we have (essentially):

Zone.prototype.run = function (f) { // current
  const oldZone = Zone.current;
  setCurrentZone(this); // privileged API
  try {
    return f();
  } finally {
    setCurrentZone(oldZone);
  }
};

Zone.prototype.wrap = function (f) {
  const thisZone = this;
  return function () {
    return thisZone.run(() => f.apply(this, arguments));
  };
};

We would then introduce:

Zone.prototype.runGuarded = function (f) { // new
  const oldZone = Zone.current;
  setCurrentZone(this); // privileged API
  try {
    return f();
  } catch (e) {
    // actually stored in an internal slot, not a _-prefixed property
    if (!this._handleError || !this._handleError(e)) {
      throw e;
    }
  } finally {
    setCurrentZone(oldZone);
  }
};

Zone.prototype.wrapGuarded = function (f) {
  const thisZone = this;
  return function () {
    return thisZone.runGuarded(() => f.apply(this, arguments));
  };
};

How to think about these

The TL;DR is: most libraries use wrapGuarded. Most apps use run.

In a bit more detail: the user code and the library collaborate to figure out how errors are handled, with the following inputs:

Does the library know how to handle errors? If so, it will use wrap (or run), in combination with a try/catch at the call site. Otherwise (most cases, like setTimeout or event listeners), it will use wrapGuarded (or runGuarded), to say: “I don’t know how to handle the error, and if I let it propagate up the call site it would just immediately reach top level, so instead let’s route errors to the right zone.”
Does the user code want the zone to handle errors? If so, it will use runGuarded to kick things off. Otherwise, it will use run, and let the error propagate up the call stack as usual. Using runGuarded in this way is fairly unusual; it’s a kind of “async try/catch” and programs are generally better served by letting the error bubble.
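
To make the two library cases above concrete, here is a minimal sketch. MiniZone is a hypothetical stand-in that follows the run/runGuarded/wrap/wrapGuarded pseudocode above; settle and defer are made-up library functions used only for illustration.

```javascript
// Hypothetical stand-in for the proposed Zone API (assumption, not the spec).
class MiniZone {
  constructor(name, handleError = null) {
    this.name = name;
    this.handleError = handleError;
  }
  fork({ name, handleError = null }) { return new MiniZone(name, handleError); }
  run(f) {
    const oldZone = MiniZone.current;
    MiniZone.current = this;
    try { return f(); } finally { MiniZone.current = oldZone; }
  }
  runGuarded(f) {
    const oldZone = MiniZone.current;
    MiniZone.current = this;
    try {
      return f();
    } catch (e) {
      if (!this.handleError || !this.handleError(e)) throw e;
    } finally {
      MiniZone.current = oldZone;
    }
  }
  wrap(f) {
    const zone = this;
    return function (...args) { return zone.run(() => f.apply(this, args)); };
  }
  wrapGuarded(f) {
    const zone = this;
    return function (...args) { return zone.runGuarded(() => f.apply(this, args)); };
  }
}
MiniZone.current = new MiniZone('root');

const log = [];
const appZone = MiniZone.current.fork({
  name: 'app',
  handleError(e) { log.push(`zone handled: ${e.message}`); return true; },
});

// Case 1: a control-flow library that knows how to handle errors
// (promise-like). It uses wrap plus its own try/catch.
function settle(callback) {
  const wrapped = MiniZone.current.wrap(callback);
  return () => {
    try {
      return { status: 'fulfilled', value: wrapped() };
    } catch (e) {
      return { status: 'rejected', reason: e };
    }
  };
}

// Case 2: a queue with no error strategy of its own (timer-like).
// It uses wrapGuarded so errors are routed to the captured zone.
function defer(callback) {
  return MiniZone.current.wrapGuarded(callback);
}

const settled = appZone.run(() => settle(() => { throw new Error('a'); }));
const deferred = appZone.run(() => defer(() => { throw new Error('b'); }));

const result = settled(); // error becomes a rejection record; zone handler not called
deferred();               // error is routed to appZone's handleError
```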

Example in action

This example shows how, if you follow the TL;DR above, everything “just works”:

/////// LIBRARY CODE

sql.addListener = function (f) {
  this.storedFunction = Zone.current.wrapGuarded(f);
};

sql.doStuff = function () {
  this.storedFunction();
};

/////// APP CODE

const rootZone = Zone.current;

const zone1 = rootZone.fork({
  handleError: handleError1
});
const zone2 = rootZone.fork({
  handleError: handleError2
});

zone1.run(function a() {
  sql.addListener(function b() {
    rootZone.run(function c() {
      throw new Error("boo");
    });
  });
});

zone2.run(function d() {
  setTimeout(function e() {
    sql.doStuff();
  }, 0);
});

At the time the error is thrown, the call stack is:

c (top)
rootZone.run
b
this.storedFunction (wrapper around b to run it in zone1)
sql.doStuff
e
wrapper around e to run it in zone2 (generated by setTimeout)
setTimeout task (bottom)

Error propagation and handling then occurs like so:

The error is not caught by rootZone.run (run does not handle errors at all).
The error is next caught by this.storedFunction, which is a wrapper around b to run it guarded in zone1. That sends it to handleError1.
If handleError1 doesn't return true, the error is next caught by the wrapper around e to run it in zone2. So it's sent to handleError2.
If even handleError2 doesn't handle it, we call the error unhandled, and it goes to window.onerror or "uncaughtException" as usual.
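
The propagation order above can be checked with a small simulation. This is only a sketch: MiniZone is a hypothetical stand-in for the proposed API, fakeSetTimeout is a made-up host timer that captures the zone with wrapGuarded and queues the task, and the two handlers are chosen so that the first declines the error and the second handles it.

```javascript
// Hypothetical stand-in for the proposed Zone API (assumption, not the spec).
class MiniZone {
  constructor(name, handleError = null) {
    this.name = name;
    this.handleError = handleError;
  }
  fork({ name, handleError = null }) { return new MiniZone(name, handleError); }
  run(f) {
    const oldZone = MiniZone.current;
    MiniZone.current = this;
    try { return f(); } finally { MiniZone.current = oldZone; }
  }
  runGuarded(f) {
    const oldZone = MiniZone.current;
    MiniZone.current = this;
    try {
      return f();
    } catch (e) {
      if (!this.handleError || !this.handleError(e)) throw e;
    } finally {
      MiniZone.current = oldZone;
    }
  }
  wrapGuarded(f) {
    const zone = this;
    return function (...args) { return zone.runGuarded(() => f.apply(this, args)); };
  }
}
MiniZone.current = new MiniZone('root');

const log = [];

// Library code, as in the example above.
const sql = {
  addListener(f) { this.storedFunction = MiniZone.current.wrapGuarded(f); },
  doStuff() { this.storedFunction(); },
};

// Made-up host timer: captures the current zone (guarded) and queues the task.
const tasks = [];
function fakeSetTimeout(f) { tasks.push(MiniZone.current.wrapGuarded(f)); }

// App code.
const rootZone = MiniZone.current;
const zone1 = rootZone.fork({
  name: 'zone1',
  handleError() { log.push('handleError1'); return false; }, // declines
});
const zone2 = rootZone.fork({
  name: 'zone2',
  handleError() { log.push('handleError2'); return true; }, // handles
});

zone1.run(function a() {
  sql.addListener(function b() {
    rootZone.run(function c() { throw new Error('boo'); });
  });
});
zone2.run(function d() {
  fakeSetTimeout(function e() { sql.doStuff(); });
});

for (const task of tasks) task(); // drain the fake task queue
```

With these handlers the error never reaches the top level: zone1's handler sees it first and declines, then zone2's handler accepts it.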

Issues to discuss

Most importantly: does this sound like something that is acceptable to potential zone-using communities? We’d like to have everyone on board, and we spent a lot of time trying to get the details right here (drawing on things like the domain module postmortem from Node.js), so hopefully it’s not that bad.

Less important issues:

How should unhandled/handled be done? I did returning true for handled, false for unhandled, but I think another plausible design is that it’s handled by default, and you rethrow the error if you want it to be unhandled. (Assume for the purposes of discussion that we properly specify Error.prototype.stack as being captured at error creation time, not at error throwing time, so rethrowing does not hurt stack information.)
Should we consider only adding run + wrapGuarded, since those are the “happy path”? I think it’s weird to have an asymmetric API like that, and there are definitely use cases for the other two. But on the other hand, you can derive runGuarded from wrapGuarded, and derive wrap from run, so it’s not necessary.
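
For the second question, the derivations could look roughly like this. It is a sketch under assumptions: MiniZone is a hypothetical stand-in exposing only run and wrapGuarded, and runGuardedVia and wrapVia are made-up helper names.

```javascript
// Hypothetical stand-in with only the "happy path" pair (assumption, not the spec).
class MiniZone {
  constructor(name, handleError = null) {
    this.name = name;
    this.handleError = handleError;
  }
  fork({ name, handleError = null }) { return new MiniZone(name, handleError); }
  run(f) {
    const oldZone = MiniZone.current;
    MiniZone.current = this;
    try { return f(); } finally { MiniZone.current = oldZone; }
  }
  wrapGuarded(f) {
    const zone = this;
    return function (...args) {
      const oldZone = MiniZone.current;
      MiniZone.current = zone;
      try {
        return f.apply(this, args);
      } catch (e) {
        if (!zone.handleError || !zone.handleError(e)) throw e;
      } finally {
        MiniZone.current = oldZone;
      }
    };
  }
}
MiniZone.current = new MiniZone('root');

// runGuarded derived from wrapGuarded: wrap, then call immediately.
function runGuardedVia(zone, f) {
  return zone.wrapGuarded(f)();
}

// wrap derived from run: return a function that re-enters the zone unguarded.
function wrapVia(zone, f) {
  return function (...args) {
    return zone.run(() => f.apply(this, args));
  };
}

const handled = [];
const z = MiniZone.current.fork({
  name: 'z',
  handleError(e) { handled.push(e.message); return true; },
});

runGuardedVia(z, () => { throw new Error('guarded'); }); // routed to z's handler
const nameSeen = wrapVia(z, () => MiniZone.current.name)(); // runs inside z
```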

@domenic that only tells you whether the zone has a parent; zones created like https://domenic.github.io/zones/#sec-zone with a parent option of null also have .parent == null. I think having .root defer to a Realm root zone would be a preferred way to ensure the root zone is still in place even when you want to escape the current stack of zones.

Right, as I said, there can currently be multiple root zones, and maybe we should disallow that.

@domenic is there a case for multiple root zones? I would consider root zones as a realm intrinsic personally.

Yeah, there probably isn't; it fell out naturally of the specification (and from my desire not to allow the user agent to do magic things). Happy to fix.

@mhevery Despite my best efforts, I've failed to communicate my concerns. I'll try again, and be as clear and concise as I can be (which doesn't say much). First I'd like to address a few points:

I am a bit confused about your first example. You set up z1 which swallows exceptions, and then you are concerned that exceptions get swallowed?

This must be my failing. When I stated "the module I used" and "[t]here's never a good reason for a library to handle an application's exception" I thought it would be apparent that mlib.js was a third-party module, and one I had control over. Apologies.

In my mind the only time a library should fork or capture a zone is if it does user queue operations. (Such as implementing work queue)

That's great you have an idea of how they should be used, but that says little of how they will be used. Unless you enforce behavior via the syntax, users will do many strange things. For example, tj/co uses generators to achieve async/await like behavior. Did anyone on the standards body consider this as a use case for generators? (this isn't rhetorical, I am curious)

Could we move this discussion from hypothetical here-is-how-i-can-break-zones to concrete use cases where zones fail to perform as expected?

Thus far nothing I've said has been simply an attempt to break the spec. It is all based on use cases I've seen in the wild. Please take my examples from this point-of-view. If I'm trying to break the spec with edge case or unrealistic scenarios I'll explicitly say so.

Now for another code example that'll hopefully explain some of my reservations. The following spec assumptions were made: 1) Users can no longer use new Zone(). 2) Zone.root points to the root Zone. 3) Zones propagate through calls to .on() (as clarified by @domenic in https://github.com/domenic/zones/issues/9#issuecomment-218885727)

Now the example code:

// module.js
module.exports = getFile;

function getFile(path, callback) {
  const gfz = Zone.root.fork({
    name: 'module forked Zone.root',
    handleError: () => true,
  });
  gfz.callback = callback;
  gfz.zone = Zone.current;
  gfz.runGuarded(() => {
    require('fs').readFile(path, moduleCallback);
  });
}

function moduleCallback(er, data) {
  const gfz = Zone.current;
  gfz.zone.run(() => {
    gfz.callback(er, data);
  });
}

// main.js
const getFile = require('./module');

require('net').createServer((c) => {
  const nz = Zone.root.fork({ name: 'app forked Zone.root' });
  nz.connection = c;
  nz.run(() => {
    c.on('data', connectionData);
  });
}).listen(8080);

function connectionData(chunk) {
  getFile(chunk.toString(), fileGotten);
}

function fileGotten(er, data) {
  if (er) throw er;

  const nz = Zone.current;
  nz.connection.end(data);
}

main.js is my application code. It forks from Zone.root on every 'data' event and makes a call to getFile(). main.js uses Zones to propagate the connection object thereby removing the need for nested functions. It does not wish to handle exceptions, and wishes for those exceptions to bring down the application.

module.js is third-party library code that my application is using. It uses Zones as a way to handle errors from any system calls it performs (note: the blatant error handling behavior is only for demonstration). It has no need of handling anyone else's exceptions. module.js is a good citizen and makes sure to call the fileGotten() callback within the Zone getFile() was invoked under.

Execution steps:

connectionData() is called from main.js, within a Zone created for the unique connection, where getFile() is called from module.js.
getFile() creates a new Zone for the individual request in order to handle any exceptions thrown while performing system calls.
When the operation completes moduleCallback() is called with the results.
moduleCallback() uses Zone.current.zone, the same Zone that was active when getFile() was called, to run() the callback. This restores the Zone state that was in place when the call was made.
main.js's callback, fileGotten(), is called with the results. If there was an error then main.js will rethrow.
Any exceptions from fileGotten() will bubble, unhandled, to its root Zone and then be rethrown.
The exception will then bubble to the handleError() defined in module.js and be silenced.
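
The steps above can be reproduced without fs or net in a rough simulation. Everything here is an assumption made for illustration: MiniZone stands in for the proposed API, MiniZone.root stands in for Zone.root, and fakeReadFile is a made-up host function that captures the current zone with wrapGuarded, mirroring assumption 3.

```javascript
// Hypothetical stand-in for the proposed Zone API (assumption, not the spec).
class MiniZone {
  constructor(name, handleError = null) {
    this.name = name;
    this.handleError = handleError;
  }
  fork({ name, handleError = null }) { return new MiniZone(name, handleError); }
  run(f) {
    const oldZone = MiniZone.current;
    MiniZone.current = this;
    try { return f(); } finally { MiniZone.current = oldZone; }
  }
  runGuarded(f) {
    const oldZone = MiniZone.current;
    MiniZone.current = this;
    try {
      return f();
    } catch (e) {
      if (!this.handleError || !this.handleError(e)) throw e;
    } finally {
      MiniZone.current = oldZone;
    }
  }
  wrapGuarded(f) {
    const zone = this;
    return function (...args) { return zone.runGuarded(() => f.apply(this, args)); };
  }
}
MiniZone.root = new MiniZone('root');
MiniZone.current = MiniZone.root;

// Made-up host API: queues the callback, guarded in the zone that scheduled it.
const tasks = [];
function fakeReadFile(path, cb) {
  tasks.push(MiniZone.current.wrapGuarded(() => cb(null, `contents of ${path}`)));
}

// "module.js"
function getFile(path, callback) {
  const gfz = MiniZone.root.fork({ name: 'module zone', handleError: () => true });
  gfz.callback = callback;
  gfz.zone = MiniZone.current;
  gfz.runGuarded(() => fakeReadFile(path, moduleCallback));
}
function moduleCallback(er, data) {
  const gfz = MiniZone.current;
  gfz.zone.run(() => gfz.callback(er, data));
}

// "main.js": the application callback throws and expects a crash.
const appZone = MiniZone.root.fork({ name: 'app zone' });
appZone.run(() => {
  getFile('/some/path', function fileGotten(er, data) {
    throw new Error('application bug');
  });
});

let reachedTopLevel = false;
try {
  for (const task of tasks) task(); // drain the fake task queue
} catch (e) {
  reachedTopLevel = true;
}
// reachedTopLevel stays false: the module's zone silenced the application's error.
```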

Scenario: I know my application is throwing, but can't find where the exception is being swallowed. Causing me angst.

Challenge 1: Find where the exception is being swallowed, only being allowed to touch the code in main.js. This is meant to be a very simplified situation where usually an application would have many modules, and viewing/editing dependencies could prove to be overly difficult.

Challenge 2: How can the author of module.js synchronously run the application's callback in a way that removes its own Zone from the stack, such that getFile()'s handleError won't be invoked if the application's callback throws? (note: synchronous part is key if EventEmitter is to support Zones)

Point: If a third-party module was swallowing my exceptions with uncaughtException it could be easily enough found with the following snippet:

process.on('newListener',
           (n) => n === 'uncaughtException' && console.log((new Error()).stack));

If a third-party module wants complete error handling for only the duration of its execution the module can call process.removeListener('uncaughtException', fn) before calling the user's callback in order to prevent swallowing errors it doesn't care about. i.e. removing this error handling is trivial.
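
A minimal sketch of that pattern, assuming a made-up helper name (runWithScopedHandler) and a made-up startWork function supplied by the module; only process.on, process.removeListener and process.listenerCount are real Node APIs here.

```javascript
// Hypothetical module helper: install an 'uncaughtException' listener only
// for the duration of the module's own work, and remove it before handing
// control back to the user's callback.
function runWithScopedHandler(startWork, userCallback) {
  const onUncaught = (e) => {
    // Module-specific recovery would go here.
  };
  process.on('uncaughtException', onUncaught);
  startWork((err, data) => {
    // Step out of the way before running application code.
    process.removeListener('uncaughtException', onUncaught);
    userCallback(err, data);
  });
}

// Usage: count listeners before, during the module's work, and inside the
// user's callback.
const before = process.listenerCount('uncaughtException');
let during = null;
let inside = null;
runWithScopedHandler(
  (done) => {
    during = process.listenerCount('uncaughtException');
    done(null, 'data');
  },
  () => {
    inside = process.listenerCount('uncaughtException');
  }
);
```

By the time the user's callback runs, the module's listener is gone, so an exception thrown there is not intercepted by the module.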

While debugging my application or troubleshooting a module, all usage of uncaughtException can be subverted by simply including process.on('uncaughtException', (e) => { throw e }) at the top of my application.

Simply put, if I as the application's author don't want any module to silence my exceptions I have the APIs to make sure that happens. As far as I understand the current Zones spec, a third-party module could swallow my exception into the abyss, making it near impossible for me to find or debug.

Mmmmm our meeting schedule doesn't fully align. I can say we either split it into 2 meetings, or wait another week.

I am sorry about a delayed response.

My understanding of your concern is that exceptions can be swallowed by an uncooperating library.

function libA() {
  var logExceptionsZone = Zone.current.fork({
    handleError: (e) => console.log(e)
  });
  logExceptionsZone.runGuarded(() => {
    // Pass the throwing function to libB as a callback.
    libB(() => { throw new Error('What will happen to me?'); });
  });
}

function libB(callback) {
  var eatExceptions = Zone.current.fork({
    handleError: () => true
  });
  eatExceptions.runGuarded(callback);
}

libA();

In the above case libA wants to log exceptions, but when it calls into libB all exceptions get swallowed. This danger already exists today with try-catch. Because try-catch is already available, from our point of view Zones don't introduce anything new.

try-catch is scoped to stack frames, just like zones are scoped to stack frames. So unlike Domains it is not possible to enter but forget to exit a zone. Also, zones are nested like stack frames, so exception handling unwinds like the stack as well. Very analogous.
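
A small sketch of that scoping claim, using a hypothetical MiniZone stand-in whose run follows the try/finally pseudocode from the RFC: the previous zone is restored even when the callback throws, so a zone cannot be entered and left dangling.

```javascript
// Hypothetical stand-in for the proposed Zone API (assumption, not the spec).
class MiniZone {
  constructor(name) { this.name = name; }
  fork({ name }) { return new MiniZone(name); }
  run(f) {
    const oldZone = MiniZone.current;
    MiniZone.current = this;
    try { return f(); } finally { MiniZone.current = oldZone; }
  }
}
MiniZone.current = new MiniZone('root');

const outer = MiniZone.current.fork({ name: 'outer' });
const inner = MiniZone.current.fork({ name: 'inner' });
const trace = [];

outer.run(() => {
  try {
    inner.run(() => {
      trace.push(MiniZone.current.name); // inside the inner zone
      throw new Error('unwinding');
    });
  } catch (e) {
    trace.push(MiniZone.current.name); // back in the outer zone after the throw
  }
});
trace.push(MiniZone.current.name); // back in the root zone
```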

Same thing using try-catch

function libA() {
  try {
    // Pass the throwing function to libB as a callback.
    libB(() => { throw new Error('What will happen to me?'); });
  } catch (e) {
    console.error(e);
    throw e;
  }
}

function libB(callback) {
  try {
    callback();
  } catch (e) {
    // Swallow the exception.
  }
}

libA();

Yes, libB swallows exceptions and there is no way that libA could know about it.

A better way to write this would be:

function libA() {
  var logExceptions = Zone.current.fork({
    handleError: (e) => console.log(e)
  });
  logExceptions.runGuarded(() => {
    libB(() => { throw new Error('What will happen to me?'); });
  });
}

function libB(callback) {
  // Capture the caller's zone before forking our own.
  var storeZoneBoundCallback = Zone.current.wrapGuarded(callback);
  var eatExceptions = Zone.current.fork({
    handleError: () => true
  });
  eatExceptions.runGuarded(storeZoneBoundCallback);
}

libA();

Notice that libB uses wrapGuarded and then just invokes the wrapped callback.

But I don't think that is right either. I think there should only be run and wrapGuarded (there should not be runGuarded and wrap). So let's rewrite one more time.

function libA() {
  var logExceptions = Zone.current.fork({
    handleError: (e) => console.log(e)
  });
  logExceptions.run(() => {
    libB(() => { throw new Error('What will happen to me?'); });
  });
}

function libB(callback) {
  // Capture the caller's zone before forking our own.
  var storeZoneBoundCallback = Zone.current.wrapGuarded(callback);
  var eatExceptions = Zone.current.fork({
    handleError: () => true
  });
  eatExceptions.run(storeZoneBoundCallback);
}

libA();

The reason for this change is that code should be divided into two categories: my code, and callbacks I got from someplace else (not my code). When executing my own code, exceptions should be handled using try-catch, because the code should handle its own exceptions. It is only when executing other code that the exceptions have no meaning to my code and should be handled by the zone.

In the above example, the callback gets passed to libB, and it is at that point that libB should say, "Hey this is not my code, let me wrap it to insulate myself from its exceptions", hence wrapGuarded.

Also, because we have removed runGuarded, what we are saying is that synchronous exceptions should be handled synchronously, and only async callbacks (the ones wrapped with wrapGuarded) should have their exceptions handled by the zone. The above example is strange because we are wrapping the callback but executing it synchronously.

The rules are:

1) runGuarded does not exist, only run, which means that entering a zone does not interfere with synchronous exceptions (regardless of whether the zone spec swallows exceptions). run only means: enter a zone, don't catch exceptions.

2) wrap does not exist, only wrapGuarded. Call wrapGuarded only if you will store the callback for later (async) invocation. As long as Zone.current.wrapGuarded(callback) gets called as soon as an external callback enters a library, the correct zone will handle the callback. Not doing so will be wrong not only with respect to exception handling but also in semantics, since the code will restore the wrong zone.
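
Rule 2 can be illustrated with a sketch that contrasts wrapping at entry with wrapping at invocation time. MiniZone is a hypothetical stand-in for the proposed API, and subscribeEager, subscribeLate and emit are made-up library functions.

```javascript
// Hypothetical stand-in for the proposed Zone API (assumption, not the spec).
class MiniZone {
  constructor(name, handleError = null) {
    this.name = name;
    this.handleError = handleError;
  }
  fork({ name, handleError = null }) { return new MiniZone(name, handleError); }
  run(f) {
    const oldZone = MiniZone.current;
    MiniZone.current = this;
    try { return f(); } finally { MiniZone.current = oldZone; }
  }
  wrapGuarded(f) {
    const zone = this;
    return function (...args) {
      const oldZone = MiniZone.current;
      MiniZone.current = zone;
      try {
        return f.apply(this, args);
      } catch (e) {
        if (!zone.handleError || !zone.handleError(e)) throw e;
      } finally {
        MiniZone.current = oldZone;
      }
    };
  }
}
MiniZone.current = new MiniZone('root');

const eagerListeners = [];
const lateListeners = [];

// Correct: wrap as soon as the external callback enters the library.
function subscribeEager(cb) {
  eagerListeners.push(MiniZone.current.wrapGuarded(cb));
}
// Incorrect: store the raw callback and wrap only when invoking it.
function subscribeLate(cb) {
  lateListeners.push(cb);
}
function emit() {
  for (const cb of eagerListeners) cb();
  // Too late: this captures whatever zone emit() happens to run in.
  for (const cb of lateListeners) MiniZone.current.wrapGuarded(cb)();
}

const userZone = MiniZone.current.fork({ name: 'user' });
const emitterZone = MiniZone.current.fork({ name: 'emitter' });
const seen = {};

userZone.run(() => {
  subscribeEager(() => { seen.eager = MiniZone.current.name; });
  subscribeLate(() => { seen.late = MiniZone.current.name; });
});
emitterZone.run(emit);
// seen.eager is 'user' (the subscriber's zone); seen.late is 'emitter'.
```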

@mhevery

This danger already exists today with try catch. Because try catch is already available, from our point of view Zones don't introduce anything new.

I'm missing how it's not plain to see that try catch isn't the same. Take the following example:

try {
  fs.writeFile(path, data, err => {
    throw new Error('Nothing gonna catch me!');
  });
} catch (e) { }

No automated catch propagation. Can we all agree on this important difference?

Also, zones are nested like stack frames, so exception handling unwinds like the stack as well. Very analogous.

Sure it's analogous, but in practical terms it's very different. The Zone will automatically propagate the try catch like behavior everywhere, through all asynchronous time.

Yes, libB swallows exceptions and there is no way that libA could know about it.

And you don't see a problem with that?

Finally, neither of my challenges was addressed. 1) How am I supposed to locate where an exception of mine is being swallowed by a module? 2) How can I synchronously execute a callback where the root Zone is the only Zone in the stack? Example for asynchronous callback execution:

fs.writeFile(path, data, err => {
  // Say "callback" is a user supplied callback in an
  // above scope.
  Zone.root.fork().run(() => setImmediate(() => callback(err)));
});
