defunctzombie / node-process

process information for node.js and browsers
MIT License

Handle nextTick task errors #37

Closed slorber closed 9 years ago

slorber commented 9 years ago

When draining the nextTick task queue, if a single task throws, the error propagates and immediately stops the draining of the queue. It also leaves the attribute draining = true, making subsequent calls to nextTick not trigger the draining again.

I think the draining should complete for all queued tasks; an error in a single task should not stop the other queued tasks from executing.

I chose to handle exceptions by rethrowing them in a setTimeout, as it seems more natural than using a logger. Since exceptions are "exceptional", this should probably not be a big performance problem.

This solves serious issues that appeared with nextTick in Browserify >= 8.1.0 (for example with the Q promise library).

See: https://github.com/kriskowal/q/issues/669 https://github.com/substack/node-browserify/issues/1179
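The rethrow-in-setTimeout approach described above can be sketched roughly like this (a minimal standalone sketch with hypothetical names, not the actual patch): each task runs inside a try/catch, and any error is rethrown asynchronously so it still surfaces as an uncaught error without stopping the rest of the drain.

```javascript
// Minimal sketch (hypothetical names, not the actual patch): drain a task
// list so that one throwing task does not prevent later tasks from running.
function drainAll(tasks) {
    for (var i = 0; i < tasks.length; i++) {
        try {
            tasks[i]();
        } catch (e) {
            // Rethrow asynchronously: the error still surfaces as an
            // uncaught exception, but the remaining tasks run first.
            setTimeout(function () { throw e; }, 0);
        }
    }
}
```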

calvinmetcalf commented 9 years ago

So the root of the error is that Q is misidentifying browserified code as running in node rather than in the browser. Like node, this library makes no guarantees about what an uncaught exception will do inside of a nextTick callback.

That being said, if you add a second setTimeout after this line which checks whether draining is still true and, if it is, cleans up, it will likely accomplish the same thing without causing a massive performance penalty.

calvinmetcalf commented 9 years ago

line 11 would be another place you could schedule it, something like

...
function cleanUpNextTick() {
    draining = false;
    queue = [];
}
function drainQueue() {
    if (draining) {
        return;
    }
    var timeout = setTimeout(cleanUpNextTick, 0);
    ...
    draining = false;
    clearTimeout(timeout);
}

this would make sure all pending events were booted from the queue as well

edit: was able to simplify it more
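Spelled out as a full standalone sketch (simplified names, an assumed shape of the module's internals rather than the real code), the cleanup-timeout idea looks roughly like this:

```javascript
// Standalone sketch of the cleanup-timeout idea (simplified, assumed names):
// a timer armed before draining resets the state if an exception aborts the
// drain; on normal completion the timer is simply cancelled.
var queue = [];
var draining = false;

function cleanUpNextTick() {
    draining = false;
    queue = [];
}

function drainQueue() {
    if (draining) {
        return;
    }
    draining = true;
    var timeout = setTimeout(cleanUpNextTick, 0);
    var i = 0;
    while (i < queue.length) {
        queue[i++](); // a throw here leaves the cleanup timer armed
    }
    queue = [];
    draining = false;
    clearTimeout(timeout); // normal completion: cancel the cleanup
}

function nextTick(fun) {
    queue.push(fun);
    if (!draining) {
        setTimeout(drainQueue, 0);
    }
}
```

Note that this only cleans up the stuck state; tasks queued behind the throwing one are dropped rather than executed, which is the behavior difference discussed below.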

slorber commented 9 years ago

Sorry, I won't be able to help more; I'm not a nodejs dev and I don't know what the intended behavior of node's nextTick is.

Just giving a solution that worked fine for me.

Imho using a cleanup timeout won't solve all problems: when registering 2 Q handlers in then, if one of them raises an error, the second should still be called. It's not just about cleaning up; it's about ensuring that all queued tasks will be executed, and that is the behavior we had before with a simple setTimeout for each task

calvinmetcalf commented 9 years ago

Imho using a cleanup timeout won't solve all problems: when registering 2 Q handlers in then, if one of them raises an error, the second should still be called. It's not just about cleaning up; it's about ensuring that all queued tasks will be executed, and that is the behavior we had before with a simple setTimeout for each task

Since Q uses its own task queue, this likely won't break it.

The tricky thing is just figuring out the right behavior because node just crashes the process here

defunctzombie commented 9 years ago

I am open to a PR that fixes things up here; however, I think this could be a Q issue, since it is trying to be clever and identify where it is running instead of using standard js primitives like setTimeout(0).

calvinmetcalf commented 9 years ago

see #38 for my take on it; the downside is that while it handles errors better, it is a bit more complex

slorber commented 9 years ago

I don't know... what is the purpose of this nextTick shim?

I guess the original ambition was to allow nodejs modules using nextTick to run in the browser with Browserify, right?

So the default behavior of NodeJS, with both setTimeout and nextTick, is to crash whenever there is an uncaught exception, as far as I understand. This permits failing fast and restarting the process in a clean state.

In my opinion it does not really make sense to reproduce this behavior in the browser, since we don't want the user's UI to become totally unusable. Besides, this is not the behavior of setTimeout in the browser: when an error is thrown it does not crash the browser/tab, it just logs an uncaught exception in the console.

The question is: why would we want a different behavior in the browser for nextTick? If the UI were stateless and we could recover it easily, then failing fast would be ok. But UIs are stateful, so we must fail safe in all cases and not make the UI totally unusable on any uncaught exception.

So for me it does not make sense to reproduce the fail-fast behavior of node's nextTick with the browser.

The former implementation was based on setTimeout and it worked fine, right (I don't see any issue)? So I'm totally ok with perf improvements, but why change a behavior that seemed to be ok for everybody until now?

process.nextTick(function() { throw new Error("error"); });
process.nextTick(function() { console.log("message"); });

The old behavior would log both the error and the message. The current behavior only logs the error (and leaves an inconsistent state, but that will be fixed and is not my point here).

I think the old behavior is better for the browser.

calvinmetcalf commented 9 years ago

The old behavior would log both the error and the message.

that was not the case previously on all browsers: certain browsers that used MutationObserver would copy the queue array first, while others would unshift. But give #38 a test, as that should do what you want

slorber commented 9 years ago

@calvinmetcalf yes, it solves the Q test case, as it does not leave an inconsistent state and thus subsequent ticks can still be executed.

But note that a much simpler implementation also solves my use case. I really don't understand why you would want to handle errors in a scheduled cleanup task.

process.nextTick = function (task) {
    if (!queue) {
        queue = [task];
    } else {
        queue.push(task);
    }
    if (!nextTickTimer) {
        nextTickTimer = setTimeout(function processNextTick() {
            nextTickTimer = undefined;
            for (var i = 0, len = queue.length; i < len; i++) {
                queue[i]();
            }
        }, 0);
    }
};

I don't really understand why your solution is much more complex than that, am I missing something?

Also, I noticed that in your cleanup task you have some logic to reschedule queued tasks that could not be processed because an error was thrown in a previous task. This behavior seems fine to me (at least it's what I understand of your code). After all, my whole concern about the former vs current behavior is about guaranteeing that all submitted tasks will be called, even if some tasks in the current tick fail. I am ok with making the current tick fail fast and rescheduling uncalled tasks to the next tick.

But I still think it could be much simpler. Why not use a try/catch/finally instead of timers to handle errors and recover?

calvinmetcalf commented 9 years ago

in your example

process.nextTick(function() { throw new Error("error"); });
process.nextTick(function() { console.log("message"); });

won't print "message" until process.nextTick is called again at some later time, because nothing will restart the draining after an error.

In general, if you have the debugger open and set to pause on uncaught errors, do you want it to pause in the setTimeout in this library or at the original place the error was thrown?

slorber commented 9 years ago

@calvinmetcalf I agree that with my implementation the "message" will never be printed. But that is actually because my implementation is a simpler rewrite of the current behavior, which is not fine for me :)

I could rewrite my pull request like that:

process.nextTick = function (task) {
    if (!queue) {
        queue = [task];
    } else {
        queue.push(task);
    }
    if (!nextTickTimer) {
        nextTickTimer = setTimeout(function processNextTick() {
            nextTickTimer = undefined;
            for (var i = 0, len = queue.length; i < len; i++) {
                try {
                    queue[i]();
                } catch (e) {
                    setTimeout(function () {
                        throw e;
                    }, 0);
                }
            }
        }, 0);
    }
};

which will print "message" and is much simpler than my initial PR imho.

I understand your point about losing the original place of the error when using setTimeout like that... I think this happens when you rethrow an error, right?

I think your solution could still be simplified. Have you tried using a try/finally without catching the error? Like:

var processCompleted = false;
try {
    // tie up a resource
    processQueue();
    processCompleted = true;
} finally {
    if (!processCompleted) {
        reschedule() ....
    }
}
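That pattern can be made concrete in a small runnable sketch (hypothetical helper names, not code from this PR): the error propagates unchanged, keeping its original stack for the debugger, while the finally block detects the aborted run and hands the unprocessed tasks back to the caller.

```javascript
// Sketch of try/finally without a catch (hypothetical names): the error
// propagates with its original stack, and the finally block detects the
// abort by checking how far the loop got.
function runAll(tasks, onAbort) {
    var i = 0;
    try {
        for (; i < tasks.length; i++) {
            tasks[i]();
        }
    } finally {
        if (i < tasks.length) {
            // task i threw; pass the never-called remainder to the caller
            onAbort(tasks.slice(i + 1));
        }
    }
}
```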
calvinmetcalf commented 9 years ago

so based on some perf tests, the try block isn't as bad as I expected it to be, interesting

calvinmetcalf commented 9 years ago

though a try/finally does cause some issues in firefox jsperf.com/asyncstuff/9/edit

edit: could be the labeled statement

slorber commented 9 years ago

yes @calvinmetcalf, and you could probably make it faster by moving the try outside of the loop and keeping track of the last processed task index, so tasks that were never called can be rescheduled.

Something like that:

var queue = undefined;
var nextTickTimer = undefined;

function processNextTick() {
    nextTickTimer = undefined;
    var i = 0;
    try {
        for (var len = queue.length; i < len; i++) {
            queue[i]();
        }
    } finally {
        // If there is an error and there's at least one unprocessed task we reschedule it in next tick
        if (i+1 <= queue.length) {
            queue = queue.slice(i+1);
            nextTickTimer = setTimeout(processNextTick, 0);
        }
    }
}

process.nextTick = function (task) {
    if (!queue) {
        queue = [task];
    } else {
        queue.push(task);
    }
    if (!nextTickTimer) {
        nextTickTimer = setTimeout(processNextTick, 0);
    }
};
calvinmetcalf commented 9 years ago
var queue = [];
var draining = false;

function drainQueue() {
    if (draining) {
        return;
    }
    draining = true;
    var currentQueue;
    var len = queue.length;
    while(len) {
        currentQueue = queue;
        queue = [];
        var i = -1;
        try {
          while (++i < len) {
              currentQueue[i]();
          }
        } finally {
          if (i++ < len) {
            queue = currentQueue.slice(i).concat(queue);
            draining = false;
            return drainQueue();
          }
        }

        len = queue.length;
    }
    draining = false;
}
process.nextTick = function (fun) {
    queue.push(fun);
    if (!draining) {
        setTimeout(drainQueue, 0);
    }
};

is also really slow in firefox; I think firefox doesn't like finally blocks

slorber commented 9 years ago

@calvinmetcalf I still can't understand the complexity of your code, requiring 2 queue variables and 2 loops. Also, calling drainQueue() in a finally block seems not appropriate at all to me: finally blocks are intended for cleanup and should not trigger extra calls that could lead to recursion

calvinmetcalf commented 9 years ago

The 2 queues are because you need to deal with process.nextTick being called recursively, which in this code causes the task to be put into a new array, not the one we are iterating. You could put them into the current array, and that works fine until nested nextTicks cause the length of the array to be too long to fit into a 32-bit integer, at which point a massive slowdown happens in v8 (chrome and node): the type signature of the function changes, so the optimized code that was generated for it is thrown out (the function is almost certainly optimized at that point, as we are something like 64k iterations in), and that just tanks performance. I discovered this while perf testing a promise library: after a certain threshold of parallel work, performance plummeted and it eventually ran out of memory. So the 2 queues prevent any one queue from getting too long; it also means we don't have a massive array hanging around the whole time, and functions can be garbage collected much sooner. A comment in there would likely be helpful.
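The queue swap described above can be illustrated with a stripped-down sketch (a simplified illustration with assumed names, not the actual PR code): tasks queued while draining land in a fresh array, so no single array grows without bound under recursive nextTick calls, and drained arrays become garbage as soon as a pass finishes.

```javascript
// Simplified illustration (not the actual PR code) of the two-queue swap:
// tasks queued while draining go into a fresh array, so recursive nextTick
// calls never grow one array without bound.
var queue = [];

function drain() {
    var len = queue.length;
    while (len) {
        var currentQueue = queue;
        queue = [];                 // nested pushes land in the new array
        for (var i = 0; i < len; i++) {
            currentQueue[i]();
        }
        len = queue.length;         // run anything queued during this pass
    }
}
```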

With regards to the finally block, that is what they are used for and it is a best practice; in the end a finally is just a block of code that always runs after the try and catch clauses. I don't like the recursive function call either, and I originally tried to do it with a labeled jump statement, but it seems that having the finally in there made firefox about half as fast.


calvinmetcalf commented 9 years ago

try 1 was


function drainQueue() {
    if (draining) {
        return;
    }
    draining = true;
    var currentQueue, i;
    var len = queue.length;
    var done;
    while(len) {
        currentQueue = queue;
        queue = [];
        i = -1;
        done = false;
        loopit: while (++i < len) {
             try {
                currentQueue[i]();
                done = true;
            } finally {
                if (!done) {
                    continue loopit;
                }
            }
         }
        len = queue.length;
    }
    draining = false;
}
defunctzombie commented 9 years ago

Has this PR been superseded by #38 ?

calvinmetcalf commented 9 years ago

Yes
