denoland / deno

A modern runtime for JavaScript and TypeScript.
https://deno.com
MIT License
97.68k stars 5.38k forks source link

Microtask queue is not handled correctly in Deno #11731

Open lucacasonato opened 3 years ago

lucacasonato commented 3 years ago

In browsers, a microtask checkpoint is performed whenever the JS stack empties. Take the following example:

const filereader = new FileReader();

filereader.onloadstart = () => {
  console.log("loadstart event");
  queueMicrotask(() => console.log("loadstart microtask"));
};
filereader.onloadend = () => {
  console.log("loadend event");
  queueMicrotask(() => console.log("loadend microtask"));
};
filereader.onload = () => {
  console.log("load event");
  queueMicrotask(() => console.log("load microtask"));
};
filereader.onprogress = () => {
  console.log("progress event");
  queueMicrotask(() => console.log("progress microtask"));
};

filereader.readAsText(new Blob(["hello"]));
queueMicrotask(() => console.log("after start microtask"));

In all current browsers, FileReader is implemented as a native object. That means when you call filereader.readAsText, the following happens:

  1. When calling filereader.readAsText, JS calls into native C++ code and queues some tasks (things that should happen on future spins of the event loop) to emit the loadstart, progress, load, and loadend events in that order. The C++ code returns back into JS.
  2. The queueMicrotask call pushes one microtask onto the microtask queue.
  3. At this point there is no more JS to be executed, and the JS stack is empty. Thus a microtask checkpoint is performed. after start microtask is printed to the console. The microtask queue is now empty.
  4. In C++ code, the event loop takes the first ready task from the task queue. The task is "dispatching the loadstart event". The event is dispatched. This calls from C++ into JS. Something is now on the JS stack. loadstart event is logged, and a microtask is queued.
  5. There is no more JS to be executed, and the JS stack is empty. Thus a microtask checkpoint is performed. loadstart microtask is printed to the console. The microtask queue is now empty.
  6. The same happens for the progress, load, and loadend events.

This is how the browser console looks after executing this:

after start microtask
loadstart event
loadstart microtask
progress event
progress microtask
load event
load microtask
loadend event
loadend microtask

Now lets see what happens in Deno. We currently also follow the rule that a microtask checkpoint is performed whenever the JS stack empties. In Deno FileReader is implemented in JS code:

  1. When calling filereader.readAsText, JS calls into JS code and queues some tasks emit the loadstart, progress, load, and loadend events in that order. The JS code returns back to the calling JS code.
  2. The queueMicrotask call pushes one microtask onto the microtask queue.
  3. At this point there is no more JS to be executed, and the JS stack is empty. Thus a microtask checkpoint is performed. after start microtask is printed to the console. The microtask queue is now empty.
  4. In JS the first ready task is taken from the task queue. Something is already on the JS stack. The task is "dispatching the loadstart event". The event is dispatched. This calls JS from JS. loadstart event is logged, and a microtask is queued. At this point there is still something on the JS stack, so a microtask checkpoint is not performed.
  5. The same happens for the progress, load, and loadend events.
  6. Now there is no more JS to be executed, and the JS stack is empty. Thus a microtask checkpoint is performed. loadstart microtask, progress microtask, load microtask, and loadend microtask are printed to the console. The microtask queue is now empty.

The console looks like this:

after start microtask
loadstart event
progress event
loadstart microtask
progress microtask
load event
loadend event
load microtask
loadend microtask

Note: if you run this in the latest build from main, it will actually print something different. This is due to a bug which I have already fixed on a seperate branch. Consider the above output to be canonical.

The problem is that for purposes of V8s automatic microtask checkpointing, which happens when the JS queue empties, there is no distinction between JS code that is part of Deno's internal implementation, and user code. For microtask checkpointing to work correctly, this needs to be fixed.

The naive approach to fixing this is to just force V8 to run a microtask checkpoint whenever user code that was called by "internal JS" returns execution back to that internal JS. For example, we could microtask checkpoint after every event handler insider a dispatchEvent is called.

This has a problem though, that I can demonstrate with the below example:

const target = new EventTarget();

target.addEventListener("foo", () => {
  console.log("foo event");
  queueMicrotask(() => console.log("foo microtask"));
});
target.addEventListener("bar", () => {
  console.log("bar event");
  queueMicrotask(() => console.log("bar microtask"));
});

target.dispatchEvent(new Event("foo"));
target.dispatchEvent(new Event("bar"));

With our naive approach, we would dispatch event foo, call the event handler for foo, which will print "foo event", and then perform a microtask checkpoint which causes "foo microtask" to be printed. The same would happen for the "bar" event:

foo event
foo microtask
bar event
bar microtask

If you look at what this does in the browser however, you will see the following printed:

foo event
bar event
foo microtask
bar microtask

Here you can see that our naive approach broke down. When a user calls dispatchEvent there is still some JS on the stack. The internal JS code directly calls the event handler. The internal code is sandwiched between two pieces of user code. In this case we do not want to perform a microtask checkpoint when the user JS code returns to the internal JS code, because there is still something on the JS stack (the dispatchEvent call). Because of this the microtask checkpoint is deferred until after both event handlers have been called, and there is no more JS to be executed.

So how do we solve this? Instead of naively performing a microtask checkpoint every time some user JS code returns to the internal JS code, we instead perform that checkpoint only if there is no user js code on the stack. If the internal JS code is sandwiched between user JS code, no microtask checkpoint would be performed. The difficulty is figuring out if we are sandwiched or not in a performant way.

lucacasonato commented 3 years ago

I have discussed with @bnoordhuis, @piscisaureus, and @bartlomieju and we did not find any reasonable solution. V8 does not provide a way to mark code as "internal" for the purposes of automatic microtask checkpoints, and manual microtask checkpoints are not possible because the naive approach does not work as described above, and a more complex approach where we keep track of if there are user code frames on the stack is going to be too slow or too difficult to implement.

MattiasBuelens commented 3 years ago

The naive approach to fixing this is to just force V8 to run a microtask checkpoint whenever user code that was called by "internal JS" returns execution back to that internal JS. For example, we could microtask checkpoint after every event handler insider a dispatchEvent is called.

That's not correct. Nowhere does the spec say to "run microtasks" as part of dispatchEvent().

Instead, the FileReader spec says to "queue a task to fire a progress event named loadstart". The important bit here is "queue a task": this task is put on a queue, and an event loop continually runs through all tasks. After executing each task, the event loop must perform a microtask checkpoint, in other words: run all microtasks.

It's not so much about distinguishing between "internal JS" and "user-land JS". User-land code can call into internal code, which calls into more user-land code, etc. This has nothing to do with microtasks.

It's really about making sure that each task runs on its own, and that you always run microtasks after each task. It shouldn't matter how many other tasks are still queued after running the "dispatch a loadstart event" task.

So how do we solve this? Instead of naively performing a microtask checkpoint every time some user JS code returns to the internal JS code, we instead perform that checkpoint only if there is no user js code on the stack. If the internal JS code is sandwiched between user JS code, no microtask checkpoint would be performed. The difficulty is figuring out if we are sandwiched or not in a performant way.

While that might work, it's really complicated and it's not even aligned with the specification. The spec doesn't even mention "the JS stack", it just says: "perform a microtask checkpoint after running every queued task". So ideally, Deno needs to support the concepts of a "task", a "task queue" and an "event loop".

In pseudo-code, this would look something like this:

let taskQueue = [];
let microtaskQueue = [];
let runningMicrotasks = false;

function queueTask(callback) {
  taskQueue.push(callback);
}

function queueMicrotask(callback) {
  microtaskQueue.push(callback);
}

function runEventLoop() {
  while (taskQueue.length > 0) {
    const task = taskQueue.shift();
    task();
    runMicrotasks();
  }
}

function runMicrotasks() {
  if (runningMicrotasks) return;
  runningMicrotasks = true;
  while (microtaskQueue.length > 0) {
    const microtask = microtaskQueue.shift();
    microtask();
  }
  runningMicrotasks = false;
}

(In practice, tasks would be handled by run_event_loop in Rust and microtasks would be handled by scope.perform_microtask_checkpoint. No need to re-invent the wheel. 😛)

andreubotella commented 3 years ago

The spec doesn't even mention "the JS stack", it just says: "perform a microtask checkpoint after running every queued task".

Not in the event loop processing section, but the relevant algorithm is "clean up after running script". In the FileReader case, the load and loadend events are fired from the same event loop task (10.5 in "read operation"), so the only reason microtasks are being run between those events in browsers is because the JS stack has emptied. I don't understand the case with streams as much, but since the streams spec doesn't queue a task in the event loop at any point, using only microtasks, it seems like it's much the same.

That said, a lot of Deno's internal code use promises and async/await, with the result that a lot of what the specs say should be tasks are instead microtasks. I put together a prototype of a proper implementation of the event loop based on macrotasks, and that did fix the ordering of event and microtasks in the loadstart and progress events – because those are fired on two different tasks in the spec. But it didn't do anything for the load and loadend events, or for streams.

MattiasBuelens commented 3 years ago

It doesn't look like "clean up after running script" applies here, since we're not evaluating new JavaScript code here. Instead, "clean up after running a callback" applies, but that doesn't perform a microtask checkpoint.

But you're right, it's odd that FileReader fires two events in a single task. I assume that would mean that microtasks will only run after the loadend event, but that's not what browsers do. Quite confusing... 😅

Indeed, the main problem appears to be Deno using microtasks in places where it should be using regular tasks. I hope some progress can be made on this front. 🙏

andreubotella commented 3 years ago

It doesn't look like "clean up after running script" applies here, since we're not evaluating new JavaScript code here. Instead, "clean up after running a callback" applies, but that doesn't perform a microtask checkpoint.

But you're right, it's odd that FileReader fires two events in a single task. I assume that would mean that microtasks will only run after the loadend event, but that's not what browsers do. Quite confusing... sweat_smile

Indeed, the main problem appears to be Deno using microtasks in places where it should be using regular tasks. I hope some progress can be made on this front. pray

Firing an event ultimately ("inner invoke", step 10) calls Web IDL's "call a user object operation", which prepares and then cleans up both script and callback. I don't really understand why this has to happen, but it's not a browser bug.

MattiasBuelens commented 3 years ago

Ah, I see. Thanks for clarifying. 🙂

So according to Web IDL: if the JavaScript execution stack is empty after running a user's callback, then it should immediately run microtasks. That also explains why adding two event listeners for the same event type runs microtasks after every event listener, even though they're all running within the same "dispatch an event" task.

That does make things more difficult for Deno, since it's using JavaScript for its internal logic too. I suppose Deno could implement prepare to run script and clean up after running script by incrementing and decrementing a global counter, to keep track of the number of "user code stack frames" we're currently in. Of course, that means updating all of the places where Deno calls into user code... which I assume is a ton of work. 😬 But it's better than stack inspection, right? 😅