GoogleChromeLabs / comlink

Comlink makes WebWorkers enjoyable.
Apache License 2.0
11.37k stars 390 forks source link

Callback proxies called at different times in Firefox depending on whether they're passed directly or set as a property #538

Open alexmojaki opened 3 years ago

alexmojaki commented 3 years ago

index.html:

<!DOCTYPE html>

<script type="module">
  import * as Comlink from "https://unpkg.com/comlink@alpha/dist/esm/comlink.mjs";

  function callback(value) {
    console.log(`Result : ${value}`);
  }

  async function init() {
    const {MyClass} = Comlink.wrap(new Worker("worker.js"));
    const c = await new MyClass(Comlink.proxy(callback));
    c.run(Comlink.proxy(callback))
  }

  init();
</script>

worker.js:

importScripts("https://unpkg.com/comlink@alpha/dist/umd/comlink.js");

class MyClass {
  constructor (cb) {
    this.callbackProperty = cb;
  }
  run(directCallback) {
    this.callbackProperty("callbackProperty");
    directCallback("directCallback");
    console.log("Called callbacks!")
    while (true) {
      new Date()
    }
  }
}

Comlink.expose({MyClass});

In Chrome (and I think most other browsers) this logs everything as expected:

Called callbacks!
Result : callbackProperty
Result : directCallback

In Firefox, as long as the blocking while (true) { code is present, the Result : directCallback is missing. My interpretation of this is that directCallback("directCallback"); queues a message to be posted back to the main thread, but that message is only to be posted after run() completes, which may never happen.

What's surprising is that Result : callbackProperty still gets logged, so the message produced by this.callbackProperty("callbackProperty"); still gets posted either way. I can't see why at a glance - this.callbackProperty and directCallback are both proxies and I expect them to behave the same.

While I'm not using a silly looking while (true) in my real use case, it is still necessary to run synchronous code in my worker function after calling the callback. I can't use async, await, promises, or other callbacks. If you're really curious, the problem being solved is here: https://github.com/pyodide/pyodide/issues/1219 . In this case the worker function actually can't finish if the callback is never received by the main thread, but I imagine for other people this could manifest as the main thread not receiving progress updates to show the user while the worker does some long computation.

In any case, directCallback seems to work fine when testing in Chrome, and it was very hard to figure out both what was going wrong in Firefox and how to 'fix' it, so I think this difference between the two types of callback is a significant bug.

Beyond that though, I'm worried that the callbackProperty method only 'happens' to work in Firefox, and that I'm relying on some undefined behaviour that may change in future versions of comlink or browsers (including Firefox, Chrome, and others). Should I assume that as long as a worker function is running, messages sent in that function may not be received by the main thread?

surma commented 3 years ago

Huh interesting!

So there this is a known bug that we also ran into in the context of bringing Wasm threads to Squoosh. Basically, in Firefox it seems calling postMessage() doesn‘t immediately queue a task in the receiving realm, but instead queues a task in the current realm, which in turn queues a task in the receiving realm. That means the current JavaScript stack has to finish executing before the message actually gets sent.

@RReverser Am I remembering correctly that you filed a bug for this? Do you have the link to hand?

The fact that this only happens to one of the callbacks is intriguing. Both of these proxies should be backed by their own MessagChannel... One of them is around longer and maybe that makes a difference? Like some sort of initialization?

I do think this is Firefox specific and imo both messages should be received even with the while(true) present.

surma commented 3 years ago

(Side note: comlink@alpha is violently outdated, almost 2 years old. Can you see what changes when you updated to @latest?)

alexmojaki commented 3 years ago

Didn't notice the version thing, I'd copied from an example linked from the README, hence #539. @latest does the same thing.

If I remove the callbackProperty it doesn't fix anything, the directCallback still doesn't work.

RReverser commented 3 years ago

@RReverser Am I remembering correctly that you filed a bug for this? Do you have the link to hand?

No, that was actually for a Chrome bug - I think FF was the one that worked correctly in this regard.

lewisxy commented 2 years ago

Any update on this issue? I just came across this exact behavior in an application I am working on.

In my use case, there is a compute task running in the worker thread, and I want to add a callback to update the progress to UI thread. So it looks like this

// worker.js
class MyClass {
  // ...
  runCompute(data, progressCb) {
    while (...) {
      // ...
      progressCb(progress);
      // ...
    }
    return result;
  }
}
Comlink.expose(MyClass)

It works fine in Chrome. But in Firefox, progressCb doesn't get called until runCompute has finished, which ruins the whole purpose of reporting progress. The actual code for MyClass are existing code that I retrofitted into worker so it's not easy to change the API (i.e., use async). Any solution to this issue?

I am using Comlink 4.3.1 and Firefox 98.0.1

alexmojaki commented 2 years ago

Wow, you fit my prediction perfectly! "I imagine for other people this could manifest as the main thread not receiving progress updates to show the user while the worker does some long computation."

I ran into this again recently while working on my new library https://github.com/alexmojaki/comsync which generalizes the original use case I had, and I was planning on posting an update here.

This comment from @surma hit the nail on the head:

One of them is around longer and maybe that makes a difference? Like some sort of initialization?

Without that, I don't think I would have figured this out.

Basically, there needs to be some time between the worker function (runCompute) receiving the callback and it being called during which Firefox can set up the message port or whatever. That means time away from the runCompute 'thread', which typically means an await during which runCompute pauses to let other things happen. It can't just be any amount of time - I found that setTimeout for 1ms was about 50% successful.

The solution for me was to call the callback with dummy initialisation arguments at the very start of the worker function and wait for a response from that to guarantee that the callback was ready. You should be able to do the same thing in your case:

// worker.js
class MyClass {
  // ...
  async runCompute(data, progressCb) {
    await progressCb(0);
    while (...) {
      // ...
      progressCb(progress);  // no await needed here
      // ...
    }
    return result;
  }
}
Comlink.expose(MyClass)

So you need to make the 'outside' of your API async, which shouldn't be an issue since your main thread has to talk to Comlink asynchronously anyway, while the inner worker code can remain sync.

I think Comlink should be able to add a general fix for this, although I wonder if doing so unconditionally could be controversial for performance.

lewisxy commented 2 years ago

So you need to make the 'outside' of your API async, which shouldn't be an issue since your main thread has to talk to Comlink asynchronously anyway, while the inner worker code can remain sync.

Thanks, this fixed the issue at least for my current use case.

However, it does make the code a bit messier, as the real runCompute is a bit more complicated than the example I provided.

TL;DR: below is the actual detail and the fix, feel free to read and comment/criticize on the design.


MyClass actually provides 2 interfaces

runCompute(data, progressCb) { /* do work */ return result; }
// as a hindsight, this is not be the best interface for an async function, it should just use async keyword instead
runComputeAsync(data, resultCb, progressCb) { /* do work */ resultCb(result); }

But because the class was designed without worker in mind, runComputeAsync is basically a wrapper of runCompute.

In order to reuse code as well as supporting browsers without Worker support, I created a subclass of it (initialized in the main thread). This subclass will call the Comlink remote with delegation and call local with inheritance (in case Worker is not supported).

class MyClass2 extends MyClass {
  constructor(...) {
    super(...)
    this.worker = new Worker(...); // worker reuse MyClass thanks to webpack 5
    const RemoteMyClass = Comlink.wrap(this.worker); // omitted some error handling and checks against undefined
    new RemoteMyClass(...).then(r => { this.remote = r });
    // ...
  }

  runComputeAsync(data, resultCb, progressCb) {
    if (worker !== undefined) {
      this.remote.runCompute(data, progressCb).then(resultCb);
    } else {
      super.runComputeAsync(data, resultCb, progressCb);
    }
  }
}

So the fix was to add a new method in MyClass, and calls this new method in MyClass2 instead

// MyClass
async runComputeAsync2(data, progressCb) {
  await progressCb(0);
  return this.runCompute(data, progressCb);
}
// MyClass2
if (worker !== undefined) {
  this.remote.runComputeAsync2(data, progressCb).then(resultCb);
}