Open alexmojaki opened 3 years ago
Huh interesting!
So there this is a known bug that we also ran into in the context of bringing Wasm threads to Squoosh. Basically, in Firefox it seems calling postMessage()
doesn‘t immediately queue a task in the receiving realm, but instead queues a task in the current realm, which in turn queues a task in the receiving realm. That means the current JavaScript stack has to finish executing before the message actually gets sent.
@RReverser Am I remembering correctly that you filed a bug for this? Do you have the link to hand?
The fact that this only happens to one of the callbacks is intriguing. Both of these proxies should be backed by their own MessagChannel
... One of them is around longer and maybe that makes a difference? Like some sort of initialization?
I do think this is Firefox specific and imo both messages should be received even with the while(true)
present.
(Side note: comlink@alpha
is violently outdated, almost 2 years old. Can you see what changes when you updated to @latest
?)
Didn't notice the version thing, I'd copied from an example linked from the README, hence #539. @latest
does the same thing.
If I remove the callbackProperty
it doesn't fix anything, the directCallback
still doesn't work.
@RReverser Am I remembering correctly that you filed a bug for this? Do you have the link to hand?
No, that was actually for a Chrome bug - I think FF was the one that worked correctly in this regard.
Any update on this issue? I just came across this exact behavior in an application I am working on.
In my use case, there is a compute task running in the worker thread, and I want to add a callback to update the progress to UI thread. So it looks like this
// worker.js
class MyClass {
// ...
runCompute(data, progressCb) {
while (...) {
// ...
progressCb(progress);
// ...
}
return result;
}
}
Comlink.expose(MyClass)
It works fine in Chrome. But in Firefox, progressCb
doesn't get called until runCompute
has finished, which ruins the whole purpose of reporting progress. The actual code for MyClass
are existing code that I retrofitted into worker so it's not easy to change the API (i.e., use async
). Any solution to this issue?
I am using Comlink
4.3.1 and Firefox 98.0.1
Wow, you fit my prediction perfectly! "I imagine for other people this could manifest as the main thread not receiving progress updates to show the user while the worker does some long computation."
I ran into this again recently while working on my new library https://github.com/alexmojaki/comsync which generalizes the original use case I had, and I was planning on posting an update here.
This comment from @surma hit the nail on the head:
One of them is around longer and maybe that makes a difference? Like some sort of initialization?
Without that, I don't think I would have figured this out.
Basically, there needs to be some time between the worker function (runCompute) receiving the callback and it being called during which Firefox can set up the message port or whatever. That means time away from the runCompute 'thread', which typically means an await
during which runCompute pauses to let other things happen. It can't just be any amount of time - I found that setTimeout
for 1ms was about 50% successful.
The solution for me was to call the callback with dummy initialisation arguments at the very start of the worker function and wait for a response from that to guarantee that the callback was ready. You should be able to do the same thing in your case:
// worker.js
class MyClass {
// ...
async runCompute(data, progressCb) {
await progressCb(0);
while (...) {
// ...
progressCb(progress); // no await needed here
// ...
}
return result;
}
}
Comlink.expose(MyClass)
So you need to make the 'outside' of your API async, which shouldn't be an issue since your main thread has to talk to Comlink asynchronously anyway, while the inner worker code can remain sync.
I think Comlink should be able to add a general fix for this, although I wonder if doing so unconditionally could be controversial for performance.
So you need to make the 'outside' of your API async, which shouldn't be an issue since your main thread has to talk to Comlink asynchronously anyway, while the inner worker code can remain sync.
Thanks, this fixed the issue at least for my current use case.
However, it does make the code a bit messier, as the real runCompute
is a bit more complicated than the example I provided.
TL;DR: below is the actual detail and the fix, feel free to read and comment/criticize on the design.
MyClass actually provides 2 interfaces
runCompute(data, progressCb) { /* do work */ return result; }
// as a hindsight, this is not be the best interface for an async function, it should just use async keyword instead
runComputeAsync(data, resultCb, progressCb) { /* do work */ resultCb(result); }
But because the class was designed without worker in mind, runComputeAsync
is basically a wrapper of runCompute
.
In order to reuse code as well as supporting browsers without Worker support, I created a subclass of it (initialized in the main thread). This subclass will call the Comlink remote with delegation and call local with inheritance (in case Worker is not supported).
class MyClass2 extends MyClass {
constructor(...) {
super(...)
this.worker = new Worker(...); // worker reuse MyClass thanks to webpack 5
const RemoteMyClass = Comlink.wrap(this.worker); // omitted some error handling and checks against undefined
new RemoteMyClass(...).then(r => { this.remote = r });
// ...
}
runComputeAsync(data, resultCb, progressCb) {
if (worker !== undefined) {
this.remote.runCompute(data, progressCb).then(resultCb);
} else {
super.runComputeAsync(data, resultCb, progressCb);
}
}
}
So the fix was to add a new method in MyClass, and calls this new method in MyClass2 instead
// MyClass
async runComputeAsync2(data, progressCb) {
await progressCb(0);
return this.runCompute(data, progressCb);
}
// MyClass2
if (worker !== undefined) {
this.remote.runComputeAsync2(data, progressCb).then(resultCb);
}
index.html
:worker.js
:In Chrome (and I think most other browsers) this logs everything as expected:
In Firefox, as long as the blocking
while (true) {
code is present, theResult : directCallback
is missing. My interpretation of this is thatdirectCallback("directCallback");
queues a message to be posted back to the main thread, but that message is only to be posted afterrun()
completes, which may never happen.What's surprising is that
Result : callbackProperty
still gets logged, so the message produced bythis.callbackProperty("callbackProperty");
still gets posted either way. I can't see why at a glance -this.callbackProperty
anddirectCallback
are both proxies and I expect them to behave the same.While I'm not using a silly looking
while (true)
in my real use case, it is still necessary to run synchronous code in my worker function after calling the callback. I can't use async, await, promises, or other callbacks. If you're really curious, the problem being solved is here: https://github.com/pyodide/pyodide/issues/1219 . In this case the worker function actually can't finish if the callback is never received by the main thread, but I imagine for other people this could manifest as the main thread not receiving progress updates to show the user while the worker does some long computation.In any case,
directCallback
seems to work fine when testing in Chrome, and it was very hard to figure out both what was going wrong in Firefox and how to 'fix' it, so I think this difference between the two types of callback is a significant bug.Beyond that though, I'm worried that the
callbackProperty
method only 'happens' to work in Firefox, and that I'm relying on some undefined behaviour that may change in future versions of comlink or browsers (including Firefox, Chrome, and others). Should I assume that as long as a worker function is running, messages sent in that function may not be received by the main thread?