emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler

[Threads] Module overrides are not available in worker.js #22423

Open ravisumit33 opened 4 weeks ago

ravisumit33 commented 4 weeks ago

Please include the following in your bug report:

Version of emscripten/emsdk:

emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.56 (cf90417346b78455089e64eb909d71d091ecc055)
clang version 19.0.0git (https://github.com/llvm/llvm-project 34ba90745fa55777436a2429a51a3799c83c6d4c)
Target: wasm32-unknown-emscripten
Thread model: posix
InstalledDir: ~/emsdk/upstream/bin

I am currently trying to port my single-threaded wasm to a multi-threaded one. Our wasm uses the SPLIT_MODULE integration, and we have overridden the Module.loadSplitModule function. We load our wasm like this:

Module.loadSplitModule = () => {
    // overridden
};
importScripts("wasm.js");

This override worked flawlessly in single-threaded mode. With multithreading, however, when a worker needs to load the deferred module, it does not go through our loadSplitModule override but through the default one, which crashes our app. This makes me wonder about the other overrides we have on the Module object; perhaps they are also not passed through to the workers that get created. One solution would be to do all the Module overrides in pre.js, but that isn't feasible for us because the overrides also use other parts of the main app JS. How can this be solved in a generic way?

sbc100 commented 4 weeks ago

If you want the code to be available on all workers, it probably needs to be part of wasm.js itself. Is it possible to override Module.loadSplitModule in a --pre-js file?

ravisumit33 commented 4 weeks ago

@sbc100 That isn't possible, since such overrides also depend on state in my main app JS, where wasm.js is imported, and that state is only available on the main application thread. One example of such an override is Module.locateFile. I believe that if I want to use these overrides, I need to proxy those operations from the worker to the main application thread, right?

sbc100 commented 4 weeks ago

So the override you have can only run on the main thread (since it depends on main thread state)? That seems tricky.

I'm not sure how loadSplitModule works in the face of multi-threading, but presumably only one thread should actually perform the download and compile of a given module, and the other threads should then receive the module via postMessage. @tlively is this something that already works? My advice would be to always have the main thread load the split module and then postMessage it to all of its pthread workers. This is how it works in dynamic linking today. That would also solve your problem, since loadSplitModule would only ever actually execute on the main thread.
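The "load once on the main thread, then postMessage to every worker" idea can be sketched roughly as follows. This is only an illustration, not how Emscripten implements it: the helper name broadcastSplitModule, the message shape { cmd, file, module }, and the pthreadWorkers list are all assumptions.

```javascript
// Hypothetical helper: send one compiled module to every pthread worker.
// `workers` is any iterable of objects with a postMessage method (for example
// Worker instances). The message shape { cmd, file, module } is an assumption.
function broadcastSplitModule(wasmModule, file, workers) {
  let sent = 0;
  for (const worker of workers) {
    // A WebAssembly.Module can be passed through postMessage, so each worker
    // receives the compiled module without downloading it again.
    worker.postMessage({ cmd: "splitModule", file, module: wasmModule });
    sent++;
  }
  return sent;
}

// Main-thread usage in a browser (the file name is a placeholder):
// const compiled = await WebAssembly.compileStreaming(fetch("wasm.deferred.wasm"));
// broadcastSplitModule(compiled, "wasm.deferred.wasm", pthreadWorkers);
```

Each worker would still need a message handler that stores the module before any deferred function is called on that worker.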

tlively commented 4 weeks ago

That would also solve your problem since loadSplitModule would only ever actually execute on the main thread.

That's true as long as none of the workers try to call a secondary function before they have received the module via postMessage. If one does, even proxying to the main thread wouldn't work, because there's no way for that to be synchronous.

In general, proxying the overrides to the main thread sounds like the right strategy, though.

ravisumit33 commented 4 weeks ago

Yes, right. It would be even better if we could somehow tell emscripten which overrides to proxy to the main thread, so that only those are proxied. @sbc100 Thoughts? Meanwhile, please also let me know how to proxy an override to the main thread, so that I can try that with loadSplitModule on my side.

sbc100 commented 4 weeks ago

I imagine this could be tricky, because if you proxy the loadSplitModule call itself to the main thread, then what you have done is load the module on the main thread, and not on the calling thread where you actually want to load it.

I guess you would need some kind of logic to then postMessage the resulting module back to the calling thread, but that would require returning to the event loop. @tlively have you already thought through how this should work?

ravisumit33 commented 3 weeks ago

@tlively Please share your thoughts on the above question.

tlively commented 3 weeks ago

As I wrote before, each worker must already have received the module via postMessage by the time loadSplitModule is called on that worker. That means that trying to proxy it to the main thread cannot work, since the postMessage result cannot be received synchronously. That all assumes that you're not using Asyncify/JSPI, which removes the requirement that the module be loaded synchronously. Besides using Asyncify or JSPI, I don't think it's possible to get loadSplitModule working by proxying.

ravisumit33 commented 3 weeks ago

Okay. Maybe we can't solve it in a generic way. Coming to my specific issue: I have a custom Module.rootUrl on the main application thread, which is set to a client-provided value at run time. It is the base URL from which I fetch my wasm-related assets. Clients provide this value because they sometimes run my app from a blob URL, so I can't use location.href. When the secondary module is required in a thread, it calls my overridden Module.loadSplitModule, which in turn calls Module.locateFile, where I use the above-mentioned Module.rootUrl. @sbc100 @tlively Can this issue be solved using a workaround then?
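For illustration, a locateFile override built around a client-provided base URL might look roughly like this. The factory name makeLocateFile is hypothetical, and the sketch assumes rootUrl is an absolute URL ending in "/".

```javascript
// Hypothetical factory: build a locateFile override around a client-provided base URL.
function makeLocateFile(rootUrl) {
  return (filename, prefix) => {
    // `prefix` is ignored on purpose: it comes from the script's own location,
    // which is not usable when the app runs from a blob: URL.
    return new URL(filename, rootUrl).href;
  };
}

// Usage on the main thread:
// Module.locateFile = makeLocateFile(Module.rootUrl);
```

The closure only captures a string, which is why forwarding rootUrl to the workers is enough for this particular override.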

tlively commented 3 weeks ago

You’ll have to either use asyncify/JSPI or ensure that the secondary module is loaded on each thread before it becomes necessary. I don’t think there are any workarounds to get around that.
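To make the "loaded on each thread before it becomes necessary" constraint concrete, here is a minimal worker-side sketch. All names are hypothetical, and how a loadSplitModule override would call into it (its exact arguments and expected return value) is left open here.

```javascript
// Hypothetical worker-side cache of modules received from the main thread.
const receivedModules = new Map();

// Call this from the worker's message handler when a module arrives.
function storeSplitModule(file, wasmModule) {
  receivedModules.set(file, wasmModule);
}

// Synchronous lookup and instantiation. Throws if the module has not arrived
// yet, because there is no way to wait for a postMessage synchronously.
function instantiateReceivedModule(file, imports) {
  const wasmModule = receivedModules.get(file);
  if (!wasmModule) {
    throw new Error(`${file} was requested before it was received via postMessage`);
  }
  return new WebAssembly.Instance(wasmModule, imports);
}
```

The throw marks the case described above: without Asyncify/JSPI, a worker that needs the module too early has nothing it can block on.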

tlively commented 3 weeks ago

It’s worth noting that if you do use Asyncify/JSPI, then that will also allow your secondary threads to wait while they asynchronously receive the secondary module from the main thread via postMessage, so your secondary threads won’t necessarily need to know how to locate the module.

ravisumit33 commented 2 weeks ago

Since I currently just need the value of rootUrl to be present in the pthread worker, I have hijacked the locateFile override like this:

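Module.locateFile = (filename, prefix) => {
    // other logic
    if (filename === "wasm.worker.js") {
        // Serve the pthread worker from a blob that loads the real worker script
        // and then copies rootUrl into the worker's Module object.
        // importScripts gets an absolute URL because relative URLs do not resolve
        // against a blob: URL; this assumes rootUrl is an absolute base URL ending in "/".
        const workerUrl = JSON.stringify(Module.rootUrl + filename);
        const workerFileSrc = `
            importScripts(${workerUrl});
            Module.rootUrl = ${JSON.stringify(Module.rootUrl)};
        `;
        const blob = new Blob([workerFileSrc], { type: "application/javascript" });
        return URL.createObjectURL(blob);
    }
};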

This has the drawback that pthread workers are always created through a blob URL, but I believe I can live with this for now.
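The same trick could probably be extended to forward several plain values at once. The sketch below is an assumption-laden illustration, not something Emscripten provides: the helper name buildWorkerBootstrap is hypothetical, and it only works for JSON-serializable settings.

```javascript
// Hypothetical generalization: build the blob worker source from any
// JSON-serializable settings, not only rootUrl.
function buildWorkerBootstrap(workerScriptUrl, settings) {
  // JSON.stringify produces valid JavaScript literals, so strings arrive quoted.
  return [
    `importScripts(${JSON.stringify(workerScriptUrl)});`,
    `Object.assign(Module, ${JSON.stringify(settings)});`,
  ].join("\n");
}

// Browser usage (the worker URL is a placeholder):
// const src = buildWorkerBootstrap("https://example.com/wasm.worker.js", { rootUrl: Module.rootUrl });
// const url = URL.createObjectURL(new Blob([src], { type: "application/javascript" }));
```

Function-valued overrides are dropped by JSON.stringify, so overrides that depend on main-thread state still cannot be forwarded this way.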