emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler

[Multithreading] PROXY_TO_PTHREAD & MAIN_THREAD_EM_ASM causes perf degradation #22570

Open ravisumit33 opened 1 week ago

ravisumit33 commented 1 week ago

Version of emscripten/emsdk:

emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.56 (cf90417346b78455089e64eb909d71d091ecc055)
clang version 19.0.0git (https:/github.com/llvm/llvm-project 34ba90745fa55777436a2429a51a3799c83c6d4c)
Target: wasm32-unknown-emscripten
Thread model: posix
InstalledDir: ~/emsdk/upstream/bin

I am trying to port my application from a single-threaded to a multi-threaded environment. I cannot put an upper bound on the number of threads my application needs at any one time, so I settled on PROXY_TO_PTHREAD. In single-threaded mode, my application worked as follows:

  1. C++ main function does some initialization. After main function exits, we keep the runtime alive.
  2. We have exposed a C++ function to process events coming from the UI.

To port this architecture to a multi-threaded environment, I used PROXY_TO_PTHREAD to create a proxied main thread and kept that thread alive for further processing. I use proxying to forward events coming from the UI to this detached thread. Once it is done, this thread calls MAIN_THREAD_EM_ASM to send the response back to the main application thread. This is the only MAIN_THREAD_EM_ASM call the detached thread makes. The rest is C++ execution that does not wait on anything else.

Functionally, this model works well. But during performance analysis I found a degradation of around 200-400 ms. Profiling showed that the detached thread completed its work in time but then waited around 200-400 ms for MAIN_THREAD_EM_ASM to complete, i.e. for the main application thread to receive the response. The main application thread was also completely idle during this time, as can be seen in the screenshot below.

[Screenshot: Untitled design (1)]

Is this performance degradation expected? Is there any other way I could model my app to get away with this? How can I minimise the time taken by the detached thread to send back the response?
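One way to put a number on the wait described above is to time the synchronous call from inside the detached thread. The following is a minimal sketch, not code from the reporter's application: `time_call_ms`, `send_response`, and `timed_send_response` are hypothetical names, and the JavaScript body passed to MAIN_THREAD_EM_ASM is a placeholder. Outside Emscripten the call is replaced by a `printf` stand-in so the helper can be tried natively.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <utility>
#ifdef __EMSCRIPTEN__
#include <emscripten.h>
#endif

// Runs `fn` once and returns the wall-clock time it took, in milliseconds.
template <typename Fn>
double time_call_ms(Fn&& fn) {
  const auto start = std::chrono::steady_clock::now();
  std::forward<Fn>(fn)();
  const auto end = std::chrono::steady_clock::now();
  const double elapsed =
      std::chrono::duration<double, std::milli>(end - start).count();
  assert(elapsed >= 0.0);  // steady_clock never goes backwards
  return elapsed;
}

// Hypothetical "send the response back" step of the detached thread.
void send_response(int result) {
#ifdef __EMSCRIPTEN__
  // Blocks this thread until the main runtime thread has run the JS snippet.
  MAIN_THREAD_EM_ASM({ console.log('response: ' + $0); }, result);
#else
  // Native stand-in (assumption): just print the value.
  std::printf("response: %d\n", result);
#endif
}

// Returns how long the detached thread was blocked sending one response.
double timed_send_response(int result) {
  return time_call_ms([&] { send_response(result); });
}
```

Logging the value returned by `timed_send_response` next to the profiler trace would show whether the 200-400 ms is spent inside the synchronous call itself or somewhere around it.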

ravisumit33 commented 3 days ago

@sbc100 @kripken Any thought on this?

sbc100 commented 3 days ago

To be clear, this is not some kind of regression? I.e. you are not claiming that some previous version of emscripten had a faster version of MAIN_THREAD_EM_ASM?

As far as I know there are no delays built into the proxying system. The call to MAIN_THREAD_EM_ASM should use a postMessage to wake the main thread, which should then use a shared-memory futex to wake the secondary thread once it's done.
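The wake-then-wait round trip described above can be illustrated with a portable analogue. This is a sketch under stated assumptions, not Emscripten's implementation: the `Mailbox` class and `demo_round_trip` are hypothetical, and a condition variable stands in for both the postMessage wake-up and the futex wake-up.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Toy stand-in for the main thread's message queue.
class Mailbox {
 public:
  // Called from the secondary thread: enqueue `task`, then block until the
  // owning thread has run it (the analogue of a synchronous proxied call).
  void call_sync(std::function<void()> task) {
    assert(task && "task must be callable");
    bool done = false;
    std::unique_lock<std::mutex> lock(mu_);
    tasks_.push([&] { task(); done = true; });
    cv_.notify_all();                      // analogue of postMessage: wake owner
    cv_.wait(lock, [&] { return done; });  // analogue of the futex wait
  }

  // Called from the owning ("main") thread: wait for one task and run it.
  void run_one() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [&] { return !tasks_.empty(); });
    std::function<void()> task = std::move(tasks_.front());
    tasks_.pop();
    task();            // runs under the lock; marks the call as done
    cv_.notify_all();  // analogue of the futex wake
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
};

// One synchronous call from a secondary thread into the "main" thread.
int demo_round_trip() {
  Mailbox main_mailbox;
  int received = 0;
  std::thread secondary([&] {
    main_mailbox.call_sync([&] { received = 42; });
  });
  main_mailbox.run_one();
  secondary.join();
  return received;
}
```

In this model the caller's blocked time is the sum of the owner's wake-up latency and the task's run time, which is why an idle main thread during the wait points at the wake-up step rather than the task itself.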

@tlively are you aware of any reason for such a delay?

@ravisumit33 perhaps you could share an example of a simple program that demonstrates the delay you are talking about?

sbc100 commented 3 days ago

Are you doing anything on the main UI thread that is likely to be blocking it? i.e. are you doing synchronous proxying to your background thread? i.e. can you give more details on what you mean by "I used proxying to proxy events coming from UI to this detached thread"?

ravisumit33 commented 3 days ago

Sorry for not providing complete details about the issue. I am doing an async proxy to the detached thread. My main application thread isn't the main UI thread. I instantiate wasm in a web-worker.

ravisumit33 commented 3 days ago

I will try to reproduce this in a simple program. Just to be clear, the delay isn't in proxying from the main application thread to the detached thread. The delay is in receiving the response from the background (detached) thread, which sends the response back synchronously (MAIN_THREAD_EM_ASM).

sbc100 commented 3 days ago

So you have the following JS contexts:

0: The main browser UI thread
1: The worker that starts your wasm program
2: The worker that runs the main function inside a pthread (due to PROXY_TO_PTHREAD).

Is that correct?

sbc100 commented 3 days ago

"I instantiate wasm in a web-worker"

I think this aspect could be a clue, since it's not the most common setup. Can you explain a little more about this setup? I assume you create this worker using the normal new Worker API and communicate with it solely through postMessage to/from the main UI browser thread? (i.e. the main UI browser thread doesn't do any shared memory stuff?)

ravisumit33 commented 3 days ago

Yes, the list of JS contexts is correct. I create the worker that instantiates wasm using the new Worker API, as you mentioned, and communicate with it through postMessage from the main UI browser thread. The main UI browser thread doesn't do any shared memory stuff.

ravisumit33 commented 3 days ago

I have highlighted the delay with the red rectangle below. As can be seen, the background thread (lower one) is just waiting until the main application thread (upper one) has received the response. Also, the main application thread is idle during the delay.

[Screenshot: Untitled design (2)]

tlively commented 2 days ago

Instead of using MAIN_THREAD_EM_ASM to communicate the results back, can you use emscripten_proxy_callback, emscripten_proxy_callback_with_ctx, emscripten_proxy_promise, or emscripten_proxy_promise_with_ctx? I don't know where the pause could be coming from, but these would be more direct methods of reporting the results.

An example program that demonstrates the issue would certainly be helpful.
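A minimal sketch of the callback-based alternative, assuming `emscripten_proxy_callback` and `emscripten_proxy_get_system_queue` as declared in `emscripten/proxying.h`, and assuming the worker pthread returns to its event loop so that it processes the queue. The `Event` struct and the functions `process_event`, `on_done`, `on_cancel`, and `submit_event` are hypothetical names for illustration. The native branch only exists so the sketch can be exercised without Emscripten; it runs the work and the callback inline.

```cpp
#include <cassert>
#include <pthread.h>
#ifdef __EMSCRIPTEN__
#include <emscripten/proxying.h>
#endif

// Hypothetical per-event state shared by the work and its completion callback.
struct Event {
  int input = 0;
  int output = 0;
  bool delivered = false;
};

// Runs on the worker pthread: the actual event processing.
void process_event(void* arg) {
  auto* ev = static_cast<Event*>(arg);
  ev->output = ev->input * 2;  // placeholder for the real work
}

// Runs back on the requesting thread once the work has finished, so the
// worker never has to block on a synchronous call to report its result.
void on_done(void* arg) {
  static_cast<Event*>(arg)->delivered = true;
}

// Runs on the requesting thread if the worker exits before doing the work.
void on_cancel(void* arg) {
  static_cast<Event*>(arg)->delivered = false;
}

// Called on the main runtime thread when a UI event arrives.
// Returns true if the work was queued successfully.
bool submit_event(pthread_t worker, Event* ev) {
  assert(ev != nullptr);
#ifdef __EMSCRIPTEN__
  return emscripten_proxy_callback(emscripten_proxy_get_system_queue(), worker,
                                   process_event, on_done, on_cancel, ev) == 1;
#else
  // Native fallback (assumption): no proxying, run everything inline.
  (void)worker;
  process_event(ev);
  on_done(ev);
  return true;
#endif
}
```

With this shape the response is delivered by the completion callback on the requesting thread, so the detached thread does not need the MAIN_THREAD_EM_ASM call at all. The `Event` object must stay alive until `on_done` or `on_cancel` has run.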

sbc100 commented 2 days ago

I think we should try to get to the bottom of this since MAIN_THREAD_EM_ASM shouldn't have this kind of delay. I agree a simple repro case would be great here.

sbc100 commented 2 days ago

By the way, I see that you have .worker.js in your filename. Does that mean you are using a version of emscripten from before #21701 landed (that change removed the worker.js output file), i.e. older than 3.1.58?

edit: I see you are using 3.1.56, would upgrading to the latest version be difficult?