emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler
Fast Javascript WebAssembly Communication #15004

Open monoto opened 3 years ago

monoto commented 3 years ago

I created a WebAssembly module for accelerating vector math using SIMD and multi-threading. Profiling 8 million vector4-by-matrix4 multiplications showed that most of the time was spent on (1) JS calling a C++ function exported through embind and (2) pthreads communicating with the main thread via MAIN_THREAD_ASYNC_EM_ASM.

To give you an idea of how much time, here are the results of a sample run:

Pure JavaScript: 304 ms
SIMD: 180 ms
SIMD with 4 threads (measured in C++): 73 ms
SIMD with 8 threads (measured in C++): 50 ms
SIMD with 4 threads (measured end-to-end in JavaScript): 323 ms

A significant amount of time (250 ms) is spent on (1) and (2). Is there any current effort to improve the performance of cross-boundary and inter-thread communication?

I know SharedArrayBuffer access is almost instantaneous, but we need a faster way of signaling than polling.

Thank you.

juj commented 3 years ago

Is the code sample perhaps small enough to post and examine in a GitHub issue?

Neither embind nor EM_ASM is intended for high-performance interop: frequent jumping between Wasm and JS will nuke performance rather badly. For the fastest inter-thread communication, use pthread mutexes, task queues, and other synchronization primitives.
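To illustrate the kind of task queue mentioned above, here is a minimal sketch in portable C++ built on a mutex and a condition variable. The names `TaskQueue` and `parallel_sum` are hypothetical and are not part of Emscripten or of the code discussed in this issue; the sketch assumes a pthreads-enabled build (for Emscripten, compiling and linking with `-pthread`) and keeps all waiting on worker threads.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: a blocking FIFO of tasks shared between threads.
class TaskQueue {
public:
    // Enqueue one task and wake a single waiting worker.
    void push(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        ready_.notify_one();
    }

    // Signal that no more tasks will arrive; workers drain the queue and exit.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Block until a task is available. Returns false once the queue is
    // closed and empty, which tells the calling worker to stop.
    bool pop(std::function<void()>& task) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !tasks_.empty(); });
        if (tasks_.empty()) return false;
        task = std::move(tasks_.front());
        tasks_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> tasks_;
    bool closed_ = false;
};

// Hypothetical example workload: sum `values` in chunks on worker threads.
long long parallel_sum(const std::vector<int>& values, unsigned worker_count) {
    if (worker_count == 0) worker_count = 1;  // always have one consumer
    TaskQueue queue;
    std::mutex total_mutex;
    long long total = 0;

    std::vector<std::thread> workers;
    for (unsigned i = 0; i < worker_count; ++i) {
        workers.emplace_back([&queue] {
            std::function<void()> task;
            while (queue.pop(task)) task();
        });
    }

    const std::size_t chunk = 1024;
    for (std::size_t begin = 0; begin < values.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, values.size());
        queue.push([&values, &total, &total_mutex, begin, end] {
            long long partial = 0;
            for (std::size_t i = begin; i < end; ++i) partial += values[i];
            std::lock_guard<std::mutex> lock(total_mutex);
            total += partial;  // one locked update per chunk, not per element
        });
    }

    queue.close();
    for (auto& worker : workers) worker.join();
    return total;
}
```

The point of the design is that workers sleep inside `pop` and are woken by the condition variable, so neither side polls and no call crosses the Wasm/JS boundary per task. In this sketch the calling thread blocks in `join`, which is fine natively but is something one would likely avoid on the browser's main thread.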

monoto commented 3 years ago

Polling a shared value with Atomics from the main thread's requestAnimationFrame callback could cut down on the delay. Let me try that and report back.
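The polling idea can be sketched as follows, in C++ rather than JavaScript so that it matches the rest of the module. This is only an illustration under stated assumptions: `SharedResult`, `worker_compute`, `poll_once`, and `run_frames_until_ready` are hypothetical names, and the sleep loop merely stands in for a per-frame callback. In a browser, the equivalent check would be an `Atomics.load` on a SharedArrayBuffer inside the requestAnimationFrame callback.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical shared state: the worker writes `value`, then sets `ready`.
struct SharedResult {
    int value = 0;
    std::atomic<bool> ready{false};
};

// Worker side: compute, publish the result, then raise the flag.
// The release store makes the write to `value` visible to any thread
// that observes `ready == true` with an acquire load.
void worker_compute(SharedResult& shared) {
    int sum = 0;
    for (int i = 1; i <= 100; ++i) sum += i;
    shared.value = sum;
    shared.ready.store(true, std::memory_order_release);
}

// Main-thread side: a non-blocking check intended to run once per frame.
// Returns true and fills `out` only when the worker has finished.
bool poll_once(const SharedResult& shared, int& out) {
    if (!shared.ready.load(std::memory_order_acquire)) return false;
    out = shared.value;
    return true;
}

// Stand-in for a frame loop: poll, then wait one frame interval.
// Returns the number of frames that elapsed before the result arrived.
int run_frames_until_ready(const SharedResult& shared, int& out,
                           std::chrono::milliseconds frame_interval) {
    int frames = 0;
    while (!poll_once(shared, out)) {
        std::this_thread::sleep_for(frame_interval);
        ++frames;
    }
    return frames;
}
```

With this pattern the worst-case signaling latency is roughly one frame interval, and the main thread never blocks. The trade-off is that the check still runs every frame even when nothing is pending.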

monoto commented 3 years ago

My test case turned out to be ill-conceived. I launched the threaded test and immediately ran the JavaScript test, so the two were effectively running in parallel and taking resources away from each other. Here is a better result for 8 million vector and matrix multiplications:

Pure JavaScript: 285 ms
SIMD: 180 ms
SIMD + 8 threads (measured in C++): 55 ms
SIMD + 8 threads (end-to-end in JS): 72 ms

The overhead from (1) + (2) is consistently around 15-16 ms, which is quite reasonable.

However, one thing is worth noting: if the main thread is busy, MAIN_THREAD_ASYNC_EM_ASM can be delayed significantly, as my previous test result shows.

monoto commented 3 years ago

Unless browser vendors use a system interrupt to give MAIN_THREAD_ASYNC_EM_ASM the highest priority on the main thread, I don't see what else Emscripten can do in the meantime.