emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler
Other
25.69k stars 3.29k forks source link

JSPI asynchronous calls into WASM are much slower than Asyncify #21081

Open rhashimoto opened 8 months ago

rhashimoto commented 8 months ago

I think this is more likely to be a Chromium bug and I have filed a bug there, but I don't understand the details well enough to know for sure.

Asynchronous calls from JavaScript into WASM are 2+ orders of magnitude slower with JSPI over Asyncify. I'm using Chrome 122.0.6248.0 (Official Build) canary (arm64) on an M2 Mac with chrome://flags/#enable-experimental-webassembly-stack-switching enabled.

I have a test page here (source) that times 10,000 calls to a C function that calls an asynchronous JavaScript function (use Chrome Canary as some older versions just crash the page). It loops over the calls once from C and once from Javascript, for both Asyncify and JavaScript builds. The output of the page looks like this:

asyncify 10000 iterations in C 0.01 seconds
asyncify 10000 iterations in JS 0.01 seconds
jspi 10000 iterations in C 0.003 seconds
jspi 10000 iterations in JS 3.554 seconds

This shows that with Asyncify, calling the JS function from C and calling it from JavaScript to C to JavaScript, is the same 0.01 seconds. JSPI is faster than Asyncify at calling the JS function from C, but is 350 times slower at calling from JS.

Emscripten output **Version of emscripten/emsdk:** $ emcc -v emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.47 (431685f05c67f0424c11473cc16798b9587bb536) clang version 18.0.0 (https://github.com/llvm/llvm-project 21030b9ab4487d845e29792063f5666d8c4b8e09) Target: wasm32-unknown-emscripten Thread model: posix InstalledDir: /home/dev/emsdk/upstream/bin **Full link command and output with `-v` appended:** $ make (both Asyncify and JSPI builds shown here) emcc -O2 -I. -s ALLOW_MEMORY_GROWTH=1 -s ASYNCIFY_EXPORTS='["cfunc", "cloop"]' -s ASYNCIFY_IMPORTS='["jfunc"]' -s ENVIRONMENT="web,worker" -s EXPORTED_RUNTIME_METHODS='["ccall", "cwrap"]' -s INVOKE_RUN -s WASM=1 -v -s ASYNCIFY -o cfunc-asyncify.mjs cfunc.c --js-library libfoo.js "/home/dev/emsdk/upstream/bin/clang" --version "/home/dev/emsdk/upstream/bin/clang" -target wasm32-unknown-emscripten -fignore-exceptions -fvisibility=default -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr -DEMSCRIPTEN -Werror=implicit-function-declaration --sysroot=/var/tmp/emsdk/upstream/emscripten/cache/sysroot -Xclang -iwithsysroot/include/fakesdl -Xclang -iwithsysroot/include/compat -O2 -I. -v cfunc.c -c -o /tmp/emscripten_temp_8v67r6_8/cfunc_0.o clang version 18.0.0 (https://github.com/llvm/llvm-project 21030b9ab4487d845e29792063f5666d8c4b8e09) Target: wasm32-unknown-emscripten Thread model: posix InstalledDir: /home/dev/emsdk/upstream/bin (in-process) "/var/tmp/emsdk/upstream/bin/clang-18" -cc1 -triple wasm32-unknown-emscripten -emit-obj -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name cfunc.c -mrelocation-model static -mframe-pointer=none -ffp-contract=on -fno-rounding-math -mconstructor-aliases -target-cpu generic -debugger-tuning=gdb -fdebug-compilation-dir=/home/dev/browser-test-cases/jspi -v -fcoverage-compilation-dir=/home/dev/browser-test-cases/jspi -resource-dir /var/tmp/emsdk/upstream/lib/clang/18 -D EMSCRIPTEN -I . -isysroot /var/tmp/emsdk/upstream/emscripten/cache/sysroot -internal-isystem /var/tmp/emsdk/upstream/lib/clang/18/include -internal-isystem /var/tmp/emsdk/upstream/emscripten/cache/sysroot/include/wasm32-emscripten -internal-isystem /var/tmp/emsdk/upstream/emscripten/cache/sysroot/include -O2 -Werror=implicit-function-declaration -ferror-limit 19 -fvisibility=default -fgnuc-version=4.2.1 -fignore-exceptions -fcolor-diagnostics -vectorize-loops -vectorize-slp -iwithsysroot/include/fakesdl -iwithsysroot/include/compat -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr -o /tmp/emscripten_temp_8v67r6_8/cfunc_0.o -x c cfunc.c clang -cc1 version 18.0.0 based upon LLVM 18.0.0git default target aarch64-unknown-linux-gnu ignoring nonexistent directory "/var/tmp/emsdk/upstream/emscripten/cache/sysroot/include/wasm32-emscripten" #include "..." search starts here: #include <...> search starts here: . /var/tmp/emsdk/upstream/emscripten/cache/sysroot/include/fakesdl /var/tmp/emsdk/upstream/emscripten/cache/sysroot/include/compat /var/tmp/emsdk/upstream/lib/clang/18/include /var/tmp/emsdk/upstream/emscripten/cache/sysroot/include End of search list. "/home/dev/emsdk/upstream/bin/wasm-ld" -o cfunc-asyncify.wasm /tmp/emscripten_temp_8v67r6_8/cfunc_0.o -L/var/tmp/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten -lGL -lal -lhtml5 -lstubs -lnoexit -lc -ldlmalloc -lcompiler_rt -lc++-noexcept -lc++abi-noexcept -lsockets -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr /tmp/tmpnogg_6b0libemscripten_js_symbols.so --export-if-defined=main --export-if-defined=__start_em_asm --export-if-defined=__stop_em_asm --export-if-defined=__start_em_lib_deps --export-if-defined=__stop_em_lib_deps --export-if-defined=__start_em_js --export-if-defined=__stop_em_js --export-if-defined=__main_argc_argv --export=stackSave --export=stackRestore --export=stackAlloc --export=__errno_location --export=__get_temp_ret --export=__set_temp_ret --export=malloc --export=free --export=__wasm_call_ctors --export-table -z stack-size=65536 --initial-memory=16777216 --max-memory=2147483648 --no-entry --table-base=1 --global-base=1024 "/home/dev/emsdk/upstream/bin/llvm-objcopy" cfunc-asyncify.wasm cfunc-asyncify.wasm --remove-section=.debug* --remove-section=producers "/home/dev/emsdk/upstream/bin/wasm-emscripten-finalize" --pass-arg=legalize-js-interface-exported-helpers cfunc-asyncify.wasm -o cfunc-asyncify.wasm --detect-features "/home/dev/emsdk/node/16.20.0_64bit/bin/node" /var/tmp/emsdk/upstream/emscripten/src/compiler.js /tmp/tmp_6pmyt7_.json "/home/dev/emsdk/upstream/bin/wasm-opt" --post-emscripten -O2 --low-memory-unused --asyncify --pass-arg=asyncify-imports@env.jfunc,env.invoke_*,env.__asyncjs__*,*.fd_sync,*.emscripten_promise_await,*.emscripten_idb_load,*.emscripten_idb_store,*.emscripten_idb_delete,*.emscripten_idb_exists,*.emscripten_idb_clear,*.emscripten_idb_load_blob,*.emscripten_idb_store_blob,*.emscripten_sleep,*.emscripten_wget_data,*.emscripten_scan_registers,*.emscripten_lazy_load_code,*._load_secondary_module,*.emscripten_fiber_swap,*.SDL_Delay,*.jfunc --zero-filled-memory --pass-arg=directize-initial-contents-immutable cfunc-asyncify.wasm -o cfunc-asyncify.wasm --mvp-features --enable-mutable-globals --enable-sign-ext "/home/dev/emsdk/node/16.20.0_64bit/bin/node" /var/tmp/emsdk/upstream/emscripten/tools/acorn-optimizer.js /tmp/emscripten_temp_8v67r6_8/cfunc-asyncify.js JSDCE minifyWhitespace --exportES6 -o /tmp/emscripten_temp_8v67r6_8/cfunc-asyncify.jso1.js emcc -O2 -I. -s ALLOW_MEMORY_GROWTH=1 -s ASYNCIFY_EXPORTS='["cfunc", "cloop"]' -s ASYNCIFY_IMPORTS='["jfunc"]' -s ENVIRONMENT="web,worker" -s EXPORTED_RUNTIME_METHODS='["ccall", "cwrap"]' -s INVOKE_RUN -s WASM=1 -v -s ASYNCIFY=2 -o cfunc-jspi.mjs cfunc.c --js-library libfoo.js "/home/dev/emsdk/upstream/bin/clang" --version "/home/dev/emsdk/upstream/bin/clang" -target wasm32-unknown-emscripten -fignore-exceptions -fvisibility=default -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr -DEMSCRIPTEN -Werror=implicit-function-declaration --sysroot=/var/tmp/emsdk/upstream/emscripten/cache/sysroot -Xclang -iwithsysroot/include/fakesdl -Xclang -iwithsysroot/include/compat -O2 -I. -v cfunc.c -c -o /tmp/emscripten_temp_jls2o26n/cfunc_0.o clang version 18.0.0 (https://github.com/llvm/llvm-project 21030b9ab4487d845e29792063f5666d8c4b8e09) Target: wasm32-unknown-emscripten Thread model: posix InstalledDir: /home/dev/emsdk/upstream/bin (in-process) "/var/tmp/emsdk/upstream/bin/clang-18" -cc1 -triple wasm32-unknown-emscripten -emit-obj -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name cfunc.c -mrelocation-model static -mframe-pointer=none -ffp-contract=on -fno-rounding-math -mconstructor-aliases -target-cpu generic -debugger-tuning=gdb -fdebug-compilation-dir=/home/dev/browser-test-cases/jspi -v -fcoverage-compilation-dir=/home/dev/browser-test-cases/jspi -resource-dir /var/tmp/emsdk/upstream/lib/clang/18 -D EMSCRIPTEN -I . -isysroot /var/tmp/emsdk/upstream/emscripten/cache/sysroot -internal-isystem /var/tmp/emsdk/upstream/lib/clang/18/include -internal-isystem /var/tmp/emsdk/upstream/emscripten/cache/sysroot/include/wasm32-emscripten -internal-isystem /var/tmp/emsdk/upstream/emscripten/cache/sysroot/include -O2 -Werror=implicit-function-declaration -ferror-limit 19 -fvisibility=default -fgnuc-version=4.2.1 -fignore-exceptions -fcolor-diagnostics -vectorize-loops -vectorize-slp -iwithsysroot/include/fakesdl -iwithsysroot/include/compat -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr -o /tmp/emscripten_temp_jls2o26n/cfunc_0.o -x c cfunc.c clang -cc1 version 18.0.0 based upon LLVM 18.0.0git default target aarch64-unknown-linux-gnu ignoring nonexistent directory "/var/tmp/emsdk/upstream/emscripten/cache/sysroot/include/wasm32-emscripten" #include "..." search starts here: #include <...> search starts here: . /var/tmp/emsdk/upstream/emscripten/cache/sysroot/include/fakesdl /var/tmp/emsdk/upstream/emscripten/cache/sysroot/include/compat /var/tmp/emsdk/upstream/lib/clang/18/include /var/tmp/emsdk/upstream/emscripten/cache/sysroot/include End of search list. emcc: warning: -sASYNCIFY=2 (JSPI) is still experimental [-Wexperimental] "/home/dev/emsdk/upstream/bin/wasm-ld" -o cfunc-jspi.wasm /tmp/emscripten_temp_jls2o26n/cfunc_0.o -L/var/tmp/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten -lGL -lal -lhtml5 -lstubs -lnoexit -lc -ldlmalloc -lcompiler_rt -lc++-noexcept -lc++abi-noexcept -lsockets -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr /tmp/tmpddcpx76vlibemscripten_js_symbols.so --export-if-defined=main --export-if-defined=__start_em_asm --export-if-defined=__stop_em_asm --export-if-defined=__start_em_lib_deps --export-if-defined=__stop_em_lib_deps --export-if-defined=__start_em_js --export-if-defined=__stop_em_js --export-if-defined=__main_argc_argv --export=stackSave --export=stackRestore --export=stackAlloc --export=__errno_location --export=__get_temp_ret --export=__set_temp_ret --export=malloc --export=free --export=__wasm_call_ctors --export-table -z stack-size=65536 --initial-memory=16777216 --max-memory=2147483648 --no-entry --table-base=1 --global-base=1024 "/home/dev/emsdk/upstream/bin/llvm-objcopy" cfunc-jspi.wasm cfunc-jspi.wasm --remove-section=.debug* --remove-section=producers "/home/dev/emsdk/upstream/bin/wasm-emscripten-finalize" --dyncalls-i64 --pass-arg=legalize-js-interface-exported-helpers cfunc-jspi.wasm -o cfunc-jspi.wasm --detect-features "/home/dev/emsdk/node/16.20.0_64bit/bin/node" /var/tmp/emsdk/upstream/emscripten/src/compiler.js /tmp/tmprhq8bvm5.json "/home/dev/emsdk/upstream/bin/wasm-opt" --post-emscripten -O2 --low-memory-unused --jspi --pass-arg=jspi-imports@env.jfunc,env.invoke_*,env.__asyncjs__*,*.fd_sync,*.emscripten_promise_await,*.emscripten_idb_load,*.emscripten_idb_store,*.emscripten_idb_delete,*.emscripten_idb_exists,*.emscripten_idb_clear,*.emscripten_idb_load_blob,*.emscripten_idb_store_blob,*.emscripten_sleep,*.emscripten_wget_data,*.emscripten_scan_registers,*.emscripten_lazy_load_code,*._load_secondary_module,*.emscripten_fiber_swap,*.SDL_Delay,*.jfunc --pass-arg=jspi-exports@cfunc,cloop,main,__main_argc_argv,_ZN10emscripten8internal5async* --zero-filled-memory --pass-arg=directize-initial-contents-immutable cfunc-jspi.wasm -o cfunc-jspi.wasm --mvp-features --enable-mutable-globals --enable-sign-ext --enable-reference-types "/home/dev/emsdk/node/16.20.0_64bit/bin/node" /var/tmp/emsdk/upstream/emscripten/tools/acorn-optimizer.js /tmp/emscripten_temp_jls2o26n/cfunc-jspi.js JSDCE minifyWhitespace --exportES6 -o /tmp/emscripten_temp_jls2o26n/cfunc-jspi.jso1.js
kripken commented 8 months ago

One thing to worry about here is on-stack replacement. If you call the function time() many times, do the results change? (Without OSR, the issue is that once inside a loop, the VM must do on-stack replacement to replace the baseline or interpreter code with fully-optimized code, and if that loop only happens once, there is no such chance and it runs slowly. That is the case in V8. Calling time() again would give a chance to run the optimized version.)

also cc @brendandahl

rhashimoto commented 8 months ago

Thanks for your insight @kripken! Sorry, but I may not be understanding quite what you're suggesting.

My C code is here (it's very short). jfunc() is the asynchronous JavaScript function, cfunc() just calls jfunc(), and cloop() iterates cfunc(). I'm comparing a call to cloop() from JavaScript with iterating over calls to cfunc() from JavaScript.

I changed my code locally to replace the JavaScript loop timing with this:

  async function timeJStoCtoJS() {
    const start = performance.now();
    for (let i = 0; i < ITERATIONS; ++i) {
      await module.ccall('cfunc', 'number', ['number'], [2.0], { async: true });
    }
    const end = performance.now();
    const duration = Math.trunc(end - start) / 1000;
    postMessage(`${build} ${ITERATIONS} iterations in JS ${duration} seconds`);
  }

  await timeJStoCtoJS();
  await timeJStoCtoJS();
  await timeJStoCtoJS();
  await timeJStoCtoJS();
  await timeJStoCtoJS();

Does that accomplish what you are suggesting?

The results of this on my machine are:

asyncify 10000 iterations in C 0.01 seconds
asyncify 10000 iterations in JS 0.01 seconds
asyncify 10000 iterations in JS 0.006 seconds
asyncify 10000 iterations in JS 0.005 seconds
asyncify 10000 iterations in JS 0.005 seconds
asyncify 10000 iterations in JS 0.004 seconds
jspi 10000 iterations in C 0.004 seconds
jspi 10000 iterations in JS 12.56 seconds
jspi 10000 iterations in JS 12.542 seconds
jspi 10000 iterations in JS 13.139 seconds
jspi 10000 iterations in JS 12.551 seconds
jspi 10000 iterations in JS 12.57 seconds

It does look like the timings improve for Asyncify (that progression is pretty consistent over repeated runs), but are little changed for JSPI. I'm currently running 122.0.6251.0 (Official Build) canary (arm64) on a M2 Mac.

kripken commented 8 months ago

Thanks, yes, that's what I meant. Ok, I think that confirms that OSR is not a factor here, which is good to rule out.

kripken commented 8 months ago

@brendandahl do you know who would be good to cc on that chromium bug?

brendandahl commented 8 months ago

I'm guessing the performance difference is coming from the current cost of entering a JSPI'd module many times. IIRC a new stack is allocated each time. Performance within the module should have no overhead, but entering/exiting many times is currently expensive.

As for cc, I think Ilya or Thibaud would know more.

kripken commented 8 months ago

Interesting, thanks. Ok, I think we can cc them here actually: @thibaudmichaud @spy - last comment may already answer the question, but please confirm if you can, thanks!

rhashimoto commented 3 months ago

JSPI performance from Javascript looks to be fixed on recent Chrome Canary builds with the checkin to reuse stacks.