bytecodealliance / wasm-micro-runtime

WebAssembly Micro Runtime (WAMR)
Apache License 2.0

[Fast Interpreter] Slow performance when handling complicated arithmetic expression in loop #2167

Open hungryzzz opened 1 year ago

hungryzzz commented 1 year ago

Description

Hi, I ran the attached cases (compiled with Emscripten) in different Wasm runtimes and found some performance differences between WAMR (fast-interp) and wasm3.

The execution time in WAMR (fast-interp) is about 2x that in wasm3. Times were collected with the perf tool: the probe begins when execution of the Wasm code starts (wasm_call_function in WAMR) and ends at sched:sched_process_exit.

| Runtime | flops-8 | flops-5 | flops-4 | flops-3 |
| --- | --- | --- | --- | --- |
| wamr(fast-interp) | 9597840.99 us | 8475859.04 us | 4882332.11 us | 10700224.61 us |
| wasm3 | 4401260.85 us | 4105807.93 us | 2574588.03 us | 5633284.86 us |
| wamr(AOT) | 879322.56 us | 880584.59 us | 418496.13 us | 934592.44 us |
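
For reference, the slowdown factors implied by these numbers can be computed with a short script. This is only arithmetic on the times reported above; the names `TIMES_US` and `slowdown` are made up for this sketch and are not part of any runtime.

```python
# Execution times in microseconds, copied from the table above.
TIMES_US = {
    "wamr(fast-interp)": {
        "flops-8": 9597840.99,
        "flops-5": 8475859.04,
        "flops-4": 4882332.11,
        "flops-3": 10700224.61,
    },
    "wasm3": {
        "flops-8": 4401260.85,
        "flops-5": 4105807.93,
        "flops-4": 2574588.03,
        "flops-3": 5633284.86,
    },
    "wamr(AOT)": {
        "flops-8": 879322.56,
        "flops-5": 880584.59,
        "flops-4": 418496.13,
        "flops-3": 934592.44,
    },
}


def slowdown(runtime, baseline):
    """Per-case ratio of `runtime` time to `baseline` time (hypothetical helper)."""
    return {
        case: TIMES_US[runtime][case] / TIMES_US[baseline][case]
        for case in TIMES_US[runtime]
    }


for baseline in ("wasm3", "wamr(AOT)"):
    ratios = slowdown("wamr(fast-interp)", baseline)
    line = ", ".join(f"{case}: {ratio:.2f}x" for case, ratio in ratios.items())
    print(f"wamr(fast-interp) vs {baseline}: {line}")
```

On these four cases the fast interpreter comes out at roughly 2x the wasm3 time and roughly 10x the AOT time.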

I also ran other test cases on these runtimes; on average, execution on WAMR (fast-interp) is 1.2x faster than on wasm3. The earlier report at https://github.com/bytecodealliance/wasm-micro-runtime/wiki/Performance shows similar results, so the results above look a little strange.

I then looked through the cases above and found that they all involve complicated arithmetic expressions inside loops. So my guess is that WAMR (fast-interp) performs poorly on such code.

Hardware & OS

Emscripten

Wasm runtime version

Reproduce

  1. Compile the above C case with Emscripten: emcc -sENVIRONMENT=shell -O2 -s WASM=1 -s TOTAL_MEMORY=512MB flops.c -o flops.wasm

  2. Execute the wasm file in the different Wasm runtimes and collect the execution time. All compilation and execution options are left at their defaults.

c.zip wasm.zip
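
For anyone who wants a rough cross-check without perf, a minimal wall-clock harness might look like the sketch below. It is not the measurement method used for the numbers in this issue: it times the whole process, including runtime startup and module loading, so absolute values will differ from the probe-based times. The helper name `time_command` is hypothetical, and the example invocations in the comments assume `iwasm` and `wasm3` are installed and on PATH.

```python
import subprocess
import time


def time_command(argv, repeats=3):
    """Run `argv` `repeats` times and return the best wall-clock time in microseconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command exits with a non-zero status.
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        elapsed_us = (time.perf_counter() - start) * 1e6
        best = min(best, elapsed_us)
    return best


# Example usage (assumed command lines, adjust to your setup):
#   wamr_us = time_command(["iwasm", "flops.wasm"])
#   wasm3_us = time_command(["wasm3", "flops.wasm"])
#   print(f"wamr/wasm3 ratio: {wamr_us / wasm3_us:.2f}")
```

Taking the best of several runs reduces noise from other processes, but it does not separate Wasm execution time from startup cost the way the perf probes do.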

TianlongLiang commented 1 year ago

Hi, maybe you are testing the classic interpreter mode? But you are right about those cases: interpreters do not handle them well. The fast interpreter also takes about 10x as long as AOT mode. I tested the other running modes too; Fast JIT and LLVM JIT handle those cases relatively well, taking about 3x and 1.1x the AOT time, respectively.

hungryzzz commented 1 year ago

Hi, I used the fast interpreter mode to run those cases, which is why I compared the results with wasm3, which is also a fast interpreter for Wasm. I also built WAMR in classic mode (-DWAMR_BUILD_FAST_INTERP=0); the execution time in classic mode is 2x or more that of fast-interp mode.