csegarragonz opened 1 year ago
Hi, WAMR and WAVM are LLVM-based, while wasmtime is Cranelift-based, so the former two really take more time to compile wasm files. WAMR also uses the LLVM new pass manager and may apply more optimizations than WAVM, so it may take more time than WAVM to compile a wasm file. There are some methods to reduce the compile time of wamrc:
- use wamrc --size-level=2 (or --size-level=1)
- use wamrc --opt-level=2
- modify verify_module to return true directly:
  https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/core/iwasm/compilation/aot_compiler.c#L2604-L2615

@csegarragonz Recently we implemented the segue optimization for LLVM AOT/JIT, see #2230. Normally (for many cases) it can improve the performance, reduce the compilation time of AOT/JIT, and reduce the size of the generated AOT/JIT code. Currently it supports the linux platform and the linux-sgx platform on x86-64, could you have a try? The usage is:
wamrc --enable-segue
or wamrc --enable-segue=<flags>
iwasm --enable-segue
or iwasm --enable-segue=<flags>
(iwasm is built with LLVM JIT enabled)
<flags> can be:
  i32.load, i64.load, f32.load, f64.load, v128.load,
  i32.store, i64.store, f32.store, f64.store, v128.store
Use commas to separate them, e.g. --enable-segue=i32.load,i64.store.
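As a concrete example of combining the flags (a sketch; test.wasm and test.aot are placeholder file names, not taken from this thread):

```shell
# AOT-compile with segue applied only to 32-bit integer loads and stores;
# the flag list after "=" is comma-separated, as described above.
wamrc --enable-segue=i32.load,i32.store -o test.aot test.wasm
```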
Hey @wenyongh thanks for pointing this out!
Just to double check, will this optimisation benefit me if I am using x86-64 on linux with HW bound checks enabled?
As far as I can tell, explicit bounds checks weren't performed anyway; they were delegated to the OS by placing the linear memory at the beginning of a contiguous 8GB region of virtual memory and protecting the surrounding memory pages? Please correct me if I am wrong!
(Not for SGX, I understand the segue optimisation could benefit my SGX use cases)
Yes, it may benefit you no matter whether --bounds-checks=1 is added for wamrc or not. The memory access boundary check in the AOT code only depends on i + memarg.offset (i is popped from the stack, memarg.offset is encoded in the bytecode); it is not related to the base address of the linear memory.
Normally the compilation time and the binary size can be reduced, since the optimization simplifies the LLVM IR used to load/store the linear memory and decreases the size of the load/store instructions. The performance may degrade in some cases: we found that some LLVM optimizations may not take effect when segue is enabled, and it depends on which flags are enabled. For example, for the CoreMark workload, the performance gets worse with wamrc --enable-segue while it gets better with wamrc --enable-segue=i32.store.
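A back-of-envelope sketch of why the large reservation makes explicit checks unnecessary (illustrative only; the sizes are assumptions based on 32-bit wasm, not taken from WAMR's sources):

```shell
# With 32-bit wasm, both the index i and memarg.offset are u32 values, so
# the largest base+offset a load/store can form is (2^32-1) + (2^32-1).
max_addr=$(( (2**32 - 1) + (2**32 - 1) ))   # 2^33 - 2
reserved=$(( 8 * 2**30 ))                   # an 8 GiB reserved region
echo $(( max_addr < reserved ))             # prints 1
```

Since every possible i + memarg.offset starts inside the reserved region, an out-of-bounds access lands on a protected page and faults, which is the hardware-delegated check discussed above.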
Hi,
I have been experiencing some very slow code generation times for large WASM files.
I include a little benchmark I have done with this WASM file: large_code.zip. I compare:
- wamrc from the current main tip (built in Release mode)
- WAVM from our fork
- wasmtime using v7.0.0

For each, I include the instructions to build, the command to generate the machine code, and the time it took.

wamrc:
Build wamrc with CMAKE_BUILD_TYPE=Release from the latest commit.

WAVM:
Already built in the docker image in the previous command; you may find the Dockerfile here.

wasmtime:
Install using:
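(The exact command was not preserved in this post; as an assumption, a common way to install wasmtime is the official installer script:)

```shell
# Assumption: the official wasmtime installer script, not necessarily
# the command used for this benchmark.
curl https://wasmtime.dev/install.sh -sSf | bash
```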
Admittedly, I am not very familiar with wasmtime, nor have I any idea why it is so much faster; I suspect I am doing something wrong. That being said, wasmtime uses a different code generator, but WAVM is also LLVM-based, so how come it is more than two times faster?

NB: these results are specific to my machine but, at least for the WAMR/WAVM comparison, I have seen consistent numbers on a variety of Intel x86 CPUs.

NB2: the attached WASM file contains a lot of custom native symbols only defined in our embedder, so you cannot run it with iwasm. I thought it did not really matter to get the point across.
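For reference, a minimal way to reproduce this kind of wall-clock comparison (illustrative commands with placeholder file names; not the exact invocations used above):

```shell
# Rough wall-clock timing of code generation for one wasm module.
time wamrc -o large_code.aot large_code.wasm
time wasmtime compile large_code.wasm
```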