bytecodealliance / wasm-micro-runtime

WebAssembly Micro Runtime (WAMR)
Apache License 2.0

Exploring Architecture-Specific Optimization in WAMR Without Altering Bytecode Generation #3243

Open OkannShn opened 7 months ago

OkannShn commented 7 months ago

I am exploring the potential for optimizing the WAMR runtime performance by leveraging architecture-specific instructions. The key challenge is to achieve this while maintaining the universal applicability of the bytecode, ensuring it remains architecture-agnostic and can be executed on any platform, including ARM, Xtensa, and others.

Currently, WAMR generates bytecode that is generic and can run across different architectures. This approach is excellent for portability but doesn't take full advantage of the unique capabilities and instructions specific to each architecture, which could potentially enhance performance.

I am curious if there is a way to incorporate these architecture-specific optimizations in a manner that does not require changes to the bytecode generation process. I am not sure where to start right now.

Any insights, suggestions, or discussions on how we might approach this would be greatly appreciated. I believe this could be a significant step forward in optimizing WAMR for specific hardware, without sacrificing its cross-platform utility.

Thank you for any suggestions.

lum1n0us commented 7 months ago

I honestly don't have a definitive answer, only an opinion, which I can share. My opinion may be wrong. Please tell me if it is. And @wenyongh @XuJun2019 @TianlongLiang, if you agree or disagree, please jump in.

If this is for interpreter mode, we need to make sure the bytecode handlers use architecture-specific instructions. Quite simple and straightforward, right?
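To make the interpreter-mode idea concrete, here is a minimal sketch of a handler body for an `i32.clz`-style opcode. It is not WAMR's actual handler code: the function names `handle_i32_clz` and `clz32_portable` are hypothetical, and the sketch assumes a GCC- or Clang-compatible compiler that provides `__builtin_clz`, which the compiler can lower to a dedicated count-leading-zeros instruction where the target has one (e.g. `CLZ` on ARM). The Wasm bytecode itself is unchanged; only the handler differs per build.

```c
#include <stdint.h>

/* Portable fallback: count leading zeros one bit at a time. */
uint32_t clz32_portable(uint32_t x)
{
    uint32_t n = 0;

    if (x == 0)
        return 32; /* Wasm defines i32.clz(0) as 32 */

    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Hypothetical handler body for an i32.clz-style opcode.
 * The architecture-specific path is selected at build time,
 * so the bytecode stays architecture-agnostic. */
uint32_t handle_i32_clz(uint32_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    /* __builtin_clz is undefined for 0, so handle that case explicitly. */
    return x ? (uint32_t)__builtin_clz(x) : 32;
#else
    return clz32_portable(x);
#endif
}
```

Both functions return the same result for every input, so the choice of path only affects speed, not behavior.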

But it starts to become complex when talking about JIT mode and AOT mode. In WAMR, we depend on LLVM to do optimization and code generation. In a nutshell, we first translate Wasm bytecode to LLVM IR and let LLVM finish the show. So, about "architecture-specific instructions", I think there are some directions:

I hope these thoughts are helpful. Please let me know what you think.

wenyongh commented 7 months ago

Yes, I am not sure what "WAMR generates bytecode" means here. Is it the pre-compiled bytecode generated by the wasm loader for fast-interp, or the LLVM IR (or machine code) generated by the AOT compiler?

For the interpreter, I have no idea how to generate architecture-specific bytecodes, since they are general bytecodes currently. Maybe, as @lum1n0us mentioned, using architecture-specific instructions in the bytecode handlers is a good way. Another way may be to implement the interpreter in assembly code to improve the performance, but that is really complex.

For the AOT compiler, per my understanding, there may be three ways:

1) Generating architecture-specific LLVM IR. For example, register a translation callback for each wasm opcode: if a callback is found, the AOT compiler calls it to generate the LLVM IR for that opcode; otherwise, it calls the common translation function to generate the LLVM IR.

2) Adding architecture-specific LLVM passes: allow new passes to be registered (e.g. from WAMR's built-in implementation or from a .so file) and applied. Note that wamrc already has the option --enable-llvm-passes=<passes>.

3) Using architecture-specific codegen. I have no idea how to affect the codegen process yet, since the AOT compiler currently just calls LLVMTargetMachineEmitToMemoryBuffer or LLVMTargetMachineEmitToFile to get the object file that contains the machine code.
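The callback idea in 1) could be sketched roughly as below. This is only an illustration under stated assumptions, not WAMR's API: `register_arch_hook`, `translate_opcode`, `translate_common`, `demo_ctx` and `demo_arch_clz` are all hypothetical names, and the "translation" functions only bump counters where a real implementation would emit LLVM IR through the compilation context.

```c
#include <stddef.h>
#include <stdint.h>

#define OPCODE_COUNT 256

/* Hypothetical stand-in for the compiler context; it only records
 * which path was taken instead of building LLVM IR. */
typedef struct demo_ctx {
    int arch_emitted;
    int common_emitted;
} demo_ctx;

/* Hypothetical per-opcode translation callback; returns 1 on success. */
typedef int (*translate_fn)(uint8_t opcode, demo_ctx *ctx);

/* One optional architecture-specific callback per opcode, NULL by default. */
static translate_fn arch_hooks[OPCODE_COUNT];

void register_arch_hook(uint8_t opcode, translate_fn fn)
{
    arch_hooks[opcode] = fn;
}

/* Common translation used when no callback is registered. */
int translate_common(uint8_t opcode, demo_ctx *ctx)
{
    (void)opcode;
    ctx->common_emitted++;
    return 1;
}

/* Example architecture-specific callback (assumed, for illustration). */
int demo_arch_clz(uint8_t opcode, demo_ctx *ctx)
{
    (void)opcode;
    ctx->arch_emitted++;
    return 1;
}

/* Dispatch: use the registered callback if found, else the common path. */
int translate_opcode(uint8_t opcode, demo_ctx *ctx)
{
    translate_fn hook = arch_hooks[opcode];

    if (hook != NULL)
        return hook(opcode, ctx);
    return translate_common(opcode, ctx);
}
```

For instance, after `register_arch_hook(0x67, demo_arch_clz)` (0x67 is the `i32.clz` opcode), `translate_opcode(0x67, &ctx)` takes the architecture-specific path, while any opcode without a registered callback still goes through `translate_common`.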