Open wjr-z opened 1 year ago
Thanks for the report! Would you be able to share a wasm file or an example loop in source code to help reproduce this locally?
Thank you for your reply. I am still actively looking for the cause. Here is a link to box_seal.wasm: https://github.com/jedisct1/webassembly-benchmarks/blob/master/2021-Q1/wasm/box_seal.wasm And here is the code for the example loop:
(module
  (export "_start" (func $_start))
  (func $_start (; 0 ;)
    (local $i i32)
    (local $i2 i32)
    i32.const 0
    local.set $i
    loop $loop
      i32.const 0
      local.set $i2
      loop $loop2
        local.get $i2
        i32.const 1
        i32.add
        local.set $i2
        local.get $i2
        i32.const 80000
        i32.lt_s
        br_if $loop2
      end $loop2
      local.get $i
      i32.const 1
      i32.add
      local.set $i
      local.get $i
      i32.const 40000
      i32.lt_s
      br_if $loop
    end $loop
  )
)
Thanks! Could you detail a bit more what you mean by "manually fix the issue with epoch (Unstable), the cost was only less than 7%"?
Looking at the disassembly it's not obvious to me what the issue is and how such a large win could be gained, so I'm curious how you were able to achieve it!
Unfortunately, the data on the server was lost. I'll try to reproduce it next week.
At present, there seem to be serious issues with the epoch mechanism and register usage. For example, the following is a simple comparison of the native and epoch assembly for a double loop.
wasmtime release-13.0.0
native :
epoch :
The above example assigns some registers, such as ax and cx, to the epoch check block. This is just a simple example; more complex workloads see a significant performance impact. On box_seal.wasm, the cost reached 25%! After I tried to manually fix the issue with epoch (Unstable), the cost was only less than 7%. In particular, for the inner and outer loops, the outer loop uses r10 for storage, but the inner loop uses r11, which I cannot understand.
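To make the discussion of the "check block" concrete, here is a rough conceptual model in Rust of the double loop above with an epoch check at the head of each loop. It is only a sketch: the function name `run_with_epoch_checks`, the `deadline` parameter, and the placement of the checks are assumptions for illustration, not Wasmtime's actual generated code.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical model of the double loop from the WAT example.
/// `epoch` stands in for an engine-wide epoch counter and `deadline`
/// for the epoch at which execution should be interrupted.
/// Returns (inner-loop iterations, number of epoch checks performed).
fn run_with_epoch_checks(
    outer: u32,
    inner: u32,
    epoch: &AtomicU64,
    deadline: u64,
) -> Result<(u64, u64), &'static str> {
    let mut iterations = 0u64;
    let mut checks = 0u64;
    let mut i = 0u32;
    loop {
        // Assumed check at the outer loop head: one load and one compare.
        checks += 1;
        if epoch.load(Ordering::Relaxed) >= deadline {
            return Err("epoch deadline reached");
        }
        let mut i2 = 0u32;
        loop {
            // Assumed check at the inner loop head. It runs on every
            // inner iteration, so its cost dominates in a tight loop.
            checks += 1;
            if epoch.load(Ordering::Relaxed) >= deadline {
                return Err("epoch deadline reached");
            }
            i2 += 1;
            iterations += 1;
            // Mirrors `br_if $loop2`: continue while i2 < inner.
            if i2 >= inner {
                break;
            }
        }
        i += 1;
        // Mirrors `br_if $loop`: continue while i < outer.
        if i >= outer {
            break;
        }
    }
    Ok((iterations, checks))
}
```

In this model the inner check executes once per inner iteration, so any extra registers or loads the check needs sit directly on the hot path, which is where the register assignment described above would matter most.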