WebAssembly / binaryen

Optimizer and compiler/toolchain library for WebAssembly
Apache License 2.0

Missed optimization for consecutive adds #6831

Open HerrCai0907 opened 1 month ago

HerrCai0907 commented 1 month ago

This code, generated from AssemblyScript,

 (func $assembly/index/_start (result i32)
  global.get $assembly/index/a0
  global.get $assembly/index/a1
  i32.add
  global.get $assembly/index/a2
  i32.add
  global.get $assembly/index/a3
  i32.add
  global.get $assembly/index/a4
  i32.add
  global.get $assembly/index/a5
  i32.add
  global.get $assembly/index/a6
  i32.add
  return
 )

will be optimized to

 (func $assembly/index/_start (result i32)
  global.get $assembly/index/a6
  global.get $assembly/index/a5
  global.get $assembly/index/a4
  global.get $assembly/index/a3
  global.get $assembly/index/a2
  global.get $assembly/index/a0
  global.get $assembly/index/a1
  i32.add
  i32.add
  i32.add
  i32.add
  i32.add
  i32.add
 )

Is there any benefit to reordering the operands of i32.add? The optimized version looks like it would cause higher register pressure in JIT / AOT wasm runtimes.
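
To make the concern concrete, here is a minimal Python sketch that counts the peak operand-stack depth of the two listings above. The `STACK_EFFECT` table and the `max_stack_depth` helper are hypothetical names written for this illustration and are not part of binaryen. The trailing `return` in the first listing is left out because it does not change the peak. Operand-stack depth is only a rough proxy for register pressure, since each engine does its own register allocation.

```python
# (pops, pushes) for the only two instructions that appear in the listings.
STACK_EFFECT = {
    "global.get": (0, 1),
    "i32.add": (2, 1),
}


def max_stack_depth(instructions):
    """Return the peak operand-stack depth reached while running the sequence."""
    depth = 0
    peak = 0
    for op in instructions:
        pops, pushes = STACK_EFFECT[op]
        if depth < pops:
            raise ValueError(f"stack underflow at {op}")
        depth += pushes - pops
        peak = max(peak, depth)
    return peak


# Original order: get a0, get a1, add, then (get, add) for a2..a6.
original = ["global.get", "global.get", "i32.add"] + ["global.get", "i32.add"] * 5

# Optimized order: all seven gets first, then six adds.
optimized = ["global.get"] * 7 + ["i32.add"] * 6

print("original peak depth:", max_stack_depth(original))
print("optimized peak depth:", max_stack_depth(optimized))
```

Under this count the original order never holds more than 2 values on the stack, while the reordered version holds all 7 before the first add. Whether that difference costs anything in practice depends on the engine.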

kripken commented 1 month ago

See a related previous discussion here:

https://github.com/WebAssembly/binaryen/issues/5088#issuecomment-1262560728

Over there, we did not get a clear answer from VM people that one or the other order was better, so we did not change anything. But it may be worth revisiting this. If we have benchmarks that show another order is better, we can flip it.

tlively commented 1 month ago

See also https://github.com/llvm/llvm-project/issues/98631 and https://github.com/llvm/llvm-project/pull/97283 for discussions of a case where preferring a shallower stack has significant performance benefits.