WebAssembly / binaryen

Optimizer and compiler/toolchain library for WebAssembly
Apache License 2.0

Suggestion: More advanced caching for globals #4065

Open MaxGraey opened 3 years ago

MaxGraey commented 3 years ago

This is a fairly common pattern, which looks suboptimal for performance:

global.get $sp
i32.const 8
i32.sub
global.set $sp 
global.get $sp      ;;  redundant get after set
i32.const 15340
i32.lt_s
if
  ...
  unreachable       ;; or early return
end
global.get $sp      ;;  redundant get after branch

It would be great to optimize this to something like the following, which later passes may be able to optimize further:

local $sp_tmp
global.get $sp
local.tee $sp_tmp
i32.const 8
i32.sub
local.tee $sp_tmp
i32.const 15340
i32.lt_s
if
  local.get $sp_tmp
  global.set $sp
  ...
  unreachable
end
local.get $sp_tmp

Another similar case:

global.get $sp
i32.const 0
i32.store
block $label
  ...
  if
    ;; no global.set before
    global.get $sp  ;; redundant global.get
    i32.const 4
    i32.add
    ...
  end
end

With caching:

local $sp_tmp
global.get $sp
local.tee $sp_tmp    ;; caching
i32.const 0
i32.store
block $label
  ...
  if
    local.get $sp_tmp ;; replaced with "local.get"
    i32.const 4
    i32.add
    ...
  end
end

But this usually increases code size.
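To put rough numbers on the size cost, here is a minimal Python sketch that counts bytes under the standard binary encoding, where `global.get`, `local.get` and `local.tee` each take one opcode byte plus a LEB128-encoded index. The function names and the flat 2-byte charge for a new local group are illustrative assumptions, not something Binaryen computes this way.

```python
def leb128_len(n):
    """Number of bytes needed to encode unsigned integer n as LEB128."""
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length


def size_delta(num_reads, global_index=0, local_index=0, new_local_group=False):
    """Bytes added (positive) or saved (negative) by caching a global in a local.

    Assumes num_reads reads of the same global with nothing in between that
    would force a reload. new_local_group=True charges 2 extra bytes for a
    fresh (count, type) entry in the function's local declarations (assumed
    small count).
    """
    get_global = 1 + leb128_len(global_index)  # opcode + index
    local_op = 1 + leb128_len(local_index)     # local.get / local.tee
    original = num_reads * get_global
    cached = get_global + local_op + (num_reads - 1) * local_op
    if new_local_group:
        cached += 2
    return cached - original


print(size_delta(2))                         # small indices, two reads
print(size_delta(2, new_local_group=True))   # same, plus a new local group
print(size_delta(5, global_index=200))       # 2-byte global index, five reads
```

With small indices every `local.get` is the same size as the `global.get` it replaces, so the extra `local.tee` is pure overhead. The transformation only shrinks code when the global index needs more LEB128 bytes than the local index and there are enough reads to amortize the tee.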

kripken commented 3 years ago

Makes sense. This is the sort of pattern I hope to get optimized by DeadStoreElimination and extensions to it (store-load forwarding, etc.). That work has begun in #3858.

MaxGraey commented 3 years ago

Is something blocking #3858 now? Or do you need to double-check everything more carefully?

kripken commented 3 years ago

It needs to be checked and reviewed carefully, and it's large - that's the main slowdown I think.

Actually, I wonder if maybe we can start with something simpler - this type of optimization in a single basic block should be easy to do. And likely it would give most of the benefit. We could extend it later to the whole CFG depending on how much is left. I'll look into that.
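As a rough illustration of what a single-basic-block version could look like, here is a Python sketch over a flat list of instruction tuples. It is not Binaryen's implementation: the tuple encoding, the `cache_globals_in_block` name, and the `$<global>_tmp` local naming are assumptions made for the example, and the input is assumed to contain no control flow.

```python
# Instructions that may write any global, so known values must be forgotten.
INVALIDATING_OPS = {"call", "call_indirect"}


def cache_globals_in_block(instrs):
    """Replace repeated global reads in one basic block with local reads.

    `instrs` is a list of tuples such as ("global.get", "$sp") or ("i32.sub",).
    Returns (new_instrs, new_locals).
    """
    source = {}        # global name -> index of the instr that last produced its value
    needs_tee = set()  # producers whose value must also be copied into a local
    redundant = set()  # global.get instrs that can become local.get

    # Pass 1: find reads whose value is already known within the block.
    for i, ins in enumerate(instrs):
        op = ins[0]
        if op == "global.get":
            if ins[1] in source:
                needs_tee.add(source[ins[1]])
                redundant.add(i)
            else:
                source[ins[1]] = i
        elif op == "global.set":
            source[ins[1]] = i
        elif op in INVALIDATING_OPS:
            source.clear()

    # Pass 2: rewrite, adding a local.tee only where a later read needs it.
    out, new_locals = [], set()
    for i, ins in enumerate(instrs):
        tmp = ins[1] + "_tmp" if ins[0] in ("global.get", "global.set") else None
        if i in redundant:
            out.append(("local.get", tmp))
        elif i in needs_tee:
            new_locals.add(tmp)
            if ins[0] == "global.get":
                out += [ins, ("local.tee", tmp)]
            else:  # global.set: tee the value just before it is stored
                out += [("local.tee", tmp), ins]
        else:
            out.append(ins)
    return out, sorted(new_locals)


block = [
    ("global.get", "$sp"),
    ("i32.const", 8),
    ("i32.sub",),
    ("global.set", "$sp"),
    ("global.get", "$sp"),  # redundant get after set
]
new_block, new_locals = cache_globals_in_block(block)
for ins in new_block:
    print(*ins)
```

The sketch treats any call as possibly writing every global, which is conservative. Extending it across branches would need CFG-level reasoning, which is the part deferred above.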

kripken commented 3 years ago

https://github.com/WebAssembly/binaryen/pull/4079 should help here.

MaxGraey commented 3 years ago

Thanks!

MaxGraey commented 3 years ago

> #4079 should help here.

Btw, it only caches when a global is read (used) more than once, right? For a single read it's unnecessary. If it's part of CSE, I guess that's already the case, but if it's handled specially somehow...

kripken commented 3 years ago

It looks for expressions that appear more than once. So it should not do anything for a single read (unless there's a bug).
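The "appears more than once" criterion can be sketched in a few lines of Python. This is only an illustration under assumed names (`subexpressions`, `repeated_expressions`) with expressions modeled as nested tuples. It does not reflect how the pass in the PR is actually written, and it ignores side effects and invalidation entirely.

```python
from collections import Counter


def subexpressions(expr):
    """Yield every nested sub-expression of expr, then expr itself."""
    for child in expr[1:]:
        if isinstance(child, tuple):
            yield from subexpressions(child)
    yield expr


def repeated_expressions(exprs):
    """Return the set of (sub-)expressions that occur more than once."""
    counts = Counter(sub for e in exprs for sub in subexpressions(e))
    return {e for e, n in counts.items() if n > 1}


# The second case from the issue, written as expression trees.
exprs = [
    ("i32.store", ("global.get", "$sp"), ("i32.const", 0)),
    ("i32.add", ("global.get", "$sp"), ("i32.const", 4)),
]
print(repeated_expressions(exprs))

# A single read yields no candidates, so nothing would be cached.
print(repeated_expressions([("drop", ("global.get", "$sp"))]))
```

A real pass would presumably also filter out candidates that are too cheap to be worth a local, such as repeated constants, and would drop candidates whose value may have changed between occurrences.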