WebAssembly / binaryen

Optimizer and compiler/toolchain library for WebAssembly
Apache License 2.0
7.42k stars 734 forks source link

Global to local optimization idea #1371

Open dcodeIO opened 6 years ago

dcodeIO commented 6 years ago

I have the following scenario

(module
 (type $v (func))
 (global $tests/compiler/whatif/aVar (mut i32) (i32.const 0))
 (memory $0 1)
 (export "memory" (memory $0))
 (start $start)
 (func $start (; 0 ;) (type $v)
  (set_global $tests/compiler/whatif/aVar
   (i32.const 42)
  )
 )
)

as a result of compiling and -O3 optimizing

let aVar: i32 = 0;

aVar = 42;

function aFunc(): void {
  // aVar = 24;
}

Here, aVar is compiled in the context of the implicit start function and thus preemptively compiled as a global because it isn't known beforehand whether it is used within any other top-level function, like aFunc.

When not used in a top-level function, this results in quite a few globals that are only used by one function, that is the start function here, and could be optimized away to become locals instead because, here, the start function is called exactly once. Other such functions could, potentially always reinitialize the global.

Unfortunately I have yet to find another scenario where this makes any sense, but while thinking about it I figured that replacing such globals with locals might as well become a Binaryen pass, that is if there is any performance- or size-win in moving non-exported-globals to locals? Another thing I thought of is that such a variable could take advantage of tee_local.

kripken commented 6 years ago

This would help in those cases, as tee_local would be smaller, as you said, and also operations on locals will normally be implemented in registers with a lot less memory reading and writing, so it would be faster.

But to optimize this, it seems like the conditions are: the global is only used in one function, and that function will only be called at most once. Which can really only apply to the start method. So this would be fairly rare, I suspect. Still, worth doing.

dcodeIO commented 6 years ago

But to optimize this, it seems like the conditions are: the global is only used in one function, and that function will only be called at most once. Which can really only apply to the start method.

Yeah, or if one (or multiple) functions use the (same) global, but (each of them) reinitializes it anyway. Might not be a size-win, though, when its multiple functions because a single global declaration is replaced with multiple locals.

froydnj commented 6 years ago

There's a well-known benchmark in C that features static variables that are actually used without regard to their static-ness; they can be fully converted to automatic, local variables for a performance win. In a former life, I wrote an optimization pass to do this. The consensus from that thread was that some form of PRE would do most of the work for this optimization, and that you'd really only need some sort of cleanup pass to recognize stores as dead.

This case sounds pretty similar, though perhaps with JS, the optimization pass would trigger a bit more often?

kripken commented 6 years ago

Yeah, good point. Everything except for the final stores could be optimized using general methods (eventually; we don't have proper PRE or GVN yet). And a final store to a global is something we could work to optimize out (a store to memory is harder because they can trap, so we can't remove one unless we can prove it won't trap).

I'm not sure how important it is to optimize globals in this way, though. In C/C++ output they are very rare, really only the stack pointer will use one. But maybe in AssemblyScript and others this may be common?

dcodeIO commented 6 years ago

Everything except for the final stores could be optimized using general methods (eventually; we don't have proper PRE or GVN yet)

That reminds me of another thing. Accesses to this etc. become loads and stores in my case. You mentioned registers earlier, so might it also make sense to lower these to locals except for the final store as well?

class Foo {
  baz: i32 = 24; // is accessed by offset 0 in load/store
  bar(): void { // becomes a wasm function
    this.baz = this.baz + 42; // store(this, 0, load(this. 0) + 42)
    return this.baz; // load(this, 0)
  }
}

But maybe in AssemblyScript and others this may be common?

In AssemblyScript's case it's relatively common because of the implicit start function that wraps all the top-level logic of all source files, combined with the relatively specific fact that the compiler doesn't have all the information beforehand.

Not sure yet where to draw the line between what the AssemblyScript compiler should do (here: do one more pass over the AST and keep track of things), and what's better to do in Binaryen.

kripken commented 6 years ago

Yeah, exactly. In that example, we can save the loaded value to a local and use it instead of the second load. A gvn pass could do this, and would be a generalization of the currently very simple local-cse and redundant-set-elimination passes.