Open dcodeIO opened 6 years ago
This would help in those cases, as tee_local
would be smaller, as you said, and also operations on locals will normally be implemented in registers with a lot less memory reading and writing, so it would be faster.
But to optimize this, it seems like the conditions are: the global is only used in one function, and that function will only be called at most once. Which can really only apply to the start
method. So this would be fairly rare, I suspect. Still, worth doing.
But to optimize this, it seems like the conditions are: the global is only used in one function, and that function will only be called at most once. Which can really only apply to the start method.
Yeah, or if one (or multiple) functions use the (same) global, but (each of them) reinitializes it anyway. Might not be a size-win, though, when its multiple functions because a single global declaration is replaced with multiple locals.
There's a well-known benchmark in C that features static
variables that are actually used without regard to their static
-ness; they can be fully converted to automatic, local variables for a performance win. In a former life, I wrote an optimization pass to do this. The consensus from that thread was that some form of PRE would do most of the work for this optimization, and that you'd really only need some sort of cleanup pass to recognize stores as dead.
This case sounds pretty similar, though perhaps with JS, the optimization pass would trigger a bit more often?
Yeah, good point. Everything except for the final stores could be optimized using general methods (eventually; we don't have proper PRE or GVN yet). And a final store to a global is something we could work to optimize out (a store to memory is harder because they can trap, so we can't remove one unless we can prove it won't trap).
I'm not sure how important it is to optimize globals in this way, though. In C/C++ output they are very rare, really only the stack pointer will use one. But maybe in AssemblyScript and others this may be common?
Everything except for the final stores could be optimized using general methods (eventually; we don't have proper PRE or GVN yet)
That reminds me of another thing. Accesses to this
etc. become loads and stores in my case. You mentioned registers earlier, so might it also make sense to lower these to locals except for the final store as well?
class Foo {
baz: i32 = 24; // is accessed by offset 0 in load/store
bar(): void { // becomes a wasm function
this.baz = this.baz + 42; // store(this, 0, load(this. 0) + 42)
return this.baz; // load(this, 0)
}
}
But maybe in AssemblyScript and others this may be common?
In AssemblyScript's case it's relatively common because of the implicit start function that wraps all the top-level logic of all source files, combined with the relatively specific fact that the compiler doesn't have all the information beforehand.
Not sure yet where to draw the line between what the AssemblyScript compiler should do (here: do one more pass over the AST and keep track of things), and what's better to do in Binaryen.
Yeah, exactly. In that example, we can save the loaded value to a local and use it instead of the second load. A gvn pass could do this, and would be a generalization of the currently very simple local-cse and redundant-set-elimination passes.
I have the following scenario
as a result of compiling and
-O3
optimizingHere,
aVar
is compiled in the context of the implicitstart
function and thus preemptively compiled as a global because it isn't known beforehand whether it is used within any other top-level function, likeaFunc
.When not used in a top-level function, this results in quite a few globals that are only used by one function, that is the
start
function here, and could be optimized away to become locals instead because, here, the start function is called exactly once. Other such functions could, potentially always reinitialize the global.Unfortunately I have yet to find another scenario where this makes any sense, but while thinking about it I figured that replacing such globals with locals might as well become a Binaryen pass, that is if there is any performance- or size-win in moving non-exported-globals to locals? Another thing I thought of is that such a variable could take advantage of
tee_local
.