dfinity / motoko

Simple high-level language for writing Internet Computer canisters

Non-local constant folding #3284

Open ggreif opened 2 years ago

ggreif commented 2 years ago

Here is a speculative idea. Consider this Motoko snippet

shiftRight(someNat, 31)

Due to stack-machine conventions we compile someNat (which could itself be a compound expression), then 31 (pushing it onto the stack), and finally we insert the Wasm template for the Nat shift (compile_rsh).

Naturally the second argument will often be a constant, and when it is, the codegen template's compact Nat arm will always produce 0 for any shift amount >= 30, so the complicated logic can be erased entirely. Since the backend is simple-minded, we'd have to analyse the IR to spot such opportunities, and that would add a lot of bloat to the (already huge) backend. But we have a secret weapon: the instrList.ml module defines Wasm instructions as difference lists, so there is some flexibility to insert constness hints (or other clues); when the difference list gets expanded, such code could run, analysing downstream (or maybe upstream) instructions and cutting redundancies out of the instruction stream.
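
To illustrate the kind of hook I have in mind, here is a minimal, self-contained sketch, not the actual instrList.ml API: the toy instruction type, the folding rule and the `optimise` entry point are made up for this example. It shows a difference list of Wasm-like instructions with a clean-up pass that runs when the list is expanded:

```ocaml
(* Toy Wasm instruction stream; only what the example needs. *)
type instr =
  | Const of int32          (* i32.const *)
  | ShrU                    (* i32.shr_u *)
  | Other of string         (* anything we don't analyse *)

(* A difference list, in the spirit of instrList.ml: cheap concatenation. *)
type t = instr list -> instr list

let i (ins : instr) : t = fun rest -> ins :: rest
let ( ^^ ) (a : t) (b : t) : t = fun rest -> a (b rest)
let to_list (code : t) : instr list = code []

(* Hypothetical clean-up run at expansion time: here just local folding of
   "push x; push k; shr". A richer pass could follow constness hints and
   look further up/downstream in the stream. *)
let rec fold_constants = function
  | Const x :: Const k :: ShrU :: rest ->
      fold_constants (Const (Int32.shift_right_logical x (Int32.to_int k)) :: rest)
  | ins :: rest -> ins :: fold_constants rest
  | [] -> []

let optimise (code : t) : instr list = fold_constants (to_list code)

(* Folds:    push 7; push 31; shr   ==>   push 0 *)
let _folded = optimise (i (Const 7l) ^^ i (Const 31l) ^^ i ShrU)

(* Untouched: the value is unknown, only the shift amount is constant. *)
let _kept = optimise (i (Other "call $someNat") ^^ i (Const 31l) ^^ i ShrU)
```

In the real backend the hints would have to be richer than bare constants, but the shape is the same: the clean-up only sees instructions, so it stays local to the instruction stream and never touches the IR.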

It would be somewhat more expensive to run an abstract interpreter on the Wasm, but that could be done behind an optimisation flag. I conjecture that, with domain-specific knowledge, we could optimise more deeply this way than an off-the-shelf Wasm optimiser can.
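
As a rough illustration of what such an abstract interpreter would track (reusing the toy `instr` type from the sketch above; the `aval` domain and `step` function are again invented purely for exposition), a constant-propagation pass over the Wasm stack might look like this:

```ocaml
(* A stack slot is either a known constant or Top ("could be anything"). *)
type aval = Known of int32 | Top

(* Transfer function for one instruction. Only the shift is modelled; any
   other instruction conservatively pushes Top (a real pass would use each
   instruction's actual stack signature). *)
let step (stack : aval list) (ins : instr) : aval list =
  match ins, stack with
  | Const k, _ -> Known k :: stack
  | ShrU, Known amount :: Known value :: rest ->
      Known (Int32.shift_right_logical value (Int32.to_int amount)) :: rest
  | ShrU, _ :: _ :: rest -> Top :: rest
  | _, _ -> Top :: stack

(* Abstractly execute a straight-line fragment, yielding the abstract stack. *)
let analyse (code : instr list) : aval list =
  List.fold_left step [] code
```

Wherever `analyse` proves a slot `Known`, the corresponding instructions could be replaced by a single constant push; the interesting cases are the ones where domain knowledge (e.g. about our Nat representation) lets us know more than a generic optimiser would.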

nomeata commented 2 years ago

Since the backend is simple-minded,

Hear, hear! Maybe it had better be, but at almost 10k LOC it unfortunately can’t be called simple-minded.

This is about Nat, not Nat32, right? (I think for Nat32 the backend would already put the unboxed 31 onto the stack, so it wouldn't do the is-it-boxed-check.) Do we even have shift for Nat?

I’d be wary of moving even more complexity into the Wasm instruction list. Its being just a Wasm instruction list is a nice, clean, reliable abstraction.

Especially for micro-optimizations like these, when there isn’t evidence that this is a hot issue worth spending our already overspent complexity budget on…

ggreif commented 2 years ago

Hear, hear! Maybe it had better be, but at almost 10k LOC it unfortunately can’t be called simple-minded.

A compilation strategy like this

  | OtherPrim "rshd", [e1; e2] ->
    SR.Vanilla,
    compile_exp_vanilla env ae e1 ^^
    compile_exp_as env ae SR.UnboxedWord32 e2 ^^
    BigNum.compile_rsh env

is what I would call simple-minded. The backend in its entirety surely is sophisticated, but here we have very few knobs, and the code that comes out can be pretty inefficient.

Do we even have shift for Nat?

Soon! (#3112)

Especially for micro-optimizations like these, when there isn’t evidence that this is a hot issue worth spending our already overspent complexity budget on…

Depends on what the users will need in the future. It feels better to have a strategy that can (almost transparently) perform better-than-peephole optimisations.

I agree that this is utopia for now.

crusso commented 2 years ago

Doesn't the const analysis give us most of what we need?

nomeata commented 2 years ago

Ah, the second parameter is a Nat32? Then we could indeed use the const analysis result to generate different code for different constants. Not utopia, but not sure if it's worth it.
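
To spell that out, here is a minimal, self-contained sketch of such a dispatch, not the real backend: the variant names, the `compact_bits` parameter and the threshold test are placeholders, and the real cut-off would have to match the actual width of Motoko's tagged scalars:

```ocaml
(* Hypothetical codegen plan for OtherPrim "rshd" once we know (or don't
   know) the shift amount statically. *)
type plan =
  | CompactArmZero             (* amount >= compact width: the compact arm
                                  collapses to pushing 0; only the heap
                                  (big-number) path remains *)
  | CompactArmShift of int32   (* compact arm specialised to a known amount *)
  | GenericTemplate            (* nothing known: today's BigNum.compile_rsh *)

let plan_rshd ~(compact_bits : int) (known_amount : int32 option) : plan =
  match known_amount with
  | Some n when Int32.to_int n >= compact_bits -> CompactArmZero
  | Some n -> CompactArmShift n
  | None -> GenericTemplate

(* e.g. with a 30-bit compact representation, a literal 31 means the compact
   arm is just "push 0". *)
let _ = plan_rshd ~compact_bits:30 (Some 31l)
```

The emitter would then keep the is-it-boxed check but replace the compact arm's body with a constant 0 in the `CompactArmZero` case, and fall back to the full `BigNum.compile_rsh` template only when nothing is known about the shift amount.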

ggreif commented 2 years ago

It appears that there is an SMT-based framework for verifying Wasm-level transformations: https://antonxue.github.io/files/papers/vmcai2021-whisk.pdf (in the VMCAI 2021 proceedings, ISBN 978-3-030-67066-5).