ControlCplusControlV / Scribe

Minimal Yul Transpilation to the Miden VM
GNU General Public License v3.0
51 stars 4 forks source link

Dynamic re-ordering of vars to optimize stack usage #49

Open ControlCplusControlV opened 1 year ago

ControlCplusControlV commented 1 year ago

Dynamic re-ordering of vars to optimize stack usage

A big cost when doing loop-heavy stuff is ensuring that the stack is in the same order as it was before the loop. If the vars [a, b, c] are in the stack before the loop, then at the end of the loop they're in the order [b, a, c], we have to shift the stack around to accomodate that.

For example, here's some fibonnaci code and the output:

let c:u32 := 0
let b:u32 := 1
let a:u32 := 0

for { let i:u32 := 0 } lt(i, 10) { i := add(i, 1)}
{
    c := add(a,b)
    a := b
    b := c
}
b

Output:

begin
    # Assigning to c #
        # u32 literal 0 #
        push.0

    # Assigning to b #
        # u32 literal 1 #
        push.1

    # Assigning to a #
        # u32 literal 0 #
        push.0

    repeat.10
        # Assigning to c #
            # add() #
            # pushing a to the top #
                dup.0
            # pushing b to the top #
                dup.2
            add
        # Assigning to a #
            # pushing b to the top #
                dup.2
        # Assigning to b #
            # pushing c to the top #
                dup.1
        # targeting the stack #
        # pushing c to the top #
            movup.2
        # pushing b to the top #
            swap
        # pushing a to the top #
            movup.2
    end
    # pushing b to the top #
        dup.1
end

Note the "targeting the stack" thing, where we have to re-order the elements on the stack, to be the same as before the loop, for the next loop to work correctly. We have some extremely naive cost-estimation code that just counts the number of instructions that will be outputed (multiplied by N when in a repeat.N instruction), the cost for this code is 124.

If you just re-order the initial vars a bit, like this:

let c:u32 := 0
let a:u32 := 0
let b:u32 := 1

Then you get a different output at the end of the loop:

use.std::math::u256

begin
    # Assigning to c #
        # u32 literal 0 #
        push.0

    # Assigning to a #
        # u32 literal 0 #
        push.0

    # Assigning to b #
        # u32 literal 1 #
        push.1

    repeat.10
        # Assigning to c #
            # add() #
            # pushing a to the top #
                dup.1
            # pushing b to the top #
                dup.1
            add
        # Assigning to a #
            # pushing b to the top #
                dup.1
        # Assigning to b #
            # pushing c to the top #
                dup.1
        # targeting the stack #
        # No need to target the stack, same order #
    end
    # pushing b to the top #
        dup.0
end

Note the "No need to target the stack, same order" thing at the end of the loop. The (again, extremely naive) estimated cost for this is 94, which is 25% less, but the real savings would be even higher, because shifting stuff around on the stack is fairly expensive, afaik. So once the cost estimation stuff gets more accurate that should be even better.

So instead of having to move yul code around by hand to trigger this optimization, we can do some sort of dynamic programming approach that tries every order.

This wouldn't just have cost savings for loops like above. There's a lot of stack re-ordering, pushing to memory, popping from memory, etc. that happens, and varying the order of things would affect all of that.