aristanetworks / purescript-backend-optimizer

Optimizing backend toolkit and modern ECMAScript backend for PureScript
MIT License

Floats let bindings outside of literal arrays, literal records and saturated constructors #69

Closed · mikesol closed this 1 year ago

mikesol commented 1 year ago

I floated everything that IMO can float. A few files saw tweaks with no real gain or loss in readability, and a few are a bit more readable. In general, floating makes reasoning about BackendSemantics easier: if you're operating on a semantic value X that is composed of Y and Z, there's now less chance that Y or Z has a let binding in it, so you aren't blocked by the closure of a let's body.
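
For example (an illustrative sketch, not code from this PR; f and g stand in for calls the inliner won't substitute), a binding like

    -- before floating: the let lives inside an element of the array literal
    -- (f, g :: Int -> Int are assumed, opaque functions)
    xs :: Array Int
    xs = [ (let y = f 1 in g y), 2 ]

becomes

    -- after floating: the let wraps the literal instead
    xs :: Array Int
    xs = let y = f 1 in [ g y, 2 ]

so anything that later inspects the elements sees g y and 2 directly rather than a let closure.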

natefaubion commented 1 year ago

My overall opinion of this PR at the moment is that it's a bit of a wash when it comes to actual codegen. I don't agree that the results in the snapshots are inherently simpler, and at scale I fear it might create significant divergence from the source that isn't advantageous. There is some additional context added, which adds a bit of complexity, and I feel the implementation as-is reduces the legibility of eval (for me at least). With that, I'm hesitant to merge this without a clear understanding of what the advantages are for optimizations, because if such an advantage exists, it isn't captured by any existing snapshots. If it does, this PR might warrant additional snapshots to illustrate the effectiveness of the transformation.

There is also a rewrite that reassociates lets bottom-up in build (via RewriteLetAssoc). That means we do floating in both eval and build, but I wonder if that's necessary (can they be consolidated?), if one or the other is more necessary, and if bottom-up (build) vs top-down (eval) makes any difference.
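
For reference, by reassociation I mean the usual let-of-let rewrite, roughly (a schematic example, not the exact output of RewriteLetAssoc):

    -- before: the definition of a is itself a let
    before :: Int
    before = let a = (let b = 1 + 1 in b * 3) in a + 4

    -- after: the inner binding is hoisted so that a binds its body directly
    after :: Int
    after = let b = 1 + 1 in let a = b * 3 in a + 4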

mikesol commented 1 year ago

Thanks a lot for the review.

I'll add an additional snapshot that captures the optimizations. The optimizations definitely kick in on my inlined-recursion-try branch from which this PR was carved out, but I haven't yet landed on a simple test that shows them off.

I think once that is there, we can evaluate which of these changes make sense. The only one currently in my inlined-recursion-try branch is the one for literal arrays, but the reason I put them all here (literal records, ctors, app, etc.) is so that we can discuss them and decide which ones make the cut.

I don't see any of these floatings as inherently more or less advantageous than the others. All of them can potentially float a let above something being deref'd, which increases the chances that a prim op will be optimized out (again, I need to write a good test for this). But if some of them create more significant source divergence than others, we can make a call on which ones alter the original order too much.
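
To show the shape I mean (a schematic sketch, not one of the snapshots; assume opaque :: Int -> Int is something the optimizer can't see through), before floating the let sits on a path that deref would have to walk:

    blocked :: Int
    blocked =
      let r = { a: (let y = opaque 1 in y + y), b: 2 }
      in r.a + r.b

whereas after floating (together with the existing let reassociation) that path is clear:

    floated :: Int
    floated =
      let
        y = opaque 1
        r = { a: y + y, b: 2 }
      in r.a + r.b

Whether a given prim op actually folds still depends on what deref finds at the end of the path, which is exactly what I want the new snapshots to pin down.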

As for the way the functions are written, I agree that it's Byzantine. I think it'd be good to settle on the group of floats we're comfortable with, and then I can go back and make the abstraction better.

As for the top-down vs bottom-up, I don't have a strong opinion. I actually tried to get rid of RewriteLetAssoc by doing this top-down approach, but it led to worse codegen. The nice thing about top-down for my purposes with the rewrite branch is that it potentially results in fewer turns of optimize, because deref has the floated lets to work with a bit earlier.

So, to summarize, lemme concoct a good snapshot (or come to the conclusion that said snapshot is unconcoctable) and we'll take it from there!

mikesol commented 1 year ago

@natefaubion I've added four tests now. The tests FloatLetFromArray, FloatLetFromRecord and FloatLetFromCtor all lead to eager evaluation of the + operation on this branch, whereas there is no such evaluation on main. In FloatLetFromApp, the result is IMO easier to read and resembles conventional JS syntax more closely than the one on main.

With those tests in place, I'm interested in your thoughts on the changes in the tests. Do you think they are improvements? If so, do you feel the benefits outweigh the cost of the let-bound terms appearing farther away from their original location?

natefaubion commented 1 year ago

It is hard for me to know for sure with all the additional changes unrelated to let floating. I can understand that let floating (much like inlining) can enable other optimizations, but my goal is to understand what those opportunities actually are so that I can reason about them, rather than nebulously guessing that it might do something.

My understanding in the changes is that you have implemented a number of additional extensions to the recently introduced deref mechanism, and you are proposing that the let floating makes this work much more reliably. Your goal is to remove as many "let barriers" as possible in any path that you might want to deref as these barriers are a dead end for deref-based optimizations since it can't peek through the let. This makes sense in theory.

The problem that I have with this is I don't understand the extensions to this mechanism and what they accomplish, so it's hard for me to say that this specifically is the path that should be taken. In the accepted PR, deref was a simple constant-time operation to peek at a known value for known operations, but in this PR it's a greedy recursive rewrite of an arbitrary term. I'm wary that it's quickly becoming a hammer for any nail-looking optimization, subverting other inlining heuristics in pursuit of a very specific, localized codegen target. My worries may well be unfounded, but it may be that we should be talking about these extensions first if they are a prerequisite.

mikesol commented 1 year ago

> My understanding in the changes is that you have implemented a number of additional extensions to the recently introduced deref mechanism, and you are proposing that the let floating makes this work much more reliably. Your goal is to remove as many "let barriers" as possible in any path that you might want to deref as these barriers are a dead end for deref-based optimizations since it can't peek through the let.

Just to confirm: yes, this is exactly what's going on, and it's the strategy I took to make recursive function inlining work in the larger branch.

> It may be that we should be talking about these extensions first if they are a prerequisite.

I agree: we should consider if this is the best way to achieve that goal.

One thing I just added from my monster branch is a deref call at the creation of all the NeutLocals in quote. Since most of the deref-ing happens on locals, or on something with locals inside it, this speeds things up.

> The problem that I have with this is I don't understand the extensions to this mechanism and what they accomplish, so it's hard for me to say that this specifically is the path that should be taken.

I'll start from the goal and work my way backwards to the implementation:

So, in summary, deref is used to dig into a local and get a value for a prim op. All of the extensions are used to dig further, i.e. through a chain of locals or through accessors. The let floating is used to make sure we don't hit a let while we're deref-ing.
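
As a tiny illustration of the kind of chain I mean (schematic, not one of the snapshots):

    chained :: Int
    chained =
      let
        r = { a: 1 }
        x = r.a -- deref follows x -> r.a -> 1
        z = x   -- ...and through another local: z -> x -> 1
      in z + 2  -- so the prim op ends up with a known operand and can fold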

Let me know if this helps and if you have any additional questions or need any additional clarifications. I like the way you phrase it when you say: "we should be talking about these extensions first if they are a prerequisite" - I'm not sure if they are. The extensions, namely the increased amount of deref-ing, the more elaborate deref, and the let floating, are together sufficient to accomplish what I'm trying to do (the code in my larger branch works & is now fast after I've optimized it a bit), but I don't know if they're the best way to do it.

mikesol commented 1 year ago

@natefaubion I implemented the SemRef mechanism we discussed yesterday. Works fine! Let me know what you think a good next step would be. Two options I can think of:

mikesol commented 1 year ago

I pushed some code to make the let-floating less complex to follow. It uses a more standard pattern of building up a free monad via floatMe and then interpreting the monad with runFloatLets or runFloatLetsWithEnv. That way, the eval function is easier to follow, i.e.:

      Lit (LitArray arr) ->
        runFloatLets atTop env $ NeutLit <$> (LitArray <$> traverse (eval env >>> floatMe) arr)

whereas current main is something like:

      Lit (LitArray arr) ->
        NeutLit (LitArray (eval env <$> arr))
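
For reference, the underlying pattern is roughly the following (a simplified, self-contained sketch: the names floatMe and runFloatLets match the PR, but the real types carry more information, e.g. the environment handling in runFloatLetsWithEnv):

    module FloatLetsSketch where

    import Prelude

    -- A free-monad-like structure: either a finished result, or a value to be
    -- floated together with the continuation that consumes it.
    data FloatLets v a
      = Done a
      | Float v (v -> FloatLets v a)

    instance Functor (FloatLets v) where
      map f (Done a) = Done (f a)
      map f (Float v k) = Float v (map f <<< k)

    instance Apply (FloatLets v) where
      apply (Done f) x = map f x
      apply (Float v k) x = Float v (\v' -> apply (k v') x)

    instance Applicative (FloatLets v) where
      pure = Done

    instance Bind (FloatLets v) where
      bind (Done a) f = f a
      bind (Float v k) f = Float v (\v' -> bind (k v') f)

    instance Monad (FloatLets v)

    -- Mark a value so the interpreter can float it out.
    floatMe :: forall v. v -> FloatLets v v
    floatMe v = Float v Done

    -- Interpret the built-up structure: return the final result together with
    -- everything that was marked for floating, in the order it was seen (in
    -- the real code, this is where the floated lets would be reintroduced).
    runFloatLets :: forall v a. FloatLets v a -> { floated :: Array v, result :: a }
    runFloatLets = go []
      where
      go acc (Done a) = { floated: acc, result: a }
      go acc (Float v k) = go (acc <> [ v ]) (k v)
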
natefaubion commented 1 year ago

I would prefer to review these things separately, as they are orthogonal, and I still have significant questions regarding let floating and what we should do there. My recommendation would be to make a separate PR just for derefing, where we can figure out tests in the absence of let floating. I would prefer not to rely on let floating tests to demonstrate derefing.