Core improvements

SquidDev commented 6 years ago

This is a collection of improvements we could make to the core IR or tools which deal with the IR. Some of these may be totally stupid, but some may be worth pursuing.

Join points

[ ] Join points

One of the problems with the current IR is that any match expression in a non-tail position is just bound to a normal let. This reduces our ability to do optimisations like match-of-match. I propose adding join points (or continuations, or basic blocks) to the IR. Namely,

let a = match ... of
        | Cons (x, xs) -> Cons (f x, xs)
        | Nil -> Nil
match a of
| Cons (x, xs) -> x
| Nil -> 0

gets lowered to:

let cont = join a -> match a of
                     | Cons (x, xs) -> x
                     | Nil -> 0
in match ... of
   | Cons (x, xs) -> cont (Cons (f x, xs))
   | Nil -> cont Nil

Join points act identically to normal lambdas, but with several restrictions:

- They can only be called in the tail position. I'm not sure if this is strictly needed, but it may make the Lua backend easier to implement.
- They can only be used in the CotJoinApp term: unlike normal lambdas, join points are not physical values and so should not be used as such.

This ensures we always know what join point we are calling, and all join points know where they are called: this should hopefully allow for additional optimisations.
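
As a rough illustration of how the lowered form above might reach the Lua backend, here is a minimal sketch in which the join point becomes a local function that is only ever called in tail position. The table layout for constructors and the names `head_of_mapped`, `Cons` and `Nil` are assumptions made for this example, not what the compiler currently emits.

```lua
-- Assumed runtime representation (hypothetical):
-- Cons (x, xs) is { "Cons", { x, xs } } and Nil is { "Nil" }.
local Nil = { "Nil" }
local function Cons(x, xs) return { "Cons", { x, xs } } end

local function head_of_mapped(f, list)
  -- The join point: a local function, never stored or passed around,
  -- and only ever invoked as a tail call.
  local function cont(a)
    if a[1] == "Cons" then
      return a[2][1]
    else
      return 0
    end
  end

  -- The outer match: each arm finishes by jumping to the join point.
  if list[1] == "Cons" then
    return cont(Cons(f(list[2][1]), list[2][2]))
  else
    return cont(Nil)
  end
end
```

Since every call site of `cont` is known, inlining it into both arms and simplifying the resulting known-constructor matches would leave just `f x` and `0`, which is the match-of-match result.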

Unboxed tuples

[x] Unboxed tuples

The issue with the current backend is that it generates a fresh closure for every argument. This is fantastic for currying, but less optimal for efficiency of the generated code. Ideally we could compile fun x y z -> ... into function(x, y, z) ... end most of the time.

It may be possible to get the backend to perform this optimisation, but that seems rather flaky (and will not work well with higher-order functions). Instead I propose adding unboxed tuples to the core. This would reduce fun x y z -> ... into fun a -> match a with | (# x, y, z #) -> .... Passes which do this optimisation may choose to only lower a subset of the arguments (say, if a function is often partially applied, like compose).
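
To make the cost concrete, here is a small sketch of the shapes involved, written as Lua output. The function names are made up for illustration: the first mirrors the closure-per-argument code described above, the second is what lowering through an unboxed tuple would ideally produce, and the third shows a partial lowering that keeps the last argument curried.

```lua
-- Closure per argument: two intermediate closures are allocated
-- every time the function is fully applied.
local function add3_curried(x)
  return function(y)
    return function(z)
      return x + y + z
    end
  end
end

-- All three arguments passed at once: no intermediate closures.
local function add3_flat(x, y, z)
  return x + y + z
end

-- Partial lowering (hypothetical): compose takes f and g together but
-- still returns a closure, since it is often applied to only two arguments.
local function compose(f, g)
  return function(x)
    return f(g(x))
  end
end
```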

It may also be possible to lower the type arguments to higher-order functions (such as foldr, which will never curry the function). This'll be much harder to implement though. Something which would be really fancy would be converting functions which operate on records to operate on unboxed tuples instead.

Various optimisations

[x] Improved inlining: The current inliner is a little naive at times: as each application is processed separately, we end up introducing lots of junk terms. Furthermore, we may end up inlining a non-saturated function call, which may increase the size of the call. Some improvements we could make:
- Tiny helper functions (such as compose, id, etc...) can always be inlined.
- Partial calls whose arguments have some concrete value (literal, lambda, constructor, record) may be inlined (as we may be able to do optimisations on the body). Ideally we could attempt to inline and revert if it is not considered worth it.
- Calls whose return value is used in a match may be inlined (as we may be able to simplify the match). Again we may be able to do this speculatively.
[ ] Lambda lifting and lowering: This can actually apply to any pure term (literals, constructors, etc...) but this has more of a ring to it. Any term which is only used in one match arm may be "pushed down" into that arm. Similarly, any term which does not depend on the parent argument may be "pushed up" into a higher level.

If possible, we should try to avoid breaking "lambda boundaries". Namely, we shouldn't really convert f x y -> let a = { x = x } in ... into f x -> let a = { x = x } in y -> ... unless we know it won't prevent other optimisations.

I think lambda lifting also allows us to do loop invariant code motion "for free". That being said, there may be times when we need to generate trampolines for it to be effective. Namely, reduce let f x y -> ... f x' y into let f x y = (* lifted code *) let f' x' = ... in f' x.
[x] Common subexpression elimination: This is pretty self-explanatory. The hard thing to do here will be determining when we should do it: you don't want to do CSE if the terms are really far apart from each other, as that just wastes locals.
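
The loop-invariant case mentioned under lambda lifting can be sketched in Lua as follows. This is only an illustration, assuming the lifted term is pure and `x` is a non-negative integer; `scale`, `f_before`, `f_after` and `go` are hypothetical names, and `calls` exists solely to count evaluations.

```lua
local calls = 0

-- Stand-in for a pure term that depends only on y.
local function scale(y)
  calls = calls + 1
  return y * 2
end

-- Before lifting: f x y recurses as f x' y, so scale(y) is
-- recomputed on every step even though y never changes.
local function f_before(x, y)
  if x == 0 then return 0 end
  return scale(y) + f_before(x - 1, y)
end

-- After lifting: the y-only term is hoisted out, and an inner
-- worker (f' in the text above) recurses over x alone.
local function f_after(x, y)
  local k = scale(y) -- lifted code, evaluated once
  local function go(x2)
    if x2 == 0 then return 0 end
    return k + go(x2 - 1)
  end
  return go(x)
end
```

Note that `f_after` evaluates the lifted term even when `x` is 0, which is only acceptable because the term is assumed pure.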

Backend improvements

[ ] Generate loops: Loop detection will be immensely useful for the optimiser, but more importantly we'll need to convert tail-recursive functions to use loops. Like Urn, we'll need to be careful we don't capture variables which are mutated by the loop iterator.
[ ] Tail recursion modulo cons:
- On integer operations: so technically we can rewrite these functions to be tail recursive. However, as Lua 5.1/5.2 operate on floating points we can't guarantee they are associative. I still think it's worth going for it, as who needs correct optimisations in the first place?
- On constructors: this is more tricky, but I think it is possible to do some rewriting. Consider:

let map f xss = match xss with
              | Nil -> Nil
              | Cons (x, xs) -> Cons (f x, map f xs)

this can be compiled to something like:

local function map(f, xss)
  if xss[1] == "Cons" then
    -- Build the first cell, leaving a hole (nil) for its tail
    local res = { "Cons", { f(xss[2][1]), nil } }
    local last = res
    xss = xss[2][2]
    while true do
      -- For the rest of the loop, patch up the previous term and continue
      if xss[1] == "Cons" then
        local this = { "Cons", { f(xss[2][1]), nil } }
        last[2][2] = this
        xss, last = xss[2][2], this
      else
        last[2][2] = Nil
        break
      end
    end
    return res
  else
    return Nil
  end
end

Obviously this requires some non-trivial code generation (and duplicates a reasonable amount of code), so it's something we're going to have to think about. It would be amazing to have, as it makes the map implementation much nicer, but it doesn't result in the nicest code.
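
For the integer case, the rewrite amounts to turning right-nested arithmetic into a left-nested accumulator loop. The sketch below uses plain Lua arrays rather than Cons cells to keep it short, and `sum_rec` and `sum_loop` are names invented for the example. It also shows why associativity matters: the two versions agree on integers but can differ in the last bit on floating-point inputs.

```lua
-- Direct recursion: computes xs[1] + (xs[2] + (... + 0)).
local function sum_rec(xs, i)
  i = i or 1
  if i > #xs then return 0 end
  return xs[i] + sum_rec(xs, i + 1)
end

-- Accumulator loop: computes ((0 + xs[1]) + xs[2]) + ...
local function sum_loop(xs)
  local acc = 0
  for i = 1, #xs do
    acc = acc + xs[i]
  end
  return acc
end

-- Same result on integers.
print(sum_rec({ 1, 2, 3 }) == sum_loop({ 1, 2, 3 }))
-- Reassociating the additions changes the rounding on doubles.
print(sum_rec({ 0.1, 0.2, 0.3 }) == sum_loop({ 0.1, 0.2, 0.3 }))
```

The first comparison prints true and the second prints false, so an optimiser that applies this rewrite to floating-point code changes observable results unless the source semantics permit it.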

plt-amy commented 6 years ago

> Unboxed tuples

Evil! We should do this in a separate IR with first-class support for multiple return/result instead of piling it on Core.

plt-amy commented 6 years ago

> However, as Lua 5.1/5.2 operate on floating points we can't guarantee they are associative.

Thankfully our semantics don't depend on Lua.

amuletml / amulet

Core improvements #15
