Introduce a new pseudo-IR we can lower Eir to that is easier to consume

hansihe commented 4 years ago

Due to being based on Thorin, Eir is a pretty novel IR. This has many advantages in that it makes a lot of optimizations a lot easier to perform.

However, one downside is that, since it is so different from traditional SSA IRs, lowering from it can be quite a complex and error-prone task, with loads of work duplicated between backends.

A much better approach here would be to keep and maintain these complex transformations as part of eir itself. This could be done with a relatively small amount of work, by introducing a different IR which represents the eir in a more traditional and approachable format.

This new IR would have:

Statically scheduled PrimOps: Any backend which does not have the concept of floating primops (pretty much all of them) would need to do some form of primop scheduling when lowering. Doing this the naive way, constructing the primop on every usage, is simple, but leaves much to be desired. A preferred method would be to do a more sophisticated primop scheduling pass on the eir graph. This scheduled form would be represented in the new IR, and which instance is used at which point would be fully explicit.

The original PrimOp information would also be exposed, so that any backends that wish to handle this themselves could easily do so.
Explicit liveness information: Liveness information would be part of the IR.
Explicit lambda captures: Separate functions would be represented separately, so there is no need for the builder to juggle several functions within one eir container.
More explicit control flow: Returns, throws, calls and branches would each be represented separately and more clearly.
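
To make the scheduling point concrete, here is a minimal sketch of the two strategies. It is not Eir's actual API: `Block`, `PrimOp`, `schedule_naive` and `schedule_shared` are hypothetical names, blocks and primops are plain integer ids, and the dominator tree is assumed to be given as an `idom` array in which the entry block is its own immediate dominator. The shared strategy simply places each primop at the nearest common dominator of the blocks that use it. A real pass would also have to check that the primop's operands are available there, and weigh whether hoisting is worthwhile on paths that never use the value.

```rust
use std::collections::HashMap;

// Hypothetical ids for illustration only; these are not Eir's real types.
type Block = usize;
type PrimOp = usize;

/// Depth of `b` in the dominator tree. Assumes `idom[entry] == entry`.
fn depth(idom: &[Block], mut b: Block) -> usize {
    let mut d = 0;
    while idom[b] != b {
        b = idom[b];
        d += 1;
    }
    d
}

/// Nearest block that dominates both `a` and `b`.
fn common_dominator(idom: &[Block], mut a: Block, mut b: Block) -> Block {
    let (mut da, mut db) = (depth(idom, a), depth(idom, b));
    // Bring both blocks to the same depth, then walk up in lockstep.
    while da > db {
        a = idom[a];
        da -= 1;
    }
    while db > da {
        b = idom[b];
        db -= 1;
    }
    while a != b {
        a = idom[a];
        b = idom[b];
    }
    a
}

/// Naive strategy: one instance of the primop in every block that uses it.
fn schedule_naive(uses: &HashMap<PrimOp, Vec<Block>>) -> Vec<(PrimOp, Block)> {
    let mut out: Vec<(PrimOp, Block)> = uses
        .iter()
        .flat_map(|(&p, blocks)| blocks.iter().map(move |&b| (p, b)))
        .collect();
    out.sort();
    out
}

/// Shared strategy: a single instance per primop, placed in the nearest
/// common dominator of all using blocks. Primops without uses are skipped.
fn schedule_shared(idom: &[Block], uses: &HashMap<PrimOp, Vec<Block>>) -> Vec<(PrimOp, Block)> {
    let mut out: Vec<(PrimOp, Block)> = uses
        .iter()
        .filter_map(|(&p, blocks)| {
            let mut it = blocks.iter().copied();
            let first = it.next()?;
            Some((p, it.fold(first, |acc, b| common_dominator(idom, acc, b))))
        })
        .collect();
    out.sort();
    out
}
```

For a diamond-shaped CFG (entry block 0 branching to 1 and 2, which both jump to 3), a primop used in blocks 1 and 2 is built twice by the naive strategy and once, in block 0, by the shared one. The explicit "which instance is used where" mapping is what the new IR would carry.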

I think this could significantly simplify both lumen codegen and any other potential future backends we may have for eir. When I experimented with a BEAM backend, I really missed having something like this to lower from.

hansihe commented 4 years ago

Thoughts @bitwalker?

bitwalker commented 4 years ago

Before getting into specifics about the current IR, I want to clarify my take on what is important in a frontend/middle-tier IR:

I want the middle-tier to be focused on a relatively high-level representation that preserves much of the source language semantics. That is, the kinds of optimizations being performed here are mostly focused on the bigger-picture items. Examples include propagating type information, partially/completely monomorphizing functions/protocols/behaviors, trivial dead-code elimination/constant propagation, and identifying loops and performing high-level loop-structured optimizations.
When it comes time to do codegen, different compilers are going to have potentially wildly different approaches. There are those who will want to do something quick and dirty and lower directly to assembly; some people will want to codegen to C/another language/virtual machine opcodes, and others will have a longer pipeline with multiple stages (like Lumen). To this end, you wouldn't want a middle-tier to have multiple IRs that are tightly coupled together all the way down to some relatively low-level IR, as it makes it difficult to insert your backend at a point that best suits the needs of your compiler. In short, if you plan to have multiple IRs in the middle, make sure they are modular enough that someone can come along and plug their own thing into the middle and either skip the other IRs, or lower to them.
Similarly, generating code for different targets can require a completely different approach based on the requirements of the target, and in particular, you want to be able to control to what degree the resulting code is optimized for size or speed. So you don't want to be too aggressive about inlining/outlining, or other optimization decisions that take those decisions away from the code generator; at least not without surfacing relatively precise control over that to the compiler. Likewise, you don't want to commit too much to a specific low-level representation of any given construct if possible, since different targets may have different tools that allow for more efficient ways of representing those constructs, and being able to take advantage of those during codegen is going to depend on having some flexibility in the middle-tier representation, i.e. abstraction.

For Lumen specifically, I'm very much invested in making maximal use of MLIR/LLVM, so it is actually less desirable for me to have anything resembling a low-level IR. I would rather convert from a relatively high-level IR into an MLIR dialect, and then do a series of dialect transformations that strip away progressively higher levels of abstraction all the way down to LLVM, than try to translate directly to LLVM IR. The main thing relevant to this discussion is around liveness and scheduling, but I just wanted to be clear about what I'm looking to EIR to provide.

So putting all of that together, here's my take on what I'd do to improve EIR:

Making returns/throws/calls/branches more explicit is a nice-to-have, but is actually the least problematic thing about the continuations-based representation, as it is pretty clear how to distinguish each case. Of course, since it is easy to distinguish, there is little reason not to make it explicit in the IR passed to codegen.
Finding a more explicit representation for closures would definitely be nice. For my purposes, it is actually ideal to represent a closure as just another operation within the function (i.e. if a function is itself an operation, which contains a region, and regions contain blocks, which contain other operations; then a closure just happens to be a function nested within another function, or put another way, a nested region), and I think I'd actually prefer that. That said, outlining them is fine too. An example of a benefit of not outlining them is that you can identify whether dead values in the closure propagate into the caller (e.g. a closure is created at some early stage, capturing the enclosing environment, but later, due to some optimization, some of the captured values are no longer used; a dead-code elimination pass that understands closures and can see the original enclosing function can then treat the corresponding uses in the parent scope as dead too, and eliminate those unnecessary loads/stores). One of the most confusing things with the current EIR is that there is no distinct closure representation, and instead all of the blocks/values from both the caller and the callee are blended together (aside from the scope information, which is how we make the distinction today). I agree that making this more explicit would avoid a lot of potential for mistakes.
Liveness is definitely an area that was an early source of bugs for me, due to some early restrictions in MLIR that disallowed referencing values in predecessor blocks, instead requiring that they be passed as block arguments. That is no longer an issue, and the information isn't particularly useful to me, since I just lower directly to MLIR and use its analyses, and I don't think I would change that even if it was done differently in EIR. In short, it would be useful I think, especially if one is staying entirely in Rust, but it is of little consequence to me if liveness information is surfaced in the middle-tier, since I will end up discarding it/recreating it anyway during lowering to MLIR.
Scheduling: as long as it is possible to operate on the unscheduled form, I think it could certainly be useful for other backends to have a statically scheduled form to codegen from. For my own purposes, lowering with the naive approach and then scheduling them as part of other optimizations is a better fit for MLIR, since it exposes more optimization opportunities during codegen, but it may be that having EIR schedule ops has little impact on those opportunities, and eliminates redundant work. Honestly I'd have to do some side-by-side comparison to see what works better in practice. In any case, I think in general it makes sense to have a cleaned up form of EIR with ops already scheduled, since it is more intuitive that way.
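
The closure point above (dead captures propagating back into the enclosing function) can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not how EIR or Lumen represent closures: `Function`, `Closure` and `prune_dead_captures` are hypothetical names, values are plain string names, every definition is assumed to be side-effect free, and the pass is a single step that does not cascade to the operands of the removed definitions.

```rust
use std::collections::HashSet;

// Hypothetical nested-closure representation, for illustration only.
struct Closure {
    /// Values captured from the enclosing function.
    captures: Vec<&'static str>,
    /// Values the closure body actually reads.
    body_uses: HashSet<&'static str>,
}

struct Function {
    /// Values defined in the enclosing function (assumed side-effect free).
    defs: Vec<&'static str>,
    /// Values the enclosing function uses itself, not counting captures.
    uses: HashSet<&'static str>,
    /// Closures nested inside this function.
    closures: Vec<Closure>,
}

/// Drops captures that a closure body never reads, then drops definitions in
/// the enclosing function that nothing refers to anymore.
/// Returns the removed definitions.
fn prune_dead_captures(f: &mut Function) -> Vec<&'static str> {
    // Step 1: a capture is dead if the closure body does not use it.
    for closure in &mut f.closures {
        let Closure { captures, body_uses } = closure;
        captures.retain(|v| body_uses.contains(v));
    }

    // Step 2: a definition stays alive only if the parent uses it directly
    // or some closure still captures it.
    let still_used: HashSet<&str> = f
        .uses
        .iter()
        .copied()
        .chain(f.closures.iter().flat_map(|c| c.captures.iter().copied()))
        .collect();
    let (kept, removed): (Vec<_>, Vec<_>) = f
        .defs
        .iter()
        .copied()
        .partition(|d| still_used.contains(d));
    f.defs = kept;
    removed
}
```

Because the closure is still nested in its parent, step 2 can see both sides at once. With outlined closures, the same result would need the capture list to be linked back to the creating function in some other way.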

I think it's a good idea to provide a variant dialect of EIR to downstream consumers with a more explicit form, but I would try to preserve the flexibility around scheduling in some form, since some compilers may want to do their own scheduling. With Lumen, I'd want to do some experimentation to see what works best in practice for codegen, but as long as some form of the current representation is still available, I don't see any reason why it wouldn't be worthwhile to develop another downstream IR that is lower-level, or at least commits to some scheduling decisions.

eirproject / eir

Introduce a new pseudo-IR we can lower Eir to that is easier to consume #28