eirproject / eir

Erlang ecosystem common IR

Discussion: Operations and Intrinsics #21

Open bitwalker opened 4 years ago

bitwalker commented 4 years ago

I decided to open up our discussion here @hansihe, since Slack is a bit awkward for long-form writing.

Background

Hans and I had started talking about what kinds of operations (a.k.a. instructions) and/or intrinsics EIR should have, and whether new ones are necessary or desirable. In particular, I'd like EIR to be expressive enough that backends can generate efficient code from it, and more generally, granular enough that passes have plenty to work with for optimizations.

I'm in favor of adding some additional operations, and possibly defining some BIF-like functionality as intrinsics. Hans disagrees.

Operations vs Intrinsics

One thing I want to clarify is how, and why, I'm distinguishing between intrinsics and operations (i.e. instructions) in the IR.

Intrinsics

Intrinsics are:

The opaque nature of intrinsics is also their weakness: they cannot take advantage of any optimizations until they have been lowered, which may occur quite late in the pipeline. The result in such cases is either subpar code generation or duplicate runs of early passes.

Generally speaking, intrinsics are best used as hints, or to invoke target-specific behavior. I think in this regard, the approach to intrinsics in EIR is on point, as they are entirely opaque.
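To make the "opaque" part concrete, here is a minimal Rust sketch of the difference a pass sees between an operation it understands and an intrinsic it must treat as a black box. The names and types here are invented for illustration; they are not eir's actual data structures.

```rust
// Hypothetical IR node types, purely for illustration.
#[derive(Debug, Clone)]
enum Op {
    // An explicit operation: every pass knows exactly what it does.
    ExtractElement { aggregate: usize, index: usize },
    // An intrinsic: just a name and arguments. Its semantics live in the
    // backend, so a generic pass can make no assumptions about it.
    Intrinsic { name: String, args: Vec<usize> },
}

// A pass deciding whether it may reason about (fold, reorder, eliminate) an op.
fn pass_understands(op: &Op) -> bool {
    match op {
        Op::ExtractElement { .. } => true, // semantics are part of the IR spec
        Op::Intrinsic { .. } => false,     // opaque until lowered
    }
}

fn main() {
    // "receive_start" is a made-up intrinsic name for this example.
    let op = Op::Intrinsic { name: "receive_start".into(), args: vec![0] };
    assert!(!pass_understands(&op));
}
```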

Examples

A few examples from LLVM of things which are implemented as intrinsics rather than operations.

Operations

Conversely, operations:

All tiers of the compiler operating on the IR must respect the semantics of each operation. In many cases this happens explicitly, by handling each operation individually. In others, it happens implicitly: an optimization pass may directly operate on only a subset of operations, but it must still preserve the semantics of all of them. This is the weakness of operations - they incur additional maintenance burden and cognitive load.

However, because of their nature, operations are what give an IR its expressive power. The granularity of IR operations directly impacts the kinds of optimizations that can be performed - after all, how do you rewrite a general operation into a more specific one if the IR gives you nothing but general operations?

So generally speaking, operations are best used as building blocks of varying generality, for things which have a direct impact on the program semantics/behavior.
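As a sketch of what that granularity buys you, here is a toy Rust peephole rewrite over a hypothetical op set (none of these names come from eir). It can turn a general aggregate access into a more specific form only because the specific operations exist to rewrite into:

```rust
// Invented op set, for illustration only.
#[derive(Debug, Clone)]
enum Op {
    // General aggregate access: works on tuples, cons cells, maps.
    ExtractElement { aggregate: u32, index: u32 },
    // More specific forms a backend could lower to a single load.
    GetTupleElement { tuple: u32, index: u32 },
    GetListHead { cons: u32 },
}

enum Type { Tuple, Cons, Unknown }

// Rewrite a general access into a specific one when the operand type is known.
fn specialize(op: &Op, type_of: impl Fn(u32) -> Type) -> Op {
    match *op {
        Op::ExtractElement { aggregate, index } => match type_of(aggregate) {
            Type::Tuple => Op::GetTupleElement { tuple: aggregate, index },
            Type::Cons if index == 0 => Op::GetListHead { cons: aggregate },
            _ => op.clone(),
        },
        _ => op.clone(),
    }
}

fn main() {
    // Pretend value 7 is known to be a tuple.
    let known_tuple = |v| if v == 7 { Type::Tuple } else { Type::Unknown };
    let general = Op::ExtractElement { aggregate: 7, index: 2 };
    println!("{:?}", specialize(&general, known_tuple));
}
```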

Examples

Again, from LLVM, some examples of operations (instructions) which illustrate the difference with intrinsics.

These operations range from fairly general to quite primitive. LLVM IR overall is quite low-level, so many of these operations wouldn't make sense to have in something at the level of EIR, but that's not the point anyway; my reference to LLVM IR is just to illustrate how this distinction between operations and intrinsics is handled in a very widely used IR.

Problem

So I've laid out the difference between operations and intrinsics, but I haven't really explained why I'm bringing it up to begin with. To illustrate, I think it would help to look in detail at an issue I raised about one area of the IR:

When lowering Match, and dealing with a branch whose MatchKind is Tuple(arity), ListCell, or MapItem, the semantics indicate that one of the following occurs on a successful match:

The unpacked values are then given as block arguments to the successor block of the branch. However, some or all of those values may not be live in the successor block, so we'd like to avoid generating the corresponding accesses at all.

Unfortunately, this is where we hit a limitation of the current IR. We can't eliminate the unnecessary accesses in EIR at all, because there are no fine-grained operations for accessing just one element of a tuple, or just the head/tail of a cons cell, etc. Instead, we have to lower EIR to another IR that does have those operations, then write a pass which eliminates the dead operations, and updates the argument list of the successor (as well as any other branches to that block). That is really not desirable - if EIR is to be a foundation on which all kinds of analyses and optimizations are based, it needs to be able to handle this kind of optimization.

To be clear, I'm not saying EIR must support these operations; a backend can handle these things on its own. But without more precision, EIR becomes significantly less useful for analysis and optimization, and it ends up making more sense to lower out of EIR as soon as possible and do all the optimization work in a more extensive IR, to avoid unnecessary duplication of effort. We're all better off if we can do more in EIR and only lower out of it once most optimization has already been performed.
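For concreteness, here is a small Rust sketch of the dead-unpack elimination described above, assuming hypothetical fine-grained unpack operations exist in the IR (they are not eir's current operations): only the unpacks whose results are live in the successor survive, and the successor's argument list shrinks to match.

```rust
// Hypothetical unpack instruction emitted when lowering a Tuple(3) match:
// one per element, each producing an SSA value passed to the successor block.
#[derive(Debug, Clone, PartialEq)]
struct Unpack {
    result: u32,  // SSA value produced
    index: usize, // element index being unpacked
}

/// Keep only the unpacks whose results are actually used in the successor,
/// and return the successor's new (shrunk) block-argument list.
fn prune_unpacks(unpacks: Vec<Unpack>, live_in_successor: &[u32]) -> (Vec<Unpack>, Vec<u32>) {
    let kept: Vec<Unpack> = unpacks
        .into_iter()
        .filter(|u| live_in_successor.contains(&u.result))
        .collect();
    let new_args = kept.iter().map(|u| u.result).collect();
    (kept, new_args)
}

fn main() {
    // %10, %11, %12 were unpacked from a 3-tuple, but only %10 is used later.
    let unpacks = vec![
        Unpack { result: 10, index: 0 },
        Unpack { result: 11, index: 1 },
        Unpack { result: 12, index: 2 },
    ];
    let (kept, args) = prune_unpacks(unpacks, &[10]);
    assert_eq!(kept, vec![Unpack { result: 10, index: 0 }]);
    assert_eq!(args, vec![10]);
}
```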

Proposal

I'd like to suggest the following additions to the IR. Note that I'm using a more SSA-like syntax here, but that is just to keep things readable.

Key:

Operations:

From my perspective, the above are pretty essential, and likely don't cover the full range of things that would be helpful to have in the IR, but certainly hit the major pain points. Feel free to bikeshed the naming/etc., but the semantics are really the important part.

Things this enables:

Problems this introduces:

A Note On Dialects

One thing that I think may alleviate the above is leaning into the concept of "dialects", which kinda/sorta exists in EIR, but not really. MLIR is a fantastic example of how dialects can work in practice, and I would suggest taking a look if you haven't already.

In short though, you start with a high-level dialect, and then lower through progressively more primitive dialects as operations in the high-level dialect get converted to their constituent operations in lower-level dialects. However, multiple dialects can be in play at the same time, so it isn't your typical compiler pipeline that lowers from A -> B -> C; it is more like A -> AB -> ABC -> BC -> C.

To that point, each dialect can define partial or full conversions to other dialects, and in the case of MLIR, it will handle transitive conversions as long as there is a path from dialect A to dialect Z. For example, if I define an EIR dialect and a conversion to the MLIR Standard dialect, I can convert from EIR to the LLVM IR dialect automatically, since the Standard dialect defines a conversion to the LLVM IR dialect. Passes can perform partial lowerings of operations from one dialect to another as part of performing an optimization, which maps quite well to some of the patterns I mentioned earlier.

The only catch is that I'm not sure how best to design a dialect system in Rust. MLIR has the flexibility of C++ to work with, while Rust is much more strict (thankfully). That said, I think the most important pieces would fit naturally into the trait system, so it may be worth modeling the skeleton of MLIR's hierarchy in Rust to see where it falls apart, rather than discounting it completely.
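As a starting point for that experiment, here is a rough sketch of how a dialect/conversion split might be expressed with traits. Every name here is invented for illustration; this is not a proposal for eir's actual API, and it glosses over the hard parts (op interfaces, in-place rewriting, legality tracking):

```rust
/// An operation tagged with the dialect it belongs to.
trait DialectOp {
    fn dialect(&self) -> &'static str;
}

/// A partial lowering from one dialect's ops into another's. Returning `None`
/// means "not handled here", which is what lets several dialects coexist in
/// the same function mid-pipeline (the A -> AB -> BC -> C shape above).
trait Lowering<Src: DialectOp, Dst: DialectOp> {
    fn lower(&self, op: &Src) -> Option<Vec<Dst>>;
}

// Invented op sets, purely for illustration.
enum HighOp { IsTuple, MapGet }
enum LowOp { LoadTag, CompareTag, CallRuntime(&'static str) }

impl DialectOp for HighOp {
    fn dialect(&self) -> &'static str { "eir" }
}
impl DialectOp for LowOp {
    fn dialect(&self) -> &'static str { "low" }
}

struct TypeCheckLowering;

impl Lowering<HighOp, LowOp> for TypeCheckLowering {
    fn lower(&self, op: &HighOp) -> Option<Vec<LowOp>> {
        match op {
            // A type test becomes a tag load plus a compare.
            HighOp::IsTuple => Some(vec![LowOp::LoadTag, LowOp::CompareTag]),
            // Map access is left for a different lowering, e.g. a runtime call.
            HighOp::MapGet => None,
        }
    }
}

fn main() {
    let lowered = TypeCheckLowering.lower(&HighOp::IsTuple);
    assert_eq!(lowered.map(|ops| ops.len()), Some(2));
}
```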

hansihe commented 4 years ago

I'm in favor of adding some additional operations, and possibly defining some BIF-like functionality as intrinsics. Hans disagrees.

I said that I disagreed to a certain degree, and I think the discussion would have been a bit more productive if you had waited to hear at least the basis of my reasoning.

I am not against having intrinsics in general. True intrinsics are already used for a couple of things in the IR, namely representing the semantics needed for receive blocks.

I mainly took issue with the way you wanted to handle guard functions, and more generally how you wanted to handle some BIFs as intrinsics.

I think it's useful to define a couple of different cases:

Proposal

  • %val = extractelement(%term, %index_or_key), a general aggregate access operation

    • on tuples, it unpacks a specific element at the provided zero-based index
    • on cons cells, it unpacks the head via index 0, and the tail via index 1
    • on maps, it unpacks the value with the given key
    • alternatively, these could each be made into different operations to allow for better analysis
    • %val = OP(%cons) # where OP = car|cdr|head|tail
    • %val = extractvalue(%map, %key)
  • updateelement(%term, %index_or_key, %val), an in-place aggregate update operation

    • again, could also break this up into multiple typed operations
    • this is explicitly to enable optimizations which identify places where in-place mutation of a data structure is safe

Both of these generally sound like a good idea to me, and we will need to have some variant of these eventually.

  • is_type<T>(%term, ^yes, ^no) and is_type<T>(%term1, .., %termN, ^yes, ^no)
    • branches to ^yes if %term (or all terms) are of type T, ^no otherwise
    • can be lowered to a runtime call, bit masking/shifting, or elided in favor of a direct branch if analysis can prove that %term (or any term) is or isn't T statically

This is quite trivially represented by a match with one branch matching on the type, and a fallback wildcard branch. I don't really see the value in having a very specific operation like this when the exact same semantics can be expressed just as easily with a strictly more powerful operation.

It could be argued that it is desirable to have a variant of match that only checks a single condition instead of a list of them, but I don't see any reason to restrict it to type dispatch only.
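To illustrate the equivalence being argued here, a tiny Rust sketch (with invented data structures, not eir's) of how is_type<T>(%term, ^yes, ^no) desugars into a two-branch match:

```rust
// Invented, simplified match representation for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum TermType { Tuple(usize), Cons, Map }

#[derive(Debug, Clone, PartialEq)]
enum MatchBranch {
    Type(TermType), // taken if the matched value has this type
    Wildcard,       // always taken if reached
}

#[derive(Debug, Clone, PartialEq)]
struct MatchOp {
    // Each branch targets its own successor block (blocks omitted here).
    branches: Vec<MatchBranch>,
}

// is_type<T>(%term, ^yes, ^no) is just: match %term { T -> ^yes, _ -> ^no }
fn is_type_as_match(ty: TermType) -> MatchOp {
    MatchOp { branches: vec![MatchBranch::Type(ty), MatchBranch::Wildcard] }
}

fn main() {
    println!("{:?}", is_type_as_match(TermType::Tuple(3)));
}
```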

  • br ^block(%args..)
    • we currently unify branches with function calls, but I think it would make sense to define a branch operation and lower control-flow calls to explicit branches once we've resolved what is and isn't a "real" call. It will make for easier writing of analysis and optimization passes to boot

This is not really true. The call operation has two variants:

These semantics are preserved in all stages of the IR.

Dialects

I agree that having some way to represent this would be really beneficial, and I have a couple of ideas I would like to try out.