bytecodealliance / wasmtime

A fast and secure runtime for WebAssembly
https://wasmtime.dev/
Apache License 2.0
14.82k stars 1.23k forks source link

ISLE: Support commutative matches #6128

Open jameysharp opened 1 year ago

jameysharp commented 1 year ago

Feature

An ISLE pattern like (iadd (pattern_a) (pattern_b)) should also match if (iadd (pattern_b) (pattern_a)) would match, because iadd is a commutative operation.

Benefit

We currently have many rules, both in mid-end optimization and in backend lowering, where we have to write out all possible rules that are equivalent up to commutativity. If there are N commutative patterns in a rule's left-hand side, that requires writing up to 2^N rules. This is tedious and error-prone.

It's worthwhile having ISLE generate code for each distinct form of these rules because then the ISLE compiler can build an efficient pattern-matching tree that evaluates each part of the pattern as few times as possible.

In the mid-end optimization rules, we've considered rewriting commutative instructions to their flipped equivalents, so that all commutative alternatives are available in the same e-class (see #5546). But it's cheaper to have more ISLE rules than to double the number of instructions in e-classes, the GVN map, and the dataflow graph. And this doesn't help backends because the egraph cost function can pick either version, so there's no guarantee the backend will see the operands in the order it wants (see #5709).

If we can declare once that a term like iadd is commutative and have ISLE generate all the equivalent rules, then we can get the performance advantages of an optimized pattern-matching tree while writing a minimum set of rules.

Challenges

These commutative alternative rules often overlap with each other. If the sub-patterns overlap each other, then swapping them will produce a rule which overlaps with the original. The simplest example is that (iadd x y) overlaps with (iadd y x) since x and y are wildcard patterns which only bind variables.

In some cases, swapping the operands produces exactly the same rule. For example, (iadd x x) only matches if both operands are the same SSA value or e-class, and matches exactly the same inputs when flipped. In these cases, generating a duplicate lowering rule will lead to either an error during overlap checking or an assertion failure during code generation, depending on where we expand the rule. Generating a duplicate egraph rule will produce correct results but do more work than necessary.

In other cases, swapping the operands matches the same inputs when flipped but the right-hand side could evaluate to different results depending on which version of the rule is selected. A rule which lowers (iadd x y) to (x64_add x y) would, when flipped, lower to (x64_add y x) instead. The result should be equivalent but is not the same machine instruction. We need to ensure that we choose one of these alternatives deterministically.

Implementation

We've discussed several alternative implementation plans, and I'm probably missing some. None have yet emerged as the "right" answer.

One idea is a new flag when declaring a term. (decl commutative iadd (Value Value) Inst) would mean that when the term is used as an extractor, then implicitly two versions of the enclosing rule are generated, with the subpatterns swapped.

Another approach is to add some kind of support for or-patterns to the ISLE language. For example, if we allow a term to be defined by multiple internal extractors, then we might create a helper term for matching 2-element value arrays in commutative instructions:

(decl value_array_commutative (Value Value) ValueArray2)
(extractor (value_array_commutative x y) (value_array_2 x y))
(extractor (value_array_commutative y x) (value_array_2 x y))

An alternative syntax is to add a special (or ...) pattern combinator, as proposed by @Kmeakin in https://bytecodealliance.zulipchat.com/#narrow/stream/217117-cranelift/topic/Commutative.20rewrites/near/344151148:

(decl bor (Type Value Value) Value)
(extractor
    (bor ty x y)
    (or
        (inst_data ty (InstructionData.Binary (Opcode.Bor) (value_array_2 x y)))
        (inst_data ty (InstructionData.Binary (Opcode.Bor) (value_array_2 y x))))
)

I prefer these versions over the commutative flag because it allows us to be explicit about how to match the alternatives. For example, we could write explicit alternatives for icmp which reverse the condition code if matching the operands flipped (hand-waving that intcc_reverse is currently a constructor, but would need to be an extractor):

(decl icmp (IntCC Value Value) Inst)
(extractor
    (icmp Cond x y)
    (inst_data (InstructionData.IntCompare (Opcode.Icmp) (value_array_2 x y) Cond))
)
(extractor
    (icmp Cond x y)
    (inst_data (InstructionData.IntCompare (Opcode.Icmp) (value_array_2 y x) (intcc_reverse Cond)))
)

We might add priorities to internal extractors to declare how they should interact with overlap checking. But that leaves priorities ambiguous in a pattern like (isub (iadd a b) (iadd c d)): swapping a with b would apparently have the same priority as swapping c with d.

It might also be useful to declare some constructors as commutative: If we've produced two versions of a rule with left-hand patterns swapped due to commutativity, and find that they match the same inputs and produce the same right-hand side up to constructor commutativity, then we don't need the duplicate rule. Currently we can just avoid writing such rules but if all uses of iadd match an or-pattern, then ISLE needs to figure out how to avoid unnecessary work.

As an extension, it would be nice if ISLE could emit Rust match or-patterns to group match arms which bind the same variables in different orders or from different enum variants. I suspect that optimization would never be relevant for the commutative-operator use case though: if we can tell the right-hand side is the same, I think that always means that the left-hand patterns are equivalent and the rules are just duplicates. I have other uses in mind for or-patterns which might benefit from this optimization though.

A bunch of people have expressed interest in this topic: @cfallin @fitzgen @elliottt @afonso360 @alexcrichton

github-actions[bot] commented 1 year ago

Subscribe to Label Action

cc @cfallin, @fitzgen

This issue or pull request has been labeled: "isle" Thus the following users have been cc'd because of the following labels: * cfallin: isle * fitzgen: isle To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file. [Learn more.](https://github.com/bytecodealliance/subscribe-to-label-action)
jameysharp commented 1 year ago

@cfallin, @fitzgen, @elliottt and I discussed this today and clarified some of the design constraints.

We prefer the explicit (or ...) pattern approach.

We don't want to try to hide commutative patterns inside terms generated by cranelift-codegen-meta. Doing that would require ISLE to analyze or-patterns to identify when multiple alternatives are equivalent. Instead, if you match on, for example, (bor x y) then you'll always match x and y in that order. We can easily hand-write terms for when you explicitly want a commutative match, like so:

(extractor (bor_commute x y) (or (bor x y) (bor y x)))

We have open questions about semantics of or-patterns when used outside of multi-terms. So we've decided to initially implement them only for use in the mid-end simplify multi-constructor, and figure out how they should work for backends later.

When a rule author uses an or-pattern, they're declaring that all the alternatives are equivalent to the same right-hand side. So for the purposes of overlap checking, none of the alternatives within the same pattern should be treated as overlapping. However, for backends, the right-hand side may expand differently depending on which alternative we selected, so the match order is observable and maybe should be explicitly specified. For the mid-end, we're going to select all the alternatives anyway, so we don't need to think about order, and multi-terms are already excluded from overlap checking.

I had hoped to preserve or-patterns until ISLE generates Rust source, to allow emitting Rust match or-patterns when possible. However that seems hard so we're going to implement this as an early rewrite, probably in the sema module. Once we have something working maybe we can do better later. Anyway, this will generate code that is no worse than the current state of things, where we're writing a separate rule for each alternative by hand.

Something I forgot to mention: I've also wanted or-patterns to replace external extractors like fits_in_64 but there's an awkward interaction with error checking. At least initially, I think the following won't work:

(extractor (fits_in_32 ty) (and ty (or $I8 $I16 $I32 $F32)))
(extractor (ty_float ty) (and ty (or $F32 $F64)))
... (fits_in_32 (ty_float ty)) ...

That pattern will expand to sub-rules such as (and $I8 $F32), which we'll reject as an unsatisfiable rule. I kind of wanted this to be accepted as long as at least one of the sub-rules is valid (in this case, (and $F32 $F32)). But I think that's going to hide bugs, and also it's hard to report a useful error message for the case where no sub-rules are valid.

Huh. An or-pattern which doesn't bind any variables can't expand to a different right-hand side based on which case matched, so those could be allowed in backends without introducing overlap or worrying about match order. Maybe that's a useful next step after mid-end usage.

jameysharp commented 1 year ago

Implementing or-patterns in ISLE is turning out to be surprisingly complicated but I've figured out some things.

There's no need to explicitly limit where or-patterns may be used (e.g. only in multi-constructors). Instead let's implement them as syntactic sugar for writing out a set of rules which all have the same priority. Later maybe we'll figure out a priority scheme, and until then, if you use them to write a pair of rules which overlap and aren't in a multi-constructor, your pattern will be rejected by the overlap checker. Similarly, the (fits_int_32 (ty_float _)) case will be rejected since the equivalent Cartesian product of rules would be rejected.

Where should this syntactic sugar be expanded?

So my current plan is to expand this syntax in the Pattern::visit method in sema, which is used during construction of the trie_again representation. The only way I can see to do that is to CPS-transform that method. In particular, the map of variable bindings may be different in each expansion, and the expressions and patterns from if-lets and the right-hand side need to be interpreted with every distinct map of variable bindings.

fitzgen commented 1 year ago

Why don’t you like the explicit pass? That seems more reasonable than CPS converting parts of the compiler to me.

jameysharp commented 1 year ago

That is a fair question!

The plain answer is I just don't like walking the pattern and building an exponential number of clones of the pattern tree while searching for the or-pattern nodes in it. It makes me feel gross.

A somewhat more defensible answer is I can avoid duplicating a bunch of the work to build these rules if I do the expansion during the traversal that builds the trie_again representation. With a separate earlier pass, I have to lean harder on the hash-cons map to deduplicate binding sites that are common across sub-rules. Since build.rs is compiled in debug mode, that could potentially have a significant impact.

It's also worth noting that I don't think the CPS-conversion I need is a big deal. It just affects one function, which recursively walks a pattern and calls caller-provided visitor methods, so it's only a few short match arms.

That said, continuation-passing style is kind of a big hammer to pull out in Rust, so I understand your concern!

fitzgen commented 1 year ago

Ah yeah if it is just one function, that isn't a big deal.

scottmcm commented 3 months ago

Hmm, would the (or …) approach also let us handle equivalent icmps?

Like instead of

(rule (ult ty x y) (icmp ty (IntCC.UnsignedLessThan) x y))

would we be able to have this?

(rule (ult ty x y) (or
    (icmp ty (IntCC.UnsignedLessThan) x y)
    (icmp ty (IntCC.UnsignedGreaterThan) y x)
))
jameysharp commented 3 months ago

Your example places the (or …) on the right-hand side of a rule, which wouldn't work, but I think what you meant was this extractor from prelude.isle instead:

(extractor (ult ty x y) (icmp ty (IntCC.UnsignedLessThan) x y))

And yes, I would expect the equivalent extractor pattern to work:

(extractor (ult ty x y) (or
    (icmp ty (IntCC.UnsignedLessThan) x y)
    (icmp ty (IntCC.UnsignedGreaterThan) y x)
)

Which would sure be handy, right?

scottmcm commented 3 months ago

Oops, you're right, I definitely meant the extractor.

And yes, that sounds extremely handy!

For example, right now all the spaceship optimizations look for (a > b) - (a < b), and improving the lt/gt extractors like that would make them immediately start working for things like (b < a) - (a < b) too.

So or-patterns sounds great, as it would help for more than just commutative scenarios.