bytecodealliance / wasmtime

A fast and secure runtime for WebAssembly
https://wasmtime.dev/
Apache License 2.0

cranelift/egraphs: Optimize instructions without results #6106

Open jameysharp opened 1 year ago

jameysharp commented 1 year ago

Feature

We should allow the egraph pass to rewrite instructions which have no result values.

This resembles #5908, in that no-result instructions have side effects and that currently prevents us from optimizing them. However the implementation details are quite different.

Benefit

Our current ISLE simplify term isn't usable for this because it can only rewrite an SSA value into other SSA values, so instructions without any value results have nothing to replace.

The only rewrites which need to be in the egraph pass are those which either pattern-match on SSA values or replace SSA values. But instructions which don't produce value results can still be in both of those categories. For example, if we pattern-match the condition operand to brif and find that it's a constant, we can rewrite the brif to an unconditional jump. Removing the unreachable edge from the CFG can then mean that some block parameter is always equivalent to another SSA value, enabling more simplification rules.
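To make the brif example concrete, here is a minimal sketch of branch folding over a toy IR. The `Block`, `Value`, and `Terminator` types and the `fold_branch` function are hypothetical stand-ins written for this illustration, not Cranelift's actual data structures, and block-call arguments are left out for brevity.

```rust
/// Toy stand-ins for CLIF entities. These are hypothetical types for
/// illustration only, not Cranelift's real API.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Block(pub u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Value {
    /// A value known to be an integer constant.
    Const(i64),
    /// A value we know nothing about.
    Unknown(u32),
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Terminator {
    /// Go to `then_block` if `cond` is nonzero, otherwise to `else_block`.
    Brif {
        cond: Value,
        then_block: Block,
        else_block: Block,
    },
    /// Unconditional jump.
    Jump(Block),
}

/// Rewrite a conditional branch on a known constant into an unconditional
/// jump to whichever target is actually taken. Anything else is returned
/// unchanged.
pub fn fold_branch(term: Terminator) -> Terminator {
    match term {
        Terminator::Brif {
            cond: Value::Const(c),
            then_block,
            else_block,
        } => Terminator::Jump(if c != 0 { then_block } else { else_block }),
        other => other,
    }
}
```

Note that the rewrite produces no SSA value: it replaces one no-result instruction with another, which is exactly the case a value-to-value rewrite cannot express.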

The biggest benefits only come in combination with other optimizations that we haven't written yet. Notably, we need to integrate the "remove constant phis" pass into the egraph pass before branch-folding will change any later optimization results.

Implementation

Which no-result instructions can benefit from rewriting? I claim it's the ones which have no fixed value results, but have at least one fixed value operand. They could have variable operands, variable results, or block-call parameters, but we won't touch any of those.

A rewrite pattern is always based on knowing something about the meaning of a value operand. For variable operands like the function arguments in a call, the return values in a return, or the block parameters in a jump, there isn't any generic pattern we can apply to them because we don't know what they mean. Put another way, we don't need any opcode-specific handling for instructions where all we can do is update their value operands to reflect other rewrites.

Among those instructions, we also need to exclude store instructions, because we need alias analysis to rewrite those correctly.

The current instructions which meet these criteria are:

Conditional branches, either as block terminators or conditional traps, are definitely interesting to rewrite.

Indirect calls are interesting to rewrite if we can prove that the call target is a constant.

I don't think SetPinnedReg has any useful rewrites based on the value operand, because it doesn't have any defined meaning in CLIF. It's just however the frontend wants to use it.
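The selection criteria above can be written as a small predicate. This is only a sketch: `InstShape` and `is_rewrite_candidate` are hypothetical names, and the three fields are assumed summaries of an opcode's signature rather than anything Cranelift exposes in this form.

```rust
/// Hypothetical summary of an instruction's signature, for illustration.
/// Variable operands, variable results, and block-call arguments are
/// deliberately not represented because the criteria ignore them.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct InstShape {
    /// Number of fixed value results the opcode defines.
    pub fixed_results: usize,
    /// Number of fixed value operands the opcode takes.
    pub fixed_value_operands: usize,
    /// Whether the opcode is a store, which would need alias analysis.
    pub is_store: bool,
}

/// An instruction is a candidate for no-result rewriting if it has no fixed
/// value results, has at least one fixed value operand to pattern-match on,
/// and is not a store.
pub fn is_rewrite_candidate(shape: InstShape) -> bool {
    shape.fixed_results == 0 && shape.fixed_value_operands >= 1 && !shape.is_store
}
```

Under this predicate a brif (no results, one condition operand) qualifies, a plain jump (no fixed value operands) does not, and a store is excluded even though it has fixed operands and no results.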

Alternatives

There are some implementation choices we haven't decided on.

I'd like to do this in ISLE rules because we may want to pattern-match arbitrary subtrees of the data-flow graph. However @cfallin has argued for doing this in pure Rust until we have good reason to implement rewrite rules that are complicated enough to justify hooking up ISLE.

We might find we have several alternative instructions to choose from, like we currently do for the simplify term. We could:

jameysharp commented 1 year ago

Follow-on work for this would ideally update the CFG after each control-flow rewrite, including updating the post-order and dom-tree. This isn't necessary initially though.

I believe the post-order can be updated incrementally as long as edges are only removed from the CFG, never added. I think the post-order remains valid without any changes unless blocks become unreachable. If they do, removing all the unreachable blocks from the post-order while preserving the order of the remaining blocks should produce a new valid post-order.
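A minimal sketch of that incremental update, assuming the claim holds: blocks are plain indices, `succs[b]` lists the successors of block `b` after edges have been removed, and `reachable` and `prune_post_order` are hypothetical helpers rather than existing Cranelift functions.

```rust
use std::collections::HashSet;

/// Collect every block reachable from `entry` by following `succs`.
pub fn reachable(entry: usize, succs: &[Vec<usize>]) -> HashSet<usize> {
    let mut seen = HashSet::new();
    let mut stack = vec![entry];
    while let Some(block) = stack.pop() {
        // `insert` returns false if the block was already visited.
        if seen.insert(block) {
            stack.extend(succs[block].iter().copied());
        }
    }
    seen
}

/// Drop blocks that are no longer reachable from `entry`, keeping the
/// relative order of the survivors. This relies on the assumption that
/// edges were only removed from the CFG, never added.
pub fn prune_post_order(post_order: &mut Vec<usize>, entry: usize, succs: &[Vec<usize>]) {
    let live = reachable(entry, succs);
    post_order.retain(|block| live.contains(block));
}
```

For example, in a diamond CFG with edges 0→1, 0→2, 1→3, 2→3 and post-order `[3, 1, 2, 0]`, removing the edge 0→2 makes block 2 unreachable, and pruning leaves `[3, 1, 0]`.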

I think that when CFG edges are only removed, a block can only move down the dom-tree. I think only the blocks which were the target of a removed edge can have their immediate dominators change. And I have a gut feeling that as long as the post-order is accurate, recomputing the immediate dominator of each block is a local operation that doesn't require re-doing the fixpoint.

I'm not especially confident about any of the above though.

Meanwhile, here are some examples of valid rewrites which are more complicated than just constant-propagation. I don't know if any of these specific patterns occur much or at all. It's just that since I can think of a bunch pretty quickly, I'm inclined to guess that we will find patterns of this level of complexity in real code.

For brif, trapz, or trapnz:

For br_table:

In many of these cases we can remove pure instructions, but only because of the context in which they're used, not because they're inherently equivalent to anything else. So purely value-based equality saturation can't implement these rules.

Many of these br_table rules can produce new instructions which can match other rules. Some, like the isub rule, may not always be improvements but may expose other opportunities which are. In short, these look a lot like the sorts of cases we want equality saturation for in the first place.

Some of these examples could be subsumed by either value range analysis or known-bits analysis, but not all.