bytecodealliance / wasmtime

A fast and secure runtime for WebAssembly
https://wasmtime.dev/
Apache License 2.0
15.12k stars 1.26k forks source link

Cranelift: GVN depends on value definition order matching visit order #6126

Closed jameysharp closed 7 months ago

jameysharp commented 1 year ago

If the input CLIF does not define values in increasing numeric order, then the egraph GVN map may not recognize that two instructions are the same.

.clif Test Case

test optimize precise-output
set opt_level=speed
set use_egraphs=true
target x86_64

; In this example, iconst defines v2, and later an identical iconst defines v1. 
; In between, v2 is used in an iadd instruction that's added to the GVN map.
; Finally, v1 is used in another iadd which should unify with the first one.
function %unordered(i32) -> i32, i32 {
block0(v0: i32):
    v2 = iconst.i32 42
    v3 = iadd v0, v2
    v1 = iconst.i32 42
    v4 = iadd v0, v1
    return v3, v4
}

; function %unordered(i32) -> i32, i32 fast {
; block0(v0: i32):
;     v2 = iconst.i32 42
;     v3 = iadd v0, v2  ; v2 = 42
;     return v3, v3
; }

Steps to Reproduce

Run the above test with cargo run -p cranelift-tools test.

Expected Results

The above test should pass, with both iadd instructions merged into one.

Actual Results

The iadd instructions are not merged:

; function %unordered(i32) -> i32, i32 fast {
; block0(v0: i32):
;     v2 = iconst.i32 42
;     v3 = iadd v0, v2  ; v2 = 42
;     v4 = iadd v0, v2  ; v2 = 42
;     return v3, v4
; }

Versions and Environment

Cranelift version or commit: main (94f2ff092)

Operating system: Linux

Architecture: x86-64

Extra Info

The hash key and equality definition in the GVN map are based on the set representative returned by the union-find data structure. So once we've added an instruction to the GVN map, every value it uses must have a stable set representative.

This union-find implementation always returns the minimum Value from the set as the set's representative. So a value has a stable set representative if it is never unioned with another set having a smaller set representative.

Most of the time we satisfy this condition. Any newly-created instructions from ISLE rules have value numbers greater than all existing instructions, and within a basic block frontends normally allocate value numbers at the same time that they insert instructions.

As this test case demonstrates though, it's easy to produce a different value order by writing CLIF by hand. I think the same bug can also be triggered if the frontend builds basic blocks in a different order than we visit them during equality saturation, which I suspect is not uncommon.

If the only consequence of this bug is that we sometimes fail to merge identical instructions, then it's not a correctness bug, just a missed optimization. I'm concerned though because this means the hash value for a key in the GVN map can change over time, which violates a hash-map invariant, and I'm not certain of the consequences of that. I don't think we'll trigger undefined behavior or panics, but clearly we can fail to find keys which are already in the map, and insert duplicate keys.

I'm not sure how to fix this.

One half-baked idea: when building the union tree of all results from simplify, if exactly one of the results was already in the GVN map, we use its representative as the representative of the new set. I suspect there can be more than one such result though, so I think that just delays the problem. Also it's a rather invasive change to how we build nodes from ISLE rules.

I think when we're about to union two values together we have enough information to notice that one of them will have its set representative changed. But the only thing I can think to do with that information is to find all the other instructions in the GVN map that refer to the changed value, and remove and reinsert them. That sounds awful.

Another option I don't like is to renumber all the values in the function so that they're in increasing order with respect to the order we're actually going to visit the blocks in.

A variant of renumbering is to keep a map of the values we've seen so far, mapping them to sequentially increasing integers, and use those integers as the union-find IDs. I think that might work, but I haven't followed the idea through far enough to be sure.

cc: @cfallin

cfallin commented 1 year ago

Hmm, interesting -- thanks for finding this!

I think that there actually may be a much simpler fix than any of the proposals (none of which feel very good to me -- they all mix invariants across subsystems/data structures somehow, or feel otherwise brittle). The idea is: when we union, always make the right-hand side point to the left-hand side as parent (so representative value is always the leftmost of the unioned values, however the union ops reassociate), and then be careful in our union usage that new values that are added as the right-hand side of the union.

This seems to me to be philosophically most aligned with the rest of the aegraph strategy -- new items always point to old items, never vice-versa.

There may be something I'm missing in the interactions with GVN'ing though...

cfallin commented 1 year ago

Ah, sure enough, I see one issue with my proposal at least: rewrite rule hits in the GVN map and returns an old ID as the target of the rewrite; then when we union that in, we alter its representative value...

jameysharp commented 1 year ago

Yeah, exactly. I think there's no way to guarantee that we're only unioning with a value that isn't used by any other instructions in the GVN map.

cfallin commented 1 year ago

I think I've convinced myself at least that this falls into a "missed optimization" category rather than "broken hashmap invariants" category:

A value thus may become equal to new items over time as unions occur, but its hashcode (as seen by the table!) will never change; so I think the invariants are maintained.

cfallin commented 1 year ago

FWIW, an alternative that we can always fall back to, as well, is to not hash/eq according to representative values but according to the literal (latest) values. With eager rewriting, this may not even miss very many opportunities; it might be worth testing this. It could result in a compile-time speedup too.

jameysharp commented 1 year ago

I'm not sure I understand what you mean by "latest" values. Do you mean whatever is currently in the value_to_opt_value map? I agree that's worth trying. I think it should fix the test case I wrote for this issue, at least.

When you say "eager rewriting", I'm guessing you mean ensuring that, any time we index into the GVN map, we have already updated the instruction's arguments. Since we only use the GVN map inside insert_pure_enode, we can always do that there, like in #6135. But I think whenever we have a new instruction we might have constructed it using only normalized Values, so it might only be necessary when looking up existing instructions in remove_pure_and_optimize.

If we can look up instructions which have already been normalized before we hash them and we don't use union-find to pick a set representative, then I believe we can entirely delete this union-find implementation.

At that point we're only using CtxHashMap to look up the contents of value lists. But I think there might aren't any pure or idempotent instructions which use value lists, so maybe we can switch to a plain HashMap and delete CtxHashMap as well…

cfallin commented 1 year ago

I'm not sure I understand what you mean by "latest" values. Do you mean whatever is currently in the value_to_opt_value map? I agree that's worth trying. I think it should fix the test case I wrote for this issue, at least.

Yep, exactly: latest with respect to the "append-only immutable data structure" view. My mental image of an eclass in the aegraph is roughly that there are two pointers that we hold -- the "latest", after we've done all rewriting, which can reach all members of the eclass via the tree of union nodes; and the "canonical", which (until you found this discrepancy) was meant to be the earliest. This append-only-log view with nominally increasing value numbers is I think also why we didn't see this bug earlier: creating values with out-of-order numbers is unusual, at least, though it's perfectly legal.

At one point I had considered making all args in nodes actually keep this dual view (two pointers, canonical and latest) but quickly abandoned that thought because of the size implications; and that was how the context-hash with normalizing behavior was born...

When you say "eager rewriting", I'm guessing you mean ensuring that, any time we index into the GVN map, we have already updated the instruction's arguments. Since we only use the GVN map inside insert_pure_enode, we can always do that there, like in #6135. But I think whenever we have a new instruction we might have constructed it using only normalized Values, so it might only be necessary when looking up existing instructions in remove_pure_and_optimize.

Not quite; by "eager rewriting" I am referring to our rewriting strategy, namely that we invoke simplify right away when creating nodes, so the "latest" is quickly built and then won't change later. So any later references to the eclass, arrived at via mapping through value_to_opt_value or via hits in the GVN-map, will use the same "latest" value, most likely. In other words, I'm not suggesting that we ensure anything via new logic; I'm hypothesizing that a property of our existing implementation makes a strategy ("just look up based on the values that we have") likely good enough (and that we should test this!).

A subtle but important detail here too is that when we recursively create new nodes via rewrites' RHSes, we optimize those nodes and then insert them into the GVN map (and likewise for toplevel nodes). This means that the GVN-map entry for every value in a chain of rewrites will point to the "latest", because the insertions happen as we return up the stack after all the union-nodes are built.

If we can look up instructions which have already been normalized before we hash them and we don't use union-find to pick a set representative, then I believe we can entirely delete this union-find implementation.

Possibly! I haven't fully thought through this; it may be the case that during elaboration we still want to track things by canonical ID...

At that point we're only using CtxHashMap to look up the contents of value lists. But I think there might aren't any pure or idempotent instructions which use value lists, so maybe we can switch to a plain HashMap and delete CtxHashMap as well…

Yup, that would be great; it's the ugliest part of the implementation. It also would have a nice side-benefit of removing the only unsafe in the impl (to wrap the RawTable usage).

cfallin commented 1 year ago

So I just came up with the (obvious in hindsight) counterexample that I think led me to come up with the existing scheme in the first place:

(Sorry for just paging in / rediscovering the canonical bad case now; it's been a while!)

This case I suspect will actually be fairly common, because it's common for ops to optimize to some inner subexpression (x + 0, x | 0 and the like). So, I retract my hypothesis above!

I think, in addition to your possibilities above, there are a few other possible solutions:

I'm sort of leaning toward accepting the latter and moving on, but I could be convinced of something better too! I think the only door that's really closed is some sort of "rehash on merge" approach, because that's just going to be too expensive.

cfallin commented 1 year ago

Last thought that paged itself back in (I was trying to think about something else, I promise!): the union-find and canonical/representative IDs are necessary during elaboration as well because this is how scoped elaboration answers the question "do I already have this value?". If we give that up, we'll likely have more redundant elaboration when we see examples of the form above as well. So I think I'm reasoning myself back to the current design and becoming slightly more convinced (still open to ideas though!) that the most reasonable answer here is "leave as-is".

meithecatte commented 11 months ago

I would like to propose another bad example:

  1. We process the following instructions, and nothing very interesting happens:
    v2 = foo_op v1
    v3 = bar_op v1
    v4 = expensive_op v3
  2. We encounter v5 = weird_op v1. Due to some special form of v1 just out of view, two optimizations trigger: one rewrites to foo_op v1, the other to bar_op v1. We union v2, v3 and v5 together. Necessarily, the canonical representative of either v2 or v3 will change. Let's assume it's v3, as that's what would happen with today's strategy.
  3. We encounter v6 = expensive_op v3. Even though this would be trivially deduplicated with v4 by classic GVN, we'd miss this, since the canonical representative changed.

Now, I'm not sure how realistic this is, but it definitely makes things more difficult to reason about. I've actually been looking into acyclic e-graphs for a project unrelated to cranelift, and bounced off a few times with "I must be misunderstanding something" when the shape of the code asked "what should happen in this case".

Essentially, we would like eclasses.union to have the precondition of "this won't ever change the canonical id for anything in the GVN map", but there's no way to actually ensure that, short of a global correctness condition on the whole body of rewrite rules. I do agree, however, that this seems to be, at worst, missed-optimisation severity, unless there are situation where we look something up in the GVN map with "we must've inserted it already, therefore the case where it's not present in the GVN map is unreachable".

cfallin commented 11 months ago

@meithecatte indeed, I think that would still be a correct compilation, just not an optimal one.

In a little more detail, I think it's unlikely to happen if one keeps a sort of "transitive equality" principle in mind when writing the rewrite rules: here we have A -> B and A -> C, but there are no rules that rewrite B -> C or C -> B. Ideally we have a full enough set of identities that if A rewrites to both B and C, we already know the latter two are equivalent. Or perhaps we write only one of the rewrites out of A (say, to B), and then trust other rules to optimize that further. I guess there is sort of a stratification in my mind here: A is some complex thing we're decomposing, and B and C are "simple" ops in some lower level of complexity; that lower layer itself should ideally be fairly "rewrite-complete" so we don't have "indirect equalities" like this exposed.

That's a little handwavy, and again the worst that happens is that we miss a GVN, so it's not the end of the world -- part of the tradeoff we accept in not doing the full recanonicalization, which is much more expensive in the common case :-)

meithecatte commented 11 months ago

I have some thoughts on how to fix this. The general idea is that the union-find should support "pinning" a value, such that it then does its best to keep that particular value as the canonical representative. You would then pin the value ids that get put into the gvn_map.

Implementation-wise, this could be done with an augmented union-by-rank strategy – pinning simply sets the rank to u8::MAX. Trying to union two pinned values together should then issue a warning, to alert the compiler engineer that optimization rules are probably missing (and then give up on maintaining one of the pins). Is there an appropriate logging framework in place for this sort of thing?

cfallin commented 11 months ago

That's a really interesting idea; if you're interested in prototyping this, I think we'd at least be very interested in data (does it cause any regressions in compile time?) and confirming it doesn't alter generated code.

Such a warning could probably be emitted with the usual logging framework (log::info!() or log::warn!(), depending on how noisy we want to be -- we've had embedders complain before about verbosity of our logs at higher levels so we may want to be a little careful). Or alternately, we have the stats framework already, and we could count the number of such cases. At some point we want to use those stats to detect issues and drive optimizations/improvements ourselves and having a "should be zero, potential issue if not" stat could be a supported observation mode...