bytecodealliance / wasmtime

A fast and secure runtime for WebAssembly
https://wasmtime.dev/

ISLE rule matching order #4717

Closed: elliottt closed this issue 4 months ago

elliottt commented 2 years ago

Feature

While working on finishing the migration of the x64 backend to ISLE, I encountered a case where guarding a rule on specific ISA features being enabled caused the rule to never fire. The problem was that the rule I added overlapped with another rule except in the ISA feature check, and the other rule took precedence. Here's an example from elsewhere in the codebase:
https://github.com/bytecodealliance/wasmtime/blob/0f944937c0c3d90946eaa6199f0baa5ed991c80d/cranelift/codegen/src/isa/x64/lower.isle#L1799-L1809

The LHSs of the two rules differ only in the pattern for the first argument of (has_type): both rules match (ty_32_or_64 ty), while the rule on line 1799 also matches (use_bmi1). The rule on line 1799 has been given a priority of 1 to ensure that it's checked before the rule on line 1806, but ideally we wouldn't need that annotation here.
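
For illustration, here is a hypothetical, self-contained sketch of the same shape; all the terms below are local stand-ins, not the actual prelude definitions, and the real rules are at the link above.

```
;; Hypothetical sketch: two rules whose left-hand sides differ only in an
;; ISA-feature extractor. All terms are local stand-ins, not the real prelude.
(type Type (primitive Type))
(type Inst (primitive Inst))
(type Value (primitive Value))
(type InstOutput (primitive InstOutput))

(decl lower (Inst) InstOutput)

(decl has_type (Type Inst) Inst)
(extern extractor has_type extract_has_type)
(decl ty_32_or_64 (Type) Type)
(extern extractor ty_32_or_64 extract_ty_32_or_64)
(decl use_bmi1 () Type)
(extern extractor use_bmi1 extract_use_bmi1)
(decl some_op (Value) Inst)
(extern extractor some_op extract_some_op)

(decl bmi1_lowering (Type Value) InstOutput)
(extern constructor bmi1_lowering emit_bmi1_lowering)
(decl generic_lowering (Type Value) InstOutput)
(extern constructor generic_lowering emit_generic_lowering)

;; Guarded rule: identical to the rule below except for the (use_bmi1)
;; pattern, so it currently needs the explicit priority `1` to be tried first.
(rule 1 (lower (has_type (and (use_bmi1) (ty_32_or_64 ty)) (some_op x)))
        (bmi1_lowering ty x))

;; General rule, used whenever the guarded rule does not apply.
(rule (lower (has_type (ty_32_or_64 ty) (some_op x)))
      (generic_lowering ty x))
```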

My proposal is that we switch to compiling ISLE by matching rules top-down, left-to-right.

Benefit

The benefit of this change would be that we could tell statically when a rule was not reachable, and raise an error when it would never fire.

Implementation

For matching on enum values we could take the approach outlined in https://www.cs.tufts.edu/~nr/cs257/archive/luc-maranget/jun08.pdf. For handling extractors we could take inspiration from GHC's implementation of view patterns.

Alternatives

We could continue to develop heuristics to determine when rules might be unconditionally shadowed, and inform the programmer through an error or warning.

cfallin commented 2 years ago

Thanks @elliottt for bringing this up. I have some thoughts on this, inspired by our conversations over the past week together with our experience with this aspect of ISLE over the past year, which I've written up below.

ISLE rule-ordering semantics simplification

This issue proposes to simplify the ISLE rule-ordering semantics to "first matching rule wins", motivated by a year of experience with the language as it currently stands (a more complex ordering heuristic) and the issues that this causes.

Overview: Current Design

ISLE currently has semantics for rule ordering that generally follow the "most specific rule wins" principle. The goal is that the user, or a tool, should be able to write rules in any order and the resulting compiler backend should be the same no matter how these rules are ordered.

Inside the ISLE compiler, this is implemented by sorting edges when inserting into the match trie. The sort-order is implied by the PatternInst enum and the order of its arms: lower-ordered ops come first at any given node in the tree.

The ISLE language contains a concession to allow for manual ordering: any rule can be annotated with an explicit numeric priority, and rules are sorted by priority first before being merged into a match trie.

Reasons for Current Design

Intuitive "More-Specific-First" Matching

The first reason given for the current design is that it allows for a natural "general case then more specific cases" ordering of rule cases. One can start a group of rules with the most basic handling, and then get into trickier cases later. This aids reading comprehension.
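
As a hypothetical illustration (stand-in terms, not real backend rules): under the current semantics the two rules below can be written in either order, and the constant-operand rule is still intended to win whenever it applies, because it is more specific.

```
;; Hypothetical sketch of "most specific wins": order in the file is not
;; supposed to matter. All terms are local stand-ins.
(type Inst (primitive Inst))
(type Value (primitive Value))
(type InstOutput (primitive InstOutput))
(type u64 (primitive u64))

(decl lower (Inst) InstOutput)
(decl my_add (Value Value) Inst)
(extern extractor my_add extract_my_add)
(decl my_iconst (u64) Value)
(extern extractor my_iconst extract_my_iconst)
(decl add_reg_reg (Value Value) InstOutput)
(extern constructor add_reg_reg emit_add_reg_reg)
(decl add_reg_imm (Value u64) InstOutput)
(extern constructor add_reg_imm emit_add_reg_imm)

;; The general case can come first...
(rule (lower (my_add x y))
      (add_reg_reg x y))

;; ...and the more specific case later; the current heuristic is meant to try
;; this rule first whenever the second operand is a constant.
(rule (lower (my_add x (my_iconst n)))
      (add_reg_imm x n))
```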

Compatibility with Rule-Generating Tools

One might build a tool that generates a large body of lowering rules from some algorithm that finds equivalences (as in superoptimization). For such a tool, it is convenient to be able to (i) just dump the list of rules, without sorting first; and (ii) allow the user to include these rules without having to think about how they order with respect to handwritten rules.

Independence of Rules for Verification

When verifying the correctness of ISLE lowering rules against semantics of the input and output language, it is useful if the semantics of the language are as-if a rule could fire when the left-hand side matches at all, regardless of other rules that exist. An "unordered" / "unspecified" semantics is closer to this than "first rule wins".

Issues

Unclear mental model

The primary issue is that the mental model required to be held by the ISLE programmer is unclear. If the ISLE programmer truly does not care which rule applies, then all is well. But in real development of compiler backends, one does very much care: one needs to make sure that some special cases are actually handled (or else they are useless dead code that, worse, may be untested); and sometimes one rule that is shadowed by another is actually incorrect in the shadowed case, so one relies on the ordering heuristic for correctness.

In general, development of ISLE rules requires the programmer to understand how the language operates so that they can control which rule fires. If the ordering rules are complex and subtle, or worse are "unspecified", then we get a body of rules that either has silent shadowing, or brittle reliance on particular ordering heuristics that may change. Even if we attempt to clearly specify the semantics, there are subtleties that arise because of the way that external Rust code can be invoked: we don't know when some conditions are disjoint or not, fallible or infallible, and a set of prioritization rules that relies on "fallible before infallible" and disjointness thus requires, in the limit, a full and complete understanding of our helpers' semantics (via explicit specification or some other method).

One can make an argument that when building a complex artifact, one needs one's tools to be as predictable as possible. Complexity is fine, if it is in service of a well-understood goal (e.g., compiling Rust code to x86 machine code). But unclear semantics are not.

"Experimental nature" of Priority Usage

The clearest distillation of the issue is our standard advice for getting the desired lowering output: just write the rules, and if the ordering turns out wrong, apply some priorities.

Consider what this is implying: it is saying that the DSL's semantics are unknowable, or at least it's not worth knowing ahead of time what will happen; instead, one must experiment and then apply corrective action if the heuristic gets it wrong.

This is problematic because the developer may not always take the time to do so ("I'm just adding a quick lowering case, it will probably be hit") -- and because in cases where the metacompiler does get it right without priorities, an implicit dependence on the current heuristics is baked in at that point and must be frozen (see below).

In my opinion, it is not feasible to write high-reliability software if one's tools are unpredictable. Predictable and repeatable behavior is the cornerstone of writing correct lowerings.

"Most specific first" is not simple or unambiguous

But isn't "most specific first" a predictable design point?

It turns out, empirically, the answer is "no". We have had a number of bugs where the heuristics do not do what we would expect, mostly because fallible and infallible extractors are not properly ordered (see e.g. #4685), and the interaction of explicit priorities with implicit ones is complex (#4093, #4117). We have a pending issue to explicitly define what the ordering even is (#3575), but we haven't been able to do so in a simple way (and it is changing as we fix the above bugs).

The basic issue is that because we are doing more than simple tree-matching -- because we can invoke arbitrary Rust code and we have written helpers that check predicates and conditions in various ways, and because some checks are fallible and some are not -- "specific" is unclear. This is aside from the general issue (noted here) that when one has a tree-shaped rather than list-shaped pattern match, specificity is a partial order. (Two patterns can both subsume a general one by being specific in different dimensions.) One can break that tie arbitrarily, but that just adds more complexity.

In general, humans (other than those who hack on islec) seem not to think in terms of these complex heuristics, or at least seem to sometimes have a different expectation than what the compiler really does.

Empirical evidence of mental-model mismatch

To add some evidence to this claim, see the following code reviews:

Developers working in ISLE seem to have a mental model that rules are evaluated top-down. This is likely for several reasons. First, it is the natural "reading order"; thinking in terms of parallel, reorderable conditions is not natural for most programmers. Second, existing languages with match facilities condition programmers this way, including Rust (as well as ML, Haskell, Scala, etc.). Third, it is clear that there must be some order, and if the ordering heuristic is otherwise muddy or badly specified, or sometimes changing (see above), then a programmer will likely fall back on "first rule first".

In general we want to follow the Principle of Least Surprise: we should have a simple, predictable tool that behaves like most people who pick it up think it should.

The task at hand (writing a correct compiler) is hard enough; any complexity in the tool that is not necessary should be discarded. Explicitness is a strong principle that results in tools one can trust and do precise work with.

Subtle bugs in heuristics, and heuristics as load-bearing for correctness

Finally, as seen in several of the above-linked issues and PRs, we have had multiple bugs in the ordering heuristics and have had to make changes to get them right.

If a hard-to-understand heuristic is difficult for an ISLE developer to keep track of, a hard-to-understand heuristic that sometimes changes is even worse.

The "experimental answer" to rule ordering -- try it out, add a priority if needed -- implicitly creates dependencies on the current heuristics. If we do so, and then change the heuristics to fix other cases to behave as we expect they would, we are liable to silently alter how rules apply (action-at-a-distance) and subtly break parts of the compiler. Test coverage can hedge against this, but relying on test coverage to patch over footguns in the language is a game that dynamically-typed languages play; we don't want to do that if we can help it.

This potential for latent correctness bugs is terrifying and is itself a strong reason to migrate away from this status quo, in my opinion.

Proposal

Explicit Order-in-File Semantics (First Rule Wins)

The first part of this proposal is simple: the first rule in an ISLE program to match an input, wins.

Priority annotations can continue to exist; if present, they override file order, so the semantics are as-if we stable-sorted the rules in the input by priority. (We could also remove them; I'm somewhat ambivalent on this, but it doesn't hurt to keep them for now.)
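
A hypothetical sketch of the proposed semantics (stand-in terms again): textual order decides by default, and an explicit priority, when present, is applied first, as if the rules had been stably sorted by priority before matching.

```
;; Hypothetical sketch of "first rule wins" with priorities layered on top.
(type Inst (primitive Inst))
(type Value (primitive Value))
(type InstOutput (primitive InstOutput))

(decl lower (Inst) InstOutput)
(decl my_op (Value Value) Inst)
(extern extractor my_op extract_my_op)
(decl my_zero () Value)
(extern extractor my_zero extract_my_zero)
(decl special_case (Value) InstOutput)
(extern constructor special_case emit_special_case)
(decl general_case (Value Value) InstOutput)
(extern constructor general_case emit_general_case)

;; Tried first simply because it appears first in the file.
(rule (lower (my_op x (my_zero)))
      (special_case x))

;; Tried only if the rule above did not match.
(rule (lower (my_op x y))
      (general_case x y))

;; An explicit priority overrides file order: this rule would be tried before
;; both rules above, as if the rules were stable-sorted by priority first.
(rule 2 (lower (my_op (my_zero) y))
      (special_case y))
```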

Multiple Files: Language-Level Ordering

ISLE programs (for now, the backends in Cranelift) can span multiple files, and right now these files are provided to islec in an arbitrary order. This proposal places meaning on the order of input, so the order of files matters now too.

It is likely a good idea to thus make file order a language-level concept: maybe a "backend.isle" that pulls in the others with statements like mod "a.isle"; and mod "b.isle";.

Rules for a given top-level term may span multiple files (indeed, they do so in at least the aarch64 backend, with scalar and SIMD lowerings kept separately). This is completely fine: it is likely that the user will group together related cases anyway, so if one needs to reorder (e.g.) the two lowerings for a particular instruction, one can do so locally.

Warning/Error on Rule Shadowing

Making the prioritization explicit raises another possibility: we can make shadowing semantically more significant, e.g. produce a warning or error.

This addresses a primary concern of "what if we get the order wrong" (e.g., write a general case then a specific case): we can make it a warning or error! Then the user immediately sees that they need to reorder the two cases.
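
For example (hypothetical terms), the ordering below would be rejected under the proposal because the second rule can never fire; the fix is simply to swap the two rules so the constant case comes first.

```
;; Hypothetical sketch of a shadowing error under the proposal.
(type Inst (primitive Inst))
(type Value (primitive Value))
(type InstOutput (primitive InstOutput))
(type u64 (primitive u64))

(decl lower (Inst) InstOutput)
(decl my_shift (Value Value) Inst)
(extern extractor my_shift extract_my_shift)
(decl my_iconst (u64) Value)
(extern extractor my_iconst extract_my_iconst)
(decl shift_var (Value Value) InstOutput)
(extern constructor shift_var emit_shift_var)
(decl shift_imm (Value u64) InstOutput)
(extern constructor shift_imm emit_shift_imm)

;; The wildcard `y` below subsumes any pattern in the same position, so under
;; "first rule wins" the immediate-shift rule can never fire, and the tool can
;; point at it with a warning or error. Swapping the two rules fixes it.
(rule (lower (my_shift x y))
      (shift_var x y))
(rule (lower (my_shift x (my_iconst n)))
      (shift_imm x n))
```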

This highlights explicitness and transparency as two key aspects of the development workflow: one writes what one means, and the tool enforces that one cannot write a nonsensical thing (a useless/never-fired rule) and points exactly to where the issue is.

The result of this workflow will be a set of rules whose meaning is fixed and invariant to action-at-a-distance compiler heuristic tweaks in the future, and where each rule is definitely applied in some case.

Addressing Requirements: Verification

The semantics of a given rule's left-hand side now become: match if conditions are met and no prior rule matched. In a very simple lowering this produces left-hand sides at the SMT level that are linear in the number of prior rules, but this can likely be quickly trimmed by using simple disjointness tests: e.g., an enum match (different instruction formats/opcodes) at the top level partitions rules into many sets, one per top-level instruction, and only these need to be considered together.
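
Concretely, if rule i's syntactic left-hand-side condition is P_i, its effective condition under first-rule-wins semantics is (a sketch, writing P_j for the conditions of the earlier rules):

```latex
\mathrm{fires}_i = P_i \land \lnot P_1 \land \lnot P_2 \land \dots \land \lnot P_{i-1}
```

The disjointness trimming mentioned above means that any P_j whose top-level pattern is provably disjoint from P_i's (for example, a different opcode or enum arm) can be dropped from that conjunction, so in practice the conditions stay close to their original size.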

We note that explicit priorities, an already-existing ISLE feature, require similar functionality already; this change just alters how widely it must be considered.

Arguably, such a change will result in an easier time at verification as well: this issue suggests removing a feature that leads to brittle, heuristic-dependent rules that do not stand alone, and hence cannot be verified independently. By pushing the list of rules toward a more explicit form we are less likely to have correctness bugs, so verification should proceed more easily.

Addressing Requirements: Tool-Produced Rules

Finally, we note that if a tool produces a large body of rules based on some offline computation (e.g., superoptimization), it can (i) sort the rules by its own specificity heuristic, and (ii) emit the rules into an ISLE file that is processed before the others. In a sense this actually gives more control: different tools can produce different orderings, according to their own specific heuristics.

jameysharp commented 2 years ago

For another source on compiling pattern-match constructs with these top-down, left-to-right semantics, Simon Peyton Jones' 1987 book, "The Implementation of Functional Programming Languages", is available as a free PDF. Chapter 5 covers a reasonably straightforward implementation.

elliottt commented 2 years ago

I ran into another instance of this today: the sse_cmp_op rule in the x64 lowering has overlapping cases for vector floating-point types. The rules on lines 1504 and 1505 overlap with the rules on lines 1506 and 1507 respectively: https://github.com/bytecodealliance/wasmtime/blob/c569e7bea5ab9c2f9a617aa1ba7de3d926d880be/cranelift/codegen/src/isa/x64/inst.isle#L1499-L1507

Any change in the heuristic that put the constant patterns before the fallible extractors would cause a panic during instruction emission, as the vector_all_ones rule uses the result of sse_cmp_op as the op for an XmmRmR instruction, and the cmpps and cmppd instructions require an immediate argument: https://github.com/bytecodealliance/wasmtime/blob/c569e7bea5ab9c2f9a617aa1ba7de3d926d880be/cranelift/codegen/src/isa/x64/inst.isle#L1520-L1527

elliottt commented 2 years ago

It's worth pointing out that for the example in https://github.com/bytecodealliance/wasmtime/issues/4717#issuecomment-1217308075 we would still not be able to raise shadowing errors: the patterns that shadow the constant patterns are made up of fallible external extractors, and we don't have any insight into what will cause them to succeed or fail. This is the same situation that comes up in Haskell with view patterns.

However, we would benefit from predictable rule matching order, as the problematic rules on lines 1506 and 1507 would not suddenly fire due to heuristic changes.
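
A hypothetical sketch of the situation (stand-in terms): whether the external extractor below accepts the constant 0 is known only to its Rust implementation, so islec can neither report the second rule as shadowed nor prove it reachable.

```
;; Hypothetical sketch: overlap involving a fallible external extractor
;; cannot be decided by islec, because the extractor's success condition
;; lives in Rust.
(type u64 (primitive u64))
(type InstOutput (primitive InstOutput))

(decl lower_imm (u64) InstOutput)
(decl small_imm (u64) u64)
(extern extractor small_imm extract_small_imm)
(decl use_small (u64) InstOutput)
(extern constructor use_small emit_use_small)
(decl use_zero () InstOutput)
(extern constructor use_zero emit_use_zero)

;; Does extract_small_imm succeed on 0? Only the Rust code knows, so no
;; shadowing error can be raised for the second rule; a predictable top-down
;; order at least fixes which rule wins when both match.
(rule (lower_imm (small_imm n))
      (use_small n))
(rule (lower_imm 0)
      (use_zero))
```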

jameysharp commented 2 years ago

It's occurred to me that we're trying to solve two different issues which might benefit from being treated separately. We're currently using priorities and heuristics to express validity preconditions as well as preferences between multiple valid rules. In other words, a key part of the optimizer's cost model is implicit in the ISLE rules for the backend. Changing the matching order to top-down doesn't change that.

We could instead require every rule to express all of its preconditions, and have a separate way of expressing the cost model. That might mean that for usability we need new ISLE constructs for concisely expressing preconditions, I dunno. Ideally we'd let the user choose between several cost models when running Cranelift.

In that approach, the order of rules doesn't matter, just like today. Formal verification is especially important then because, like today, if you get the preconditions wrong in some rule, it may happen to work a lot of the time due to other rules getting selected first; and like today, you can't tell if that's happening just by looking at the ISLE rules.

One way to use a cost model is to impose a total ordering on rules. Different cost models impose different orderings so you can build different instruction selectors from the same rules. This can be done while building the compiler so there's no compile-time cost for that flexibility.

But where it really helps to keep ISLE rules independent of cost model is if you start using e-graphs in the backend. The general idea is to take the full e-graph from mid-end optimization, apply all valid lowering rules at all e-nodes, and combine the results in a new e-graph. Now you can use the cost model in a way that takes dynamic context into account when there are partially overlapping rules, during extraction instead of during lowering. I think this subsumes peephole optimization.

I've hand-waved away a lot of important details. The main thing is that I think it's worth thinking about other ways to let the consumer of ISLE rules decide what priority to give each rule, or even to find all matching rules. At the same time, backend developers do need to be able to control those priorities based on their knowledge of the ISA. I'm just no longer convinced that control should be expressed in the ISLE rules.

cfallin commented 2 years ago

Thanks @jameysharp for your thoughts here!

We could instead require every rule to express all of its preconditions, and have a separate way of expressing the cost model. That might mean that for usability we need new ISLE constructs for concisely expressing preconditions, I dunno. Ideally we'd let the user choose between several cost models when running Cranelift.

Unfortunately, this proposal is basically the status quo today already. A correct rule should express all conditions necessary for it to fire. And the priority mechanism is a way for the user to provide a cost model. It's possible I'm not understanding some subtlety of your proposal, but to me it reduces more or less to: keep the current design, but do it without bugs (or add some unspecified other mechanism to help do it without bugs, but then... let's talk about that other mechanism, if we have one).

The main body of my argument is that, in practice, this isn't working. The difficulty is not coming up with a working abstraction -- rules that explicitly encode all conditions needed for them to fire -- but the way that that requirement plays out in practice as compared to developers' working mental models. See the "Empirical evidence of mental-model mismatch" section in my comment above.

Concretely: how do we ensure that this "all preconditions included" property holds? I already review very carefully for this, and (i) I have still missed some cases, leading to the bugs linked above, and (ii) we shouldn't have to rely on "think really hard and never make a mistake" to uphold such a property. Formal verification is one answer, but this too feels wrong: we shouldn't stick with an empirically difficult-to-use abstraction early in the pipeline just because we can catch its mistakes with a very heavyweight technique later. Bugs are cheaper the earlier one catches them, and all that.

But where it really helps to keep ISLE rules independent of cost model is if you start using e-graphs in the backend. The general idea is to take the full e-graph from mid-end optimization, apply all valid lowering rules at all e-nodes, and combine the results in a new e-graph. Now you can use the cost model in a way that takes dynamic context into account when there are partially overlapping rules, during extraction instead of during lowering. I think this subsumes peephole optimization.

This I think deserves a largely separate discussion, and I do agree that it's interesting to think about a "late extraction" approach, but really quickly, I think that the proper way to approach this while designing for correctness would be to provide an explicit "multiple options" ("unordered rules") mechanism that works within a broader context of nailed-down, simple-to-understand total ordering, not the other way around. This is because I want the programmer to have to opt into considering all options and testing them.

That I think leads to another way of expressing my broad concerns here, related to the "experimental approach" concerns above: if the default is any-rule-could-fire with some unknowable (and explicitly reconfigurable, under your proposal!) heuristic driving it, then the user has to consider how all possible rule firing orders could result in correct or incorrect lowerings. Human brains mostly aren't built (or mine isn't anyway) for considering combinatorial possibilities at every step. (This is why unstructured pthreads programming is so much harder than structured concurrency; ordered by default, explicitly unordered where specified, is far easier than the opposite.) The analogy to structured concurrency tells us how to proceed here as well: we want well-specified combinators that say "I really do mean that these three different rules are all equally valid and I have thought about the implications", rather than "I happened upon a bug experimentally in the everything-parallel world and want to serialize just this bit as a likely fix".

Does that make some sense at least? Happy to talk further in the Cranelift meeting tomorrow; I really am curious how others feel about this, especially folks with more scars from real ISLE programming and debugging :-)

bjorn3 commented 2 years ago

Concretely: how do we ensure that this "all preconditions included" is the case?

Maybe by having a fuzzer which changes rule priority and disables rules at random and then doing differential fuzzing of the program execution with the rule set used for regular compilation where failure to codegen won't be considered a crash?

cfallin commented 2 years ago

Maybe by having a fuzzer which changes rule priority and disables rules at random and then doing differential fuzzing of the program execution with the rule set used for regular compilation where failure to codegen won't be considered a crash?

That's certainly an option (actually it's even an item in the 2022 Cranelift roadmap, IIRC); but I think that it sort of sits in a similar place to verification in this regard: it would catch issues but I think at a higher cost than simplifying semantics and making it easier to avoid bugs in the first place.

In other words, my main concern at the moment is that we have a tool whose behavior is hard to understand and predict, and whose behavior seems to differ from many people's intuitions, and hence the tool is hard to use; if we fix that, then we can still do all sorts of fuzzing and verification, but the need to lean on them will be less, and they can spend their time finding other interesting bugs.

cfallin commented 2 years ago

We had a good discussion on this issue today in the Cranelift meeting; thanks all for the input!

I unfortunately didn't take detailed comment-by-comment notes, but by the end of the discussion a consensus was emerging that the overlap checking is actually the most interesting part, and that if we get that right, the textual ordering of the rules themselves is not as important. There is still the question of whether we have "locally unordered" groups of rules, allowing for the flexibility of some sort of cost model, or whether we have a total order, but overlap checking and explicit opt-in to overlap lets us solve the high-order correctness+explicitness issue.

Basically the idea is: by default it is an error for one rule to shadow another. Shadowing should be clearly semantically defined at the language level, not just "engine can't prove disjoint": for example, we could say that any two rules are disjoint if they match a different enum arm or different constant against the same value (same expression from root arguments). Conversely we should probably say that external extractors have no disjointness at the language-semantics level, and allow for negation at the language level ("this extractor did not match") rather than building ad-hoc pairs of "A" and "not A" extractors.
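
A hypothetical sketch of what that could look like; the enum, the terms, and especially the negation form are made up, and islec does not accept the negation today.

```
;; Hypothetical sketch. Two rules that match different arms of the same enum
;; argument are disjoint by definition, so neither can shadow the other and
;; no error or annotation is needed.
(type CmpKind (enum Eq Ne))
(type Value (primitive Value))
(type InstOutput (primitive InstOutput))

(decl lower_cmp (CmpKind Value Value) InstOutput)
(decl emit_eq (Value Value) InstOutput)
(extern constructor emit_eq emit_eq_impl)
(decl emit_ne (Value Value) InstOutput)
(extern constructor emit_ne emit_ne_impl)

(rule (lower_cmp (CmpKind.Eq) x y) (emit_eq x y))
(rule (lower_cmp (CmpKind.Ne) x y) (emit_ne x y))

;; External extractors, by contrast, would carry no disjointness at the
;; language level; instead of writing paired "A" / "not A" extractors by
;; hand, a language-level negation might look something like this
;; (NOT valid ISLE today, purely illustrative):
;;   (rule (lower_val (not (small_imm _)) y) (general_case y))
```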

Then we make the existing rules we have "disjoint-clean" by adding in "not the earlier rule R, and ..." prefixes to their left-hand sides. This becomes easier if we can name rules explicitly (thanks @avanhatt for this idea).

Then we can allow explicit "overlap with unconstrained priority" if/when we want: rather than "not R, and ..." as a left hand side, we can say "either R or ..."; or perhaps group R1 and R2 (with overlapping left-hand sides) in one syntactic "unordered group" (but there are reasons why this may not be practical when generated code is involved); or perhaps a separate out-of-band declaration that R1 and R2 are overlapping.

I think that captures most of what we settled on, but I am probably forgetting some details; my apologies if so; please do add anything I've missed.

jameysharp commented 4 months ago

I think this issue has largely been resolved by the overlap checker and trie-again work. Explicit rule priorities are obnoxious sometimes but it seems to be pretty clear to people how they work.