bytecodealliance / wasmtime

A fast and secure runtime for WebAssembly
https://wasmtime.dev/
Apache License 2.0
14.82k stars 1.23k forks source link

ISLE should have `match` expressions #5771

Open fitzgen opened 1 year ago

fitzgen commented 1 year ago

If I want to match on something in an RHS, I have to split the continuation out into a new rule:

(type A extern (enum))
(type B extern (enum I J K))
(type C extern (enum))

(decl a_to_b (A) B)
(extern constructor a_to_b my_a_to_b_impl)

(decl foo (A) C)
(rule (foo a)
  (foo2 (a_to_b a)))

(decl foo2 (B) C)
(rule (foo2 (B.I)) ...)
(rule (foo2 (B.J)) ...)
(rule (foo2 (B.K)) ...)

I wish that I didn't need to define a new term, and could instead match on (a_to_b a) directly inside foo:

(rule (foo a)
  (match (a_to_b a)
    ((B.I) ...)
    ((B.J) ...)
    ((B.K) ...)))

Some things to think about: partiality and side effects.

But since we can already write this today, just with two terms instead of one, I don't think we should have any show stoppers here.

cfallin commented 1 year ago

I think this is an interesting proposal, and could be nice to have!

One observation that came up when we discussed this (and fleshes out your "partiality and side-effects" note) was that it's sort of a dual to if-let: if-let is an embedded sub-match that happens on the left-hand side, and so must have no side-effects (we constrain its sub-RHS to pure ctors only) but can fail a match, because we're prior to the backtracking commit-point. On the other hand, this match happens on the right-hand side, so can have side-effects (can invoke any ctor) but cannot cause a match failure; or can, but only if this is a rule for a partial ctor, because we've already committed.

An alternative that occurs to me now is that we could instead make the separation between pure LHS and impure RHS phases more flexible: one could continue to use pure ctors, and matches, in the RHS up until the point that one uses the first impure ctor. Only at that point would one no longer be allowed to use matches. It does make the backtracking harder to predict though (we can now backtrack from the RHS? in a sense we're defining the RHS semantically rather than syntactically now). Needs more thought...

fitzgen commented 1 year ago

Yep, all very good points.

But also I don't think we absolutely need to answer all those questions yet, since this is just syntactic sugar for something you can already write today.

cfallin commented 1 year ago

Ah, that's interesting; so the proposal is to actually desugar to a separate internally-generated helper term? That could work; it's potentially a little tricky both for plumbing reasons (sema-lowering an expression AST can now create new terms) and because one has to compute the closure-capture when defining the signature of that helper term (all used variable bindings need to be passed in). At that point it doesn't seem too much worse to natively represent a sema::Match and then handle it in codegen (lowering as we do if-lets)...

jameysharp commented 1 year ago

It's not exactly syntactic sugar unless you're proposing that match expressions make synthetic terms... (Oh, Chris beat me to this point.) I would want to take this new syntax as an opportunity to generate better code, anyway: the more we can fold into a single match tree, the more opportunities for sharing computations we'll find.

This is a digression from the main issue, but I want to respond to one point: I think it's useful to maintain the current syntactic restriction that back-tracking can only occur due to failures on the LHS.

It turns out we almost never use partial constructors on the RHS of a rule. As a data point, there are currently two instances across all backends:

It's easier to find these in ISLE's new codegen because now this is the only case where the generated code uses the ? operator; searching for ?; does the job.

fitzgen commented 1 year ago

It's not exactly syntactic sugar unless you're proposing that match expressions make synthetic terms...

Yes: when I want to match in an RHS, I am forced to make a synthetic term. I don't want to do that myself, since it is 100% mechanical. The compiler should do it for me.

If, as an optimization, the compiler represents it some other way internally that is great, but irrelevant from a semantics POV.

cfallin commented 1 year ago

It's mechanical but it's not completely straightforward -- computing the captured closure to pass args to the synthetic term takes some semantic analysis, and splitting the match arms into separate rules is also the first time we have a "fork" in control flow. I'm not saying we shouldn't do it, but I guess my position is that it if the desugaring approach involves this significant additional complexity anyway, it might be worth thinking of doing it "the right way" from the start.