SMLFamily / Successor-ML

A version of the 1997 SML definition with corrections and some proposed Successor ML features added.
190 stars 10 forks source link

Restrictions on disjunctive patterns #44

Open JohnReppy opened 3 years ago

JohnReppy commented 3 years ago

The Definition of Successor ML specifies that the pattern pat1 | pat2 must be irredundant, which is defined that there must exist a value that is matched by pat2 but not pat1. After discussions with Dave MacQueen, I wonder if the restriction should be that pat1 and pat2 are required to be disjoint; i.e., that there is no value that is matched by both patterns.

This restriction has the property that it makes the | pattern constructor commutative, which I think is intuitive.

RobertHarper commented 3 years ago

The usual argument is that each clause means “this, and not any of the previous”, and it is burdensome to manually make them disjoint. It’s fine to permute them when they are disjoint (but I’m not sure what this would benefit), but I don’t think it’s a good idea to insist that they be. For one thing, the usual catch-all “underscore” case must surely be last, and that is often useful.

Best, Bob

(c) Robert Harper All Rights Reserved.

On Sep 7, 2020, at 17:30, John Reppy notifications@github.com wrote:

The Definition of Successor ML specifies that the pattern pat1 | pat2 must be irredundant, which is defined that there must exist a value that is matched by pat2 but not pat1. After discussions with Dave MacQueen, I wonder if the restriction should be that pat1 and pat2 are required to be disjoint; i.e., that there is no value that is matched by both patterns.

This restriction has the property that it makes the | pattern constructor commutative, which I think is intuitive.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/SMLFamily/Successor-ML/issues/44, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALWY5OOWOHG4CBV35CWRO3SEVGINANCNFSM4Q65Q65A.

ratmice commented 3 years ago

@RobertHarper I believe that this issue is about matches like (Foo(x), y) | (x, Baz(y)) => x, not affecting the normal match ::= pat => exp ⟨| match⟩. I also noted a paper referring to some issues with these disjunctive patterns in #42.

Maybe I'm misinterpreting, but I don't think an expression like x | _ => () makes sense since x isn't bound in all the patterns.

Edit: Updated my first case to actually be sensical.

RobertHarper commented 3 years ago

Ah, I misunderstood. Never mind.

(c) Robert Harper All Rights Reserved.

On Sep 7, 2020, at 19:33, matt rice notifications@github.com wrote:

@RobertHarper https://github.com/RobertHarper I believe that this issue is about matches like (x : bool, y) | (x, y: Int) => x', not affecting the normal match ::= pat => exp ⟨| match⟩`. I also noted a paper referring to some issues with these disjunctive patterns in #42 https://github.com/SMLFamily/Successor-ML/issues/42.

Maybe I'm misinterpreting, but I don't think an expression like x | _ => () makes sense since x isn't bound in all the patterns.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/SMLFamily/Successor-ML/issues/44#issuecomment-688540652, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALWY5J3FJHHWNCX2GTLX6DSEVUUPANCNFSM4Q65Q65A.

MatthewFluet commented 3 years ago

I'm not sure. The advantage of specifying it asymmetrically is that it naturally corresponds to the naive implementation of just expanding all the or-patterns into sequential matches, where one would only have a warning if pat2 was irredundant with pat1.

ratmice commented 3 years ago

A good argument in favor is that the restriction proposed could be relaxed, However if we take the relaxed behavior as standard we cannot restrict it further.

Given that this would fix that other issue without the addition of a new warning, which it seems only occurs where overlapping disjunctive patterns meet pattern guards, I would be inclined to argue in favor of it.

dmacqueen commented 3 years ago

I’ve spent a lot of time over the last six months or so rewriting the match compiler for SML/NJ, trying to go back to first principles for the design. The rewrite has made me look closely at a number of issues, including the semantics of OR patterns.

The old match compiler treated OR patterns naively. It eliminated OR patterns by splitting rules whose pattern contained OR patterns into multiple rules. This approach caused a rule like p => e, where p contains a sub pattern of the form (p1 | p2) to split into at least two rules p’ => e and p’’ => e where p’ contains just p1 and p’’ contains just p2 (unless there is some implicit permuting of the rules going on). So with this approach it turns out that p1 seems to have priority over p2 — ignoring other things that might be going on in the pattern context, if p1 matches then the p’ => e rule fires and p2 doesn’t get matched, and conversely if p1 fails to match then p2 (as part of p’’ => e) gets a chance.

But this approach of eliminating embedded OR patterns and splitting rules was a kind of quick and dirty solution. A more fundamental analysis can be made that does not “eliminate” OR patterns but treats them as a “first-class” feature of pattern matching. My new match compiler does this analysis and as a consequence there is no intrinsic reason to treat (p1 | p2) asymmetrically. (Just as there is no reason to treat the pattern (p1, p2) asymmetrically.) [In a decision-tree based match compiler, there are heuristics that determine the sequence in which pattern points are matched, but that is an implementation issue.]

So I wanted to find some “semantic” principles to define what a “proper” or “legal” OR pattern is.

Here are the basic points of my proposal: In an OR pattern (p1 | p2)

(1) p1 and p2 have the same type

(2) p1 and p2 contain the same pattern variables

(3) p1 and p2 are disjoint (meaning Val(p1) \intersect Val(p2) = \emptyset, where Val(p) is the set of values matching p.)

These rules eliminate certain “bugus” OR patterns like, e.g.

x | x

x | true,   true | x

x | 1::x,    1::x | x

These all violate the disjointness principle (3).

I had first thought of a set of reduction rules that would “normalize” OR patterns, such as

x | p —> x p | x —> x (p1,p2) | (p3, p4) —> ((p1 | p3), (p2 | p4))

but after further reflection, it seems simpler and better to just report the bogus OR patterns as errors.

I am writing up a detailed description of the new match compiler and its “theory”, which I hope to complete in the not too distant future after I finish debugging the implementation (intended for SML/NJ 110.99).

Dave

On Sep 8, 2020, at 8:50 AM, matt rice notifications@github.com wrote:

A good argument in favor is that the restriction proposed could be relaxed, However if we take the relaxed behavior as standard we cannot restrict it further.

Given that this would fix that other issue without the addition of a new warning, which it seems only occurs where overlapping disjunctive patterns meet pattern guards, I would be inclined to argue in favor of it.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/SMLFamily/Successor-ML/issues/44#issuecomment-688969142, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGXNPJD5E2N4625UXAKH3DSEZHEJANCNFSM4Q65Q65A.

David MacQueen dmacqueen@mac.com

RobertHarper commented 3 years ago

That sounds great, send me the paper when it’s done.

(c) Robert Harper. All Rights Reserved.

On Sep 8, 2020, at 18:41, David MacQueen notifications@github.com wrote:

 I’ve spent a lot of time over the last six months or so rewriting the match compiler for SML/NJ, trying to go back to first principles for the design. The rewrite has made me look closely at a number of issues, including the semantics of OR patterns.

The old match compiler treated OR patterns naively. It eliminated OR patterns by splitting rules whose pattern contained OR patterns into multiple rules. This approach caused a rule like p => e, where p contains a sub pattern of the form (p1 | p2) to split into at least two rules p’ => e and p’’ => e where p’ contains just p1 and p’’ contains just p2 (unless there is some implicit permuting of the rules going on). So with this approach it turns out that p1 seems to have priority over p2 — ignoring other things that might be going on in the pattern context, if p1 matches then the p’ => e rule fires and p2 doesn’t get matched, and conversely if p1 fails to match then p2 (as part of p’’ => e) gets a chance.

But this approach of eliminating embedded OR patterns and splitting rules was a kind of quick and dirty solution. A more fundamental analysis can be made that does not “eliminate” OR patterns but treats them as a “first-class” feature of pattern matching. My new match compiler does this analysis and as a consequence there is no intrinsic reason to treat (p1 | p2) asymmetrically. (Just as there is no reason to treat the pattern (p1, p2) asymmetrically.) [In a decision-tree based match compiler, there are heuristics that determine the sequence in which pattern points are matched, but that is an implementation issue.]

So I wanted to find some “semantic” principles to define what a “proper” or “legal” OR pattern is.

Here are the basic points of my proposal: In an OR pattern (p1 | p2)

(1) p1 and p2 have the same type

(2) p1 and p2 contain the same pattern variables

(3) p1 and p2 are disjoint (meaning Val(p1) \intersect Val(p2) = \emptyset, where Val(p) is the set of values matching p.)

These rules eliminate certain “bugus” OR patterns like, e.g.

x | x

x | true, true | x

x | 1::x, 1::x | x

These all violate the disjointness principle (3).

I had first thought of a set of reduction rules that would “normalize” OR patterns, such as

x | p —> x p | x —> x (p1,p2) | (p3, p4) —> ((p1 | p3), (p2 | p4))

but after further reflection, it seems simpler and better to just report the bogus OR patterns as errors.

I am writing up a detailed description of the new match compiler and its “theory”, which I hope to complete in the not too distant future after I finish debugging the implementation (intended for SML/NJ 110.99).

Dave

On Sep 8, 2020, at 8:50 AM, matt rice notifications@github.com wrote:

A good argument in favor is that the restriction proposed could be relaxed, However if we take the relaxed behavior as standard we cannot restrict it further.

Given that this would fix that other issue without the addition of a new warning, which it seems only occurs where overlapping disjunctive patterns meet pattern guards, I would be inclined to argue in favor of it.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/SMLFamily/Successor-ML/issues/44#issuecomment-688969142, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGXNPJD5E2N4625UXAKH3DSEZHEJANCNFSM4Q65Q65A.

David MacQueen dmacqueen@mac.com

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

rossberg commented 3 years ago

Wouldn't such a restriction harm usability? You want overlapping or patterns for similar reasons that you want overlapping cases. Imagine a large datatype like

datatype t = C1 of int | C2 of t2 | ... | CN of tN

and a pattern like the following:

SOME ((C1 n, _) | (_, n)) => e(n)

With the proposed restriction, I'd have to write this as

SOME ((C1 n, _) | ((C2 _ | C3 _ | ... | CN _), n)) => e(n)

which seems rather silly and tedious, especially for larger N.

RobertHarper commented 3 years ago

Ah, good point, which is similar to the one I raised in my misdirected response.

(c) Robert Harper All Rights Reserved.

On Sep 9, 2020, at 08:25, Andreas Rossberg notifications@github.com wrote:

Wouldn't such a restriction harm usability? You want overlapping or patterns for similar reasons that you want overlapping cases. Imagine a large datatype like

datatype t = C1 of int | C2 of t2 | ... | CN of tN and a pattern like the following:

SOME ((C1 n, ) | (, n)) => e(n) With the proposed restriction, I'd have to write this as

SOME ((C1 n, ) | ((C2 | C3 | ... | CN ), n)) => e(n) which seems rather silly, especially for larger N.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/SMLFamily/Successor-ML/issues/44#issuecomment-689528266, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALWY5LCBOF5AIR7DDBOP5TSE5X4VANCNFSM4Q65Q65A.

dmacqueen commented 3 years ago

In your example, how the value (C1 2, 3) would be matched would be ambiguous — n might be bound to 2 or to 3.

If you say that this ambiguity should be resolved by choosing the left-most match, then you are imposing an ordering on the match compiler that limits its ability to optimize the order in which pattern nodes are examined.

So there is a trade-off. My strong preference is not to impose a pre-determined ordering on the way that pattern nodes are examined during matches.

Another approach might be to do an analysis leading to a disambiguating transformation of the OR pattern. In this case, your first example might be transformed into your second example automatically.

Dave

On Sep 9, 2020, at 5:25 AM, Andreas Rossberg notifications@github.com wrote:

Wouldn't such a restriction harm usability? You want overlapping or patterns for similar reasons that you want overlapping cases. Imagine a large datatype like

datatype t = C1 of int | C2 of t2 | ... | CN of tN and a pattern like the following:

SOME ((C1 n, ) | (, n)) => e(n) With the proposed restriction, I'd have to write this as

SOME ((C1 n, ) | ((C2 | C3 | ... | CN ), n)) => e(n) which seems rather silly, especially for larger N.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/SMLFamily/Successor-ML/issues/44#issuecomment-689528266, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGXNPLMLMVEEYCKKYYANNTSE5X4VANCNFSM4Q65Q65A.

David MacQueen dmacqueen@mac.com

MatthewFluet commented 3 years ago

Well, the dynamic semantics of OR-patterns at https://github.com/SMLFamily/Successor-ML/blob/master/definition/dyncor.tex#L1016 pretty clearly specifies that the matches are tried in left-to-right order. (Of course, a static semantics that rejected non-commutative OR-patterns would make the specifics of the dynamic semantics less relevant.)

And, there is clearly an ordering imposed on the match compiler based on the order of the matches.

rossberg commented 3 years ago

Even if the static semantics rejected overlapping or-patterns, their dynamic semantics would still need to be ordered, due to the presence of nested matches and guards, which (unfortunately) can have observable side effects. So they still wouldn't be commutative.

rossberg commented 3 years ago

Another consideration is that such a restriction would complicate the static semantics considerably. So far, the Definition never needs to define coverage for patterns, since it's only needed to implement an informal warning. If we were to elevate that to an error in some cases, we'd have to go through the trouble of formalising coverage.

Moreover, we'd run into the known issue with exception constructors, where overlap cannot generally be decided during type checking of a pattern itself:

functor F (exception E1; exception E2) =
  struct
    fun f x = case x of (E1 | E2) => true | _ => false
  end

structure A1 = F (exception E1; exception E2)       (*) okay
structure A2 = F (exception E1; exception E2 = E1)  (*) not okay?

In other words, such a requirement isn't compositional.

ratmice commented 3 years ago

I would at least like prefer if come up with a better name than disjunctive patterns at least, since disjunction is commutative and this is not, something like ordered intersection patterns (but preferably less terrible :shrug:)

MatthewFluet commented 3 years ago

"left-biased OR-patterns"?

rossberg commented 3 years ago

FWIW, SML's regular disjunction operator (orelse) is not commutative either, with its non-strict left-to-right evaluation.