Smarter integer pattern matching

SMLFamily / Successor-ML

A version of the 1997 SML definition with corrections and some proposed Successor ML features added.

190 stars 10 forks source link

One of ML's greatest strengths is pattern matching, which allows us to perform sophisticated case analyses without ever branching on a boolean expression. When it comes to case-analyzing inductively defined types, no other general-purpose language is better than ML.

Unfortunately, ML is far less good at case-analyzing integers. Just last week I wrote the following datatype:

datatype 'a tree
  = Pure of 'a
  | Two of int * 'a tree * 'a tree
  | ThreeA of int * 'a tree * 'a tree * 'a tree
  | ThreeB of int * 'a tree * 'a tree * 'a tree
  | Four of int * 'a tree * 'a tree * 'a tree * 'a tree

and the following helper functions:

fun part (m, Two (n, a, b)) = if m mod 2 = 0 andalso n = m + 1 then SOME (a, b) else NONE
  | part _ = NONE

fun many (n, ab, cd) =
  case (part (n, ab), part (n, cd)) of
      (NONE, NONE) => Two (n, ab, cd)
    | (NONE, SOME (c, d)) => ThreeA (n, ab, c, d)
    | (SOME (a, b), NONE) => ThreeB (n, a, b, cd)
    | (SOME (a, b), SOME (c, d)) => Four (n, a, b, c, d)

What I wish I could have written instead is

fun many (2*n, Two (2*n+1, a, b), Two (2*n+1, c, d)) = Four (2*n, a, b, c, d)
  | many (2*n, Two (2*n+1, a, b), c) = ThreeB (2*n, a, b, c)
  | many (2*n, a, Two (2*n+1, b, c)) = ThreeA (2*n, a, b, c)
  | many args = Two args

The patterns 2*n and 2*n+1 already take care of matching two consecutive integers, the greater of which is the odd one. More generally, what I would want to see is

Pattern matching affine combinations of integer variables with constant coefficients. For example,
```
fun sameParity (x+y, x-y) = true
  | sameParity _ = false
```

Patterns that only match nonnegative integers. For example,

fun lessOrEq (n, n + nonneg k) = true
  | lessOrEq _ = false

Of course, the type checker should reject patterns that can be matched in two or more ways for the same input. For example, the snippet
```
fun wrong (2*x + 3*y, 4*x + 6*y) = ...
  | wrong _ = ...
```
is wrong because the matrix [2 3; 4 6] (sorry, MATLAB notation) is not invertible, whereas the snippet
```
fun okay (2*x + 3*y, 3*x + 2*y) = ...
  | okay _ = ...
```
is not wrong because the matrix [2 3; 3 2] is invertible.

Does this seem like a reasonable feature to want?

We all have similar examples, such as considering balance factors in search trees, and many many others. The tension is always between making some particular piece of code look good, vs the cost of increasing amounts of od hackery that eventually collapses on itself. We’ve tended, for better or worse, with SML to go with very widely applicable concepts of pattern matching, and avoiding special cases. A long-standing favorite is fun ModusPonens (A=>B,A) = B, which involves a non-linear pattern, and is almost iresistable, until you start thinking about similar-looking numeric examples. Because int’s are not integers and float’s are not reals, all sorts of odd cases come up as regards overflow, sign conventions, and, worst of all, the delicate nature of floating point arithmetic (eg, avoid subtraction at all costs, the results are pure noise). It quickly becomes much more of a mess than it’s worth. Another common suggestion is to include regular expressions as string patterns, because after all aren’t regular expressions all about pattern matching? Where do we begin trying to explain why this is not remotely sensible.

I realize that pretty much all popular languages gain popularity by making a small suite of specialized examples look good. And then it’s endless pain and suffering as the brittle nature of the tricks becomes all-too-apparent.

Bob

One of ML's greatest strengths is pattern matching, which allows us to perform sophisticated case analyses without ever branching on a boolean expression. When it comes to case-analyzing inductively defined types, no other general-purpose language is better than ML. Unfortunately, ML is far less good at case-analyzing integers. Just last week I wrote the following datatype: datatype 'a tree = Pure of 'a | Two of int 'a tree 'a tree | ThreeA of int 'a tree 'a tree 'a tree | ThreeB of int 'a tree 'a tree 'a tree | Four of int 'a tree 'a tree 'a tree 'a tree and the following helper functions: fun part (m, Two (n, a, b)) = if m mod 2 = 0 andalso n = m + 1 then SOME (a, b) else NONE | part _ = NONE

fun many (n, ab, cd) = case (part (n, ab), part (n, cd)) of (NONE, NONE) => Two (n, ab, cd) | (NONE, SOME (c, d)) => ThreeA (n, ab, c, d) | (SOME (a, b), NONE) => ThreeB (n, a, b, cd) | (SOME (a, b), SOME (c, d)) => Four (n, a, b, c, d) What I wish I could have written instead is fun many (2n, Two (2n+1, a, b), Two (2n+1, c, d)) = Four (2n, a, b, c, d) | many (2n, Two (2n+1, a, b), c) = ThreeB (2n, a, b, c) | many (2n, a, Two (2n+1, b, c)) = ThreeA (2n, a, b, c) | many args = Two args The patterns 2n and 2n+1 already take care of matching two consecutive integers, the greater of which is the odd one. More generally, what I would want to see is

• > Pattern matching affine combinations of integer variables with constant coefficients. For example, fun sameParity (x+y, x-y) = true | sameParity = false • > Patterns that only match nonnegative integers. For example, fun lessOrEq (n, n + nonneg k) = true | lessOrEq = false • > Of course, the type checker should reject patterns that can be matched in two or more ways for the same input. For example, the snippet fun wrong (2x + 3y, 4x + 6y) = ... | wrong = ... is wrong because the matrix [2 3; 4 6] (sorry, MATLAB notation) is not invertible, whereas the snippet fun okay (2x + 3y, 3x + 2y) = ... | okay = ... is not wrong because the matrix [2 3; 3 2] is invertible.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe.

SMLFamily / Successor-ML

Smarter integer pattern matching #45