This issue proposes the addition of pattern matching to the ghūl programming language. Pattern matching provides a concise and safe way to work with tagged unions and other data types.
Syntax
Pattern matching in ghūl is performed using the case expression:
case expression
when pattern1: result1
when pattern2: result2
...
else default_result
esac
The case keyword is followed by the expression to match on.
Each when clause specifies a pattern to match and the corresponding result expression.
Patterns can match literals, union variants, tuples, and types.
The else clause is optional and specifies the result if no pattern matches.
The esac keyword terminates the case expression.
The body of each when clause is a statement list to be executed if the pattern matches. The last statement must be an expression that provides the value of the when clause.
Note that using else rather than default is a source-breaking change: existing case statements will need to be converted to case expressions by replacing default with else. Because case statements are very rarely used, the impact will be low, but parts of the compiler (the tokenizer) will need to be temporarily rewritten to use if / elif / else to allow bootstrapping across this breaking change.
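As a rough illustration of the bootstrapping rewrite, the sketch below shows a case statement in the existing default form next to an equivalent if / elif / else chain that does not depend on either keyword. It assumes the existing fallback branch is written as a bare default, as the note above implies; kind, handle_digit, handle_letter and handle_other are hypothetical names used only for this example.

```ghul
// Existing form: a case statement whose fallback uses default.
// This stops compiling once default is replaced by else.
case kind
when 1:
    handle_digit();
when 2:
    handle_letter();
default
    handle_other();
esac

// Temporary bootstrapping form: the same logic as an if / elif / else
// chain, which compiles before and after the keyword change.
if kind == 1 then
    handle_digit();
elif kind == 2 then
    handle_letter();
else
    handle_other();
fi
```

Once the new compiler is bootstrapped, code like this could be converted back to a case expression using else.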
Patterns
ghūl will support the following types of patterns:
Literal patterns: match a value against a literal.
case x
when 1: "one"
when 2: "two"
else "other"
esac
Literals are limited to integers, strings and enum members, both for efficiency and to support exhaustiveness checking.
Variant patterns: match a union value against a variant and bind its fields.
case shape
when Shape.CIRCLE(radius): 3.14 * radius * radius
when Shape.SQUARE(side): side * side
esac
Tuple patterns: match a tuple and bind its elements. The elements of a tuple pattern can be any valid pattern, including literal patterns, variable patterns, variant patterns, and other tuple patterns.
case tuple
when (1, y): y
when (x, 2): x
else 0
esac
Type patterns: match a value against a type and bind the value.
case value
when int(i): i * 2
when string(s): s.length()
else 0
esac
Wildcard pattern: matches any value and discards it.
case tuple
when (1, _): "first element is 1"
when (_, 2): "second element is 2"
else "other"
esac
Nested patterns: combine the above patterns recursively.
case nested
when SOME(SOME(x)): x
when SOME(NONE): 0
when NONE: -1
esac
Note: pattern parsing is potentially ambiguous - if we were to allow arbitrary expressions, we wouldn't be able to tell in some cases until after all symbols are defined whether what we've parsed is a type expression or a value expression, and we'd need a unified parser capable of accepting both type expressions and value expressions to correctly handle all kinds of patterns.
However, in practice we want to restrict literal elements in patterns to compile time constants, so supporting arbitrary expressions is actually not necessary or even desirable.
The pattern element parser then needs to recursively accept either a value or a nest of tuples containing values. Each value could be a type expression, an integer or a string. A type expression here includes a qualified identifier that may turn out, after symbols are declared, to be an enum member or the name of a variable to destructure into. Identifier elements may be followed by = and another identifier, for a destructure that copies a variant field into a new variable with a different name.
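The renaming form is the only pattern element without an example in this proposal. One possible reading is sketched below, reusing the Shape union from the variant pattern example. It assumes the variant's field name appears to the left of = and the newly bound variable to the right; the text above does not fix that order, so treat it as an assumption rather than settled syntax.

```ghul
// Assumed order: field_name = new_variable
case shape
when Shape.CIRCLE(radius = r): 3.14 * r * r
when Shape.SQUARE(side = s): s * s
esac
```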
Pattern Guards
Patterns may be followed by a guard, introduced with if, that specifies an additional condition that must hold for the pattern to match. Unlike pattern elements, a guard condition can be any expression that evaluates at runtime to a bool, and it can refer to variables bound in the pattern. Because the compiler cannot know at compile time when a guard condition will be true, exhaustiveness can't be checked in case expressions with guards, so an else clause is required if guards are present.
case value
when int(i) if i > 0: "positive"
when int(i) if i < 0: "negative"
else "zero"
esac
Exhaustiveness Checking
Ideally the compiler should check that pattern matching is exhaustive, i.e., that all possible cases are covered. If a case expression is not exhaustive, the compiler should issue a warning. The else clause can be used to make a case expression exhaustive. However, the initial release may omit these checks.
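For instance, assuming Shape has exactly the two variants used in the variant pattern example (CIRCLE and SQUARE), a check along these lines would be expected to warn about the first expression and accept the second. The variable names are illustrative only.

```ghul
// Not exhaustive: SQUARE is not covered and there is no else clause,
// so the compiler would be expected to issue a warning.
let circle_area =
    case shape
    when Shape.CIRCLE(radius): 3.14 * radius * radius
    esac;

// Exhaustive: the else clause covers every remaining variant.
let area_or_zero =
    case shape
    when Shape.CIRCLE(radius): 3.14 * radius * radius
    else 0.0
    esac;
```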
Compilation
Pattern matching can be compiled in different ways depending on the characteristics of the patterns:
For simple patterns (literals, wildcards), a sequence of equality checks can be used.
For union variants, a switch on the discriminator field followed by extractions of the variant fields can be used.
For type patterns, type tests and casts can be used.
For complex nested patterns, a combination of the above techniques can be used.
Ideally the compiler will choose the most efficient compilation strategy based on the patterns used, but for the initial release the requirement is only that the matching is correct.
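To make the first and third strategies concrete, here is a sketch of what a straightforward lowering might look like if written by hand. It assumes that if / elif / else / fi can be used in expression position and that type tests and casts are spelled isa T(value) and cast T(value). The variable names are illustrative, and this is not necessarily the code the compiler would generate.

```ghul
// Literal patterns: case x when 1: "one" when 2: "two" else "other" esac
// could lower to a sequence of equality checks.
let name =
    if x == 1 then
        "one"
    elif x == 2 then
        "two"
    else
        "other"
    fi;

// Type patterns: case value when int(i): i * 2 when string(s): s.length() else 0 esac
// could lower to a type test followed by a cast in each branch.
let doubled_or_length =
    if isa int(value) then
        (cast int(value)) * 2
    elif isa string(value) then
        (cast string(value)).length()
    else
        0
    fi;
```

A lowering like this is linear in the number of clauses, which is acceptable under the initial requirement that matching only needs to be correct.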
Requirements Checklist
[ ] Implement case expressions for pattern matching
[ ] Support when clauses with patterns and result expressions
[ ] Support else clause for default result
[ ] Require esac keyword to terminate case expression
[ ] Support literal patterns (integers, strings, enum members)
[ ] Support variant patterns (match union value against variant and bind fields)
[ ] Support tuple patterns (match tuple and bind elements)
[ ] Support type patterns (match value against type and bind value)
[ ] Support wildcard pattern (match any value and discard it)
[ ] Support nested patterns (combine patterns recursively)
[ ] Restrict literal elements in patterns to compile-time constants
[ ] Implement pattern element parser to handle type expressions, integers, strings, and identifiers with optional destructuring
[ ] Support pattern guards (additional conditions for pattern matching)
[ ] Require else clause if guards are present (due to lack of exhaustiveness checking)
[ ] Compile pattern matching efficiently based on pattern characteristics (equality checks, switch statements, type tests/casts)
[ ] Initial implementation: ensure correct matching, even if not optimally efficient
[ ] Handle source-breaking change from default to else in case expressions
[ ] Temporarily rewrite compiler code using if/elif/else for bootstrapping purposes
Examples
1. Binary tree:
union Tree[T] is
NODE(value: T, left: Tree[T], right: Tree[T]);
LEAF;
si
tree_depth(tree: Tree[T]) -> int =>
case tree
when LEAF: 0
when NODE(_, left, right): 1 + max(tree_depth(left), tree_depth(right))
esac;
2. Optional tuple:
```ghul
let optional_tuple: Option[(int, string)] = some((42, "hello"));
case optional_tuple
when SOME((x, _)) if x > 0: "positive tuple"
when SOME((_, s)) if s.length() > 5: "long tuple"
when SOME(_): "other tuple"
when NONE: "no tuple"
esac
```
3. Nested results:
union Error is
INVALID_INPUT;
IO_ERROR(message: string);
si
let result: Result[Result[int, Error], Error] = ok(ok(42));
case result
when OK(OK(x)): x
when OK(ERROR(INVALID_INPUT)): 0
when OK(ERROR(IO_ERROR(message))): throw new IOException(message)
when ERROR(INVALID_INPUT): throw new ArgumentException("Invalid input")
when ERROR(IO_ERROR(message)): throw new IOException(message)
esac