degory / ghul

compiler for the ghūl programming language
https://ghul.dev
GNU Affero General Public License v3.0
4 stars 0 forks source link

Pattern Matching #1134

Open degory opened 6 months ago

degory commented 6 months ago

Pattern Matching

Introduction

This issue proposes the addition of pattern matching to the ghūl programming language. Pattern matching provides a concise and safe way to work with tagged unions and other data types.

Syntax

Pattern matching in ghūl is performed using the case expression:

case expression
when pattern1: result1
when pattern2: result2
...
else default_result 
esac

Note the use of else rather than default is a source breaking change - existing case statements will need to be converted to case expressions by replacing default with else. As case statements are very rarely used the impact of this will be low, but there are places in the compiler (the tokenizer) that will need to be re-written temporarily to use if / elif / else to allow bootstrapping across this breaking change.

case_expression ::= "case" expression "when" pattern_clause+ ("else" expression)? "esac"

pattern_clause ::= pattern ":" expression
                 | pattern "if" expression ":" expression

pattern ::= literal_pattern
          | identifier_pattern
          | variant_pattern
          | tuple_pattern
          | type_pattern
          | wildcard_pattern
          | nested_pattern

literal_pattern ::= int_literal
                  | string_literal
                  | qualified_identifier

identifier_pattern ::= identifier ("=" identifier)?

variant_pattern ::= qualified_identifier variant_pattern_fields?
variant_pattern_fields ::= "(" pattern ("," pattern)* ")"

tuple_pattern ::= "(" pattern ("," pattern)* ")"

type_pattern ::= type_expression "(" identifier ")"

wildcard_pattern ::= "_"

nested_pattern ::= "(" pattern ")"

identifier ::= ...
qualified_identifier := ...
int_literal ::= ...
string_literal ::= ...
expression ::= ...
type_expression ::= ...

Patterns

ghūl will support the following types of patterns:

Note: pattern parsing is potentially ambiguous - if we were to allow arbitrary expressions, we wouldn't be able to tell in some cases until after all symbols are defined whether what we've parsed is a type expression or a value expression, and we'd need a unified parser capable of accepting both type expressions and value expressions to correctly handle all kinds of patterns.

However, in practice we want to restrict literal elements in patterns to compile time constants, so supporting arbitrary expressions is actually not necessary or even desirable.

The pattern element parser then needs to recursively accept either a value or a nest of tuples containing values, where each value could be either a type expression (including a qualified identifier that may turn out after symbols are declared to actually be an enum member or the name of a variable to destructure into), or an integer or a string. Identifier elements may be followed by = and another identifier, for a destructure than copies a variant field into a new variable with a different name.

Pattern Guards

Patterns may be followed by a guard introduced with if and specifying an additional condition that must be met for the pattern to match. Unlike pattern elements, a guard condition can be any arbitrary expression that evaluates at runtime to a bool. The condition can refer to variables bound in the pattern.Because the compiler cannot know at compile time under what circumstances a guard condition may be true, exhaustiveness can't be checked in case statements with guard conditions, and so an else clause is required if guards are present.

case value
when int(i) if i > 0: "positive"
when int(i) if i < 0: "negative"
else "zero"
esac

Exhaustiveness Checking

Ideally the compiler should check that pattern matching is exhaustive, i.e., that all possible cases are covered. If a case expression is not exhaustive, the compiler should issue a warning. The else clause can be used to make a case expression exhaustive. However the initial release may omit these checks.

Compilation

Pattern matching can be compiled in different ways depending on the characteristics of the patterns:

Ideally the compiler will choose the most efficient compilation strategy based on the patterns used, but for the initial release the requirement is only that the matching is correct.

Requirements Checklist

Examples

  1. Binary tree:
    
    union Tree[T] is
    NODE(value: T, left: Tree[T], right: Tree[T]);
    LEAF;
    si

treedepth(tree: Tree[T]) -> int => case tree when LEAF: 0 when NODE(, left, right): 1 + max(tree_depth(left), tree_depth(right)) esac;


2. Optional tuple:
```ghul
let optional_tuple: Option[(int, string)] = some((42, "hello"));

case optional_tuple
when SOME((x, _)) if x > 0: "positive tuple"
when SOME((_, s)) if s.length() > 5: "long tuple"
when SOME(_): "other tuple"
when NONE: "no tuple"
esac
  1. Nested results:
    
    union Error is
    INVALID_INPUT;
    IO_ERROR(message: string);
    si

let result: Result[Result[int, Error], Error] = ok(ok(42));

case result when OK(OK(x)): x when OK(ERROR(INVALID_INPUT)): 0 when OK(ERROR(IO_ERROR(message))): throw new IOException(message) when ERROR(INVALID_INPUT): throw new ArgumentException("Invalid input") when ERROR(IO_ERROR(message)): throw new IOException(message) esac

degory commented 6 months ago

Depends on #1132