CatalaLang / catala

Programming language for literate programming law specification
https://catala-lang.org
Apache License 2.0
1.99k stars 77 forks source link

Encode default terms as pattern matching #742

Open adelaett opened 2 weeks ago

adelaett commented 2 weeks ago

Overview

This proposal suggests a design enhancement for the default calculus, which could improve the compilation process and increase the language's expressiveness.

Motivation

The idea comes from a simple observation: the default term <e1, ... en | ej :- ec> can be viewed as a form of pattern matching: <e1, ... en | ej is pj :- ec>. If ej represents not a boolean condition but a pattern, <| ej is pat :- ec>, we could encode traditional pattern matching as follows:

< <| ej is pat1 :- ec1>, <| ej is pat2 :- ec2> | Nothing :- empty>

This structure would allow variables to be bound within the consequence of a default term. For example:

match x with 
| Some y when y >= 0 => y + 1
| Some y when y < 0 => y - 1
| None => 0
end

can be encoded as

< <| x is Some y when y >= 0 :- <y + 1>>,
  <| x is Some y when y < 0 :- <y - 1>>,
  <| x is None :- <0>>
  | Nothing :- Empty
>

Note that the Nothing is different from the existing false in the boolean language, and of patterns. It is more like an empty pattern

High-Level Syntax

In terms of surface syntax, this change would mean we could add under condition x is Some y, making the variable y accessible in the current context. Here is a more concrete example:

definition z
under condition x is Some y when y >= 0
consequence equals y+1

definition z
under condition x is Some y when y < 0
consequence equals y

definition z
under condition x is None
consequence equals y

Compilation Approach

For the compilation process, we could integrate match ... with structures with default terms, utilizing pattern-matching compilation techniques to optimize the evaluation of these terms such as [maranget-ml2008]. (with modifications related to conflict and nothing patterns)

In particular, to check there is no conflict between too branches, one need only to check the branches where the pattern is the same. This could reduce the number of cases to handle.

@article{maranget-ml2008,
        author =    "Luc Maranget",
        title =     "Compiling Pattern Matching to Good Decision Trees",
        booktitle = "Workshop on the Language ML",
        year =      2008,
        month =     September,
        publisher = "ACM Press"
}
AltGr commented 2 weeks ago

Hm, I am not yet totally convinced of the point, but will be happy to discuss it :)

In particular, doesn't the presence of when clauses defeat most of the pattern-matching optimisations ? (I need to re-read Luc's paper!)

Two remarks:

adelaett commented 2 weeks ago

The problem of the expr matches constructor is the same as x with pattern Some of y for the condition; You cannot access it inside the definition, hence you need to re-match the same expression.

For the opitmization on pattern matching, i talked with @Nadrieril about pattern matching with for catala, and we arrived at the conclusion that it is an non-trivial interesting problem, especially if we want to compile to ocaml pattern matching among other things.

We mainly focused on two points: the first one on how to express the conflict condition using only pattern matching. It should be possible to do it using either exception-counting with decision trees, with two runs, or using back-tracking automata that backtrack the first found exception.

However it is non-trivial to adapt those to nested catala expressions, as it is not obvious to find the correct ways to flatten back the pattern matching.

Nevertheless, @Nadrieril indicated that it should be possible to encode pattern matching with a lot of conditions if the conditions have a certain form. For example if they are intervals on a total order, it is possible to have a logarithmic decision tree. This could be interesting for compiling more efficiently catala.

--

In the meantime, modifying the compilation of this new version of default terms is not very hard. Instead of writing

let res =
    List.fold (function Empty, v | v, Empty => v | v, v => Conflict) Empty [e1, ..., en]
in
if ej then ec else Empty

We can write:

let res =
    List.fold (function Empty, v | v, Empty => v | v, v => Conflict) Empty [e1, ..., en]
in
match ej with |pj => ec | _ => Empty

This is minor modification given regular pattern matching are already part of the lcalc intermediate representation.

AltGr commented 2 weeks ago

One major difference that I can see is that pattern-matching is supposed to stop processing at the first match; here, unless we assume there will be no conflicts, we will need to process all of them anyway. Another is that during a pattern-matching, you can leverage the structure of the matched term to drive the decision tree, and here there isn't in general much structure to rely on ?