CatalaLang / catala

Programming language for literate programming law specification
https://catala-lang.org
Apache License 2.0
1.98k stars 77 forks source link

Deep embedding of Catala's runtime values #103

Closed denismerigoux closed 3 years ago

denismerigoux commented 3 years ago

The problem

As presented in #89, we need a deep embedding of the runtime values manipulated by the Catala language. This embedding would not only serve for the external interpreter modules, but also logging purposes. Indeed, we want Catala clients to be able to access the list of intermediate values reached by the execution to display the computation trace in a meaningful way. This is a pre-requisite for having a nice UI to explain the computations, which is a crucial objective of Catala.

Right now, these intermediate values are stored in a heterogeneous map :

https://github.com/CatalaLang/catala/blob/6e122da48982fe6638e4483a71e6ef7fddb58d0a/src/catala/runtime.ml#L64-L76

Because we strong-arm the OCaml type system with Obj.magic, these values are reflected back in Javascript as the way they are encoded by Js_of_ocaml, which is not palatable for our uses.

The solution

I'm quoting @protz comment here:

Embeddings and de-embeddings

The idea is that within the interpreter, Catala terms are represented as:

type expr =
  | IntLit of int
  | Record of expr list

Going from the interpreter to a natively-compiled function, a function deembed goes from the deep embedding to a native representation of objects; it has type expr -> Obj.t. Sketch:

let deembed = function
  | IntLit x -> Obj.obj x
  | Record es -> let r = Obj.create_block (List.length es) in List.iteri (fun i e -> Obj.set_field r i (deembed e)); r

Going from the result of a natively-compiled function, a function embed goes from the native representation to a deeply embedded representation; it has type Obj.t -> typ -> expr where the typ argument is a the interpreter's representation of the external function's return type:

let embed o = function
  | Int -> IntLit (Obj.repr o <: int)
  | Record ts -> Record (List.mapi (fun i t -> embed (Obj.field o i) t) ts)

I'm not sure this function can be written safely, unless a GADT is introduced to represent the type-safe conversions.

Implementation

To be able to be reused independently of the compiler code, the implementation of the embedding and de-embedding should be done in the Runtime module and should not use types from any of the Catala intermediate representations.

Then, this embedding can be exposed to Javascript by extending src/french_law/api_web.ml.

On the compiler side, we want the generated code to safely embed the values for logging purposes, which means that logging functions in the generated code should carry a tag indicating the type of the values they're logging. This tag should be added to the Dcalc and Lcalc ASTs and filled during typechecking in src/catala/dcalc/typing.ml.

denismerigoux commented 3 years ago

So I made some progress over this during the week-end. We now have a type for embedded Catala runtime values :

https://github.com/CatalaLang/catala/blob/c8f354d327a292d9f9da37162638a05c271ce464/src/catala/runtime.mli#L43-L53

Note that this type features Unembeddable for things like functions. Indeed, we probably want to restrict our embedding to simple values and algebraic data types, anything more complicated raises a number of problems.

The way it works for logging in the generated code is like this :

https://github.com/CatalaLang/catala/blob/c8f354d327a292d9f9da37162638a05c271ce464/src/french_law/law_source/allocations_familiales.ml#L1589-L1592

Indeed, note the signature of log_variable_definition :

https://github.com/CatalaLang/catala/blob/c8f354d327a292d9f9da37162638a05c271ce464/src/catala/runtime.mli#L96

This scheme allows us to generate code with a logging system that retrieves the intermediate values of the program, while keeping a shallow embedding for the generated OCaml code. The logged values can be retrieved after execution thanks to these:

https://github.com/CatalaLang/catala/blob/c8f354d327a292d9f9da37162638a05c271ce464/src/catala/runtime.mli#L82-L90

The logging system can be removed very simply when we want to produce even more efficient code that does not care about intermediate values. This behavior is controlled by the -t option of the Catala compiler :

https://github.com/CatalaLang/catala/blob/c8f354d327a292d9f9da37162638a05c271ce464/src/catala/lcalc/to_ocaml.ml#L314-L325

What is there left to do ?

Currently, the main task is to implement the embedding of algebraic data structures. The embedding should be done with the Struct and Enum cases of the runtime_value type. The module src/catala/lcalc/to_ocaml.ml should generate, alongside the definition of a user-provided algebraic datatype Foo, a function

let embed_foo (x: foo) : runtime_value = ...

The code of embed_foo should be derived from the structure of the type, available at this point in the compiler. If Foo has a struct field of type Bar, then embed_foo should call the generated embed_bar function.

Once this is done, the embedding should be reflected into JSON objects by implementing :

https://github.com/CatalaLang/catala/blob/c8f354d327a292d9f9da37162638a05c271ce464/src/french_law/api_web.ml#L106-L107

How to test ?

The working of the embedding can be tested by going into french_law_js and running :

make -C .. generate_french_law_library build_french_law_library_js -B && node allocations_familiales.js

You might want to put the following bit of code into allocations_familiales.js :

var result = Law.computeAllocationsFamiliales({
  currentDate: new Date(Date.UTC(2020, 04, 26)),
  children: [
    {
      id: 0,
      remunerationMensuelle: 0,
      dateNaissance: new Date(Date.UTC(2003, 02, 02)),
      priseEnCharge: "Effective et permanente",
      aDejaOuvertDroitAuxAllocationsFamiliales: true,
    },
    {
      id: 1,
      remunerationMensuelle: 300,
      dateNaissance: new Date(Date.UTC(2013, 09, 30)),
      priseEnCharge: "Garde alternée, partage des allocations",
      aDejaOuvertDroitAuxAllocationsFamiliales: true,
    },
  ],
  income: 30000,
  residence: "Métropole",
  personneQuiAssumeLaChargeEffectivePermanenteEstParent: true,
  personneQuiAssumeLaChargeEffectivePermanenteRemplitConditionsTitreISecuriteSociale: true,
});
console.log(Law.retrieveLog(0));
Law.resetLog();
process.exit(0);
denismerigoux commented 3 years ago

Well it seems that a1296a7 solves the struct/enums embedding part :)