anoma / geb

A Categorical View of Computation
https://anoma.github.io/geb/
GNU General Public License v3.0
28 stars 10 forks source link

Add functor/algebra iteration to Geb external API #62

Open rokopt opened 1 year ago

rokopt commented 1 year ago

Geb is designed internally for its recursive types to be (co)limits of (compositional) powers of polynomial functors, and it should expose an interface to clients to allow objects to be specified as n-fold compositional powers of polynomial functors, and hom-objects as algebras of polynomial functors.

The idea is that the combination of a polynomial functor f, an object a, a non-zero natural number n, and an algebra f a -> a allow a type to be defined with (aside from the addition of the fixed n) the same signature as a recursive one (the recursive type in this case would be the object component of the initial algebra Mu(f) of f, which f, as a polynomial endofunctor, is guaranteed to have), and a function with the same signature as a recursive one (where the algebra of type f a -> a is the function that would be passed to the catamorphism which serves as the elimination rule for Mu(f), but actually producing a type with a finite number of terms, and non-recursive function with signature f^n -> a, which can be executed on a polynomial circuit.

Specifically:

There will be an operator apply which takes a poly and a substobj and applies the (object component of the) endofunctor to the object (producing another object).

There will be an operator map which takes a poly and a morphism and applies the (morphism component of the) endofunctor to the morphism (producing another morphism).

There will be an alias alg which takes a poly and a substobj, and it produces the type of algebras: specifically, alg f a is hom(f(a), a).

Finally, there will be a bounded-depth "catamorphism", called something like catan: for each f : poly, n : Nat, a : substobj, and m : alg f a (so m is a morphism from f a to a), catan f n a m will produce a morphism from f^n(so0) (the nth iteration of f starting with the initial object of the subst category) to a. This is the interface that produces a function resembling a catamorphism from Mu(f) to a, but for bounded-depth data structures.

Edit: @lukaszcz pointed out that there's something missing from the above. It's supposed to be a finite prefix of a directed colimit, but the "directed" part means that there are injections f^n -> f ^ (S n), which I forgot to specify. Those should exist, as well as a convenience interface that composes an arbitrary number of them to produce f^n -> f^m for any n < m.

lukaszcz commented 1 year ago

The existence of the injections doesn't actually solve the problem completely, because they can go only one way. When we have a Juvix function, e.g.:

f : List Nat -> List Nat
f xs := 1 :: 2 :: xs

then after compilation to GEB it needs to have type

f : List^n Nat -> List^{n+2} Nat

for some n. The List type is translated to:

type List^n A :=
| nil : List^n A
| :: : A -> List^n A -> List^{n+1} A

(we are essentially adding size upper bound annotations).

Thus we cannot just globally change every occurrence of the List type to List^N for some maximum N.

In general, this seems to leave us with two options:

  1. Duplicate the code for all functions operating on unbounded data types for every possible truncation depth n of the data type.
  2. Use a different, essentially untyped encoding for unbounded data types, and signal a "runtime" error when one attempts to create a data structure that would be too large (couldn't be encoded with the maximum finite number of field elements used for unbounded datatype encoding).

The first option seems prohibitive in terms of the efficiency of the generated VampIR code / circuit size, but in practice it might be just good enough with reachability analysis (filtering out the definitions not reachable from the main function) or generating (at compile time) the truncations "on demand".

Note that we still need injections which are (almost) "free" (in terms of "execution" time / circuit size), because we need to cast things to the common upper bound. For example:

f : List Nat -> List Nat
f nil := nil;
f (x :: xs) := if (x = 0) xs (x :: xs)

would be converted to

f : List^n Nat -> List^n Nat
f nil := nil
f (x :: xs) := if (x = 0) (inject xs) (x :: xs)
rokopt commented 1 year ago

Use a different, essentially untyped encoding for unbounded data types, and signal a "runtime" error when one attempts to create a data structure that would be too large (couldn't be encoded with the maximum finite number of field elements used for unbounded datatype encoding).

Besides an essentially untyped API, would a dependently typed API (Vect n a) also work? (I'm trying to understand which alternatives there are of sets of APIs in addition to the ones currently in the description that would allow Juvix to express anything it wants to express.)

lukaszcz commented 1 year ago

Use a different, essentially untyped encoding for unbounded data types, and signal a "runtime" error when one attempts to create a data structure that would be too large (couldn't be encoded with the maximum finite number of field elements used for unbounded datatype encoding).

Besides an essentially untyped API, would a dependently typed API (Vect n a) also work? (I'm trying to understand which alternatives there are of sets of APIs in addition to the ones currently in the description that would allow Juvix to express anything it wants to express.)

Sorry, didn't notice your comment.

The problem is that you can't, in general, go from e.g. List^{n+1} to List^n without doing some runtime check whether the list is short enough. Dependent types don't seem to help here. The problem essentially stems from the fact that we're forced to translate non-dependently typed programs and data structures into dependently typed ones, regardless whether this dependency is explicitly in the API or not (we have different versions of List^n for each n).