egraphs-good / egglog

egraphs + datalog!
https://egraphs-good.github.io/egglog/
MIT License
400 stars 45 forks source link

First class / higher order functions #348

Closed saulshanabrook closed 4 months ago

saulshanabrook commented 6 months ago

This PR is an attempt to support higher-order functions in egglog.

Compared to the earlier approach, in #155, this doesn't embed the bodies of unevaluated functions in the EGraph. Instead, all function values are just represented as lookups to egglog functions. An advantage of this approach is that we don't need to re-implement function application in anyway, we can just re-use our existing implementation. For defining the body of functions, we can just use rewrite rules.

Screenshot 2024-02-16 at 6 15 09 PM

This was inspired by @yihozhang's comment a while ago about how to embed functions in egglog.

As implemented, this supports functions that return primitives and eqsorts, since it uses the builtin mechanism for calling functions.

Although it doesn't support anonymous functions at the moment, adding them would be relatively trivial I believe as syntactic sugar, that would translate to a new global function with a random name and a rewrite rule to cover all cases.

I made sure that this implementation supports partial application, which is necessary for rewrite rules that transform functions.

The implementation is fairly self-contained as a primitive sort, requiring no changes to the parser or type checker. The only main breaking change is requiring a mutable EGraph pointer to be passed to primitive functions. This is necessary so that the function evaluation can possibly mutate the unionfind data structure, if new e-classes are created. I am unable to get access to a mutable egraph in the query checker, so these will panic if attempted to use in facts, they can only be used in action. This limitation seems reasonable.

One of the driving use cases for this on my end, besides just generally being cool to be able to optimize functional programs, is @sklam's example for defining a higher level array API and translating that within egglog.

saulshanabrook commented 5 months ago

@ztatlock and others had wondered whether this PR could be simplified by not supporting curried functions. This would remove the storage of values in the sort, making the function sort not a container sort.

My sense from implementing it below is that it would be possible using codegen/macros, but it expands the amount of code and would require adding implementations for every combination of type of curried arguments for every function.


As currently implemented, this PR would allow you to define a map function over a custom user defined map data structure:

(datatype Math
  (Num i64)
  (Var String)
  (Add Math Math)
  (Mul Math Math))

(rewrite (Mul (Num x) (Num y)) (Num (* x y)))

(datatype MathList
  (Nil)
  (Cons Math MathList))

(sort MathFn (Fn (Math) Math))
(function list-map-math (MathFn MathList) MathList)
(rewrite (list-map-math fn (Nil)) (Nil))
(rewrite (list-map-math fn (Cons x xs)) (Cons (app fn x) (list-map-math fn xs)))

;; version where partial application is allowed

(function list-multiply-by (MathList Math) MathList)
(rewrite (list-multiply-by l i) (list-map-math (fn "Mul" i) l))

However, if partial application was not allowed, we could implement this by defining a custom map function that would partially apply an arg that is passed into it in the function invocation:

;; alternative version where no partial application is allowed

;; Must make a new map function that also takes a binary fn instead of a unary one
;; as well as an initial partialled applied arg
(sort MathMathFn (Fn (Math Math) Math))
(function list-map-math-partial (MathMathFn MathList Math) MathList)
(rewrite (list-map-math-partial fn (Nil)) (Nil))
(rewrite (list-map-math-partial fn (Cons x xs) y) (Cons (app fn y x) (list-map-math-partial fn xs y)))

(rewrite (list-multiply-by l i) (list-map-math-partial (fn "Mul") l i))

Or alternatively, we could write a wrapper for a unary math function that supports partial application and use this (thanks @yihozhang for this version):

(sort MathMathFn (Fn (Math Math) Math))

(datatype MathFnWrapper (partial-app MathMathFn Math))
(function App (MathFnWrapper Math) Math)

(rewrite (App (partial-app f a) b) (app f a b))

;; rewrite
(function list-map-math (MathFnWrapper MathList) MathList)
(rewrite (list-map-math fn (Nil)) (Nil))
(rewrite (list-map-math fn (Cons x xs)) (Cons (App fn x) (list-map-math fn xs)))

(rewrite (list-multiply-by l i) (list-map-math (partial-app (fn "Mul") i)))
oflatt commented 5 months ago

Here's a nice example of how you might do partial application in current egglog. It requires a little bit of boilerplate but it's not too bad:

https://egraphs-good.github.io/egglog/?program=XQAAgACDBAAAAAAAAAAUGQgnfMUD9dO10C6gYDgH7EMCDsP59lqRF2h73FWgyD8WAkCEiBUexoj9z4WGjCHzYIc2LwHce_oYevbmNoBoN_WQ0Ts129CFhQcgqLvd8Zghww4Y3IeW7pA3n8eY20nLHxZ4zXkw6JDSjaSDWK0QQJZ390IDSmZeEE3CmnobPAqTsX8BTdPj8QSF2zgKh5NEEX2EgxsFiBdyGhGDLqKjxejD7h2vbJ5lZDCckd5FQRFFAzgthseF8hCaVbe8kJU5fqVDBKluNgDdTE-eM6dXpgLktUJzigaujVbemliqhHNWu72dmDCnkMNGsm-ekguI93K4PDyt7mN076MFiYzL-v0xR9u0R3D9e1KKf-oWM5SSGIhFSUDFs2w5rVO0BHFZFXVvqDN9CLnk4qha7rVChJrEg9nolD6ZQ4NFfpGoEeGPqvQPc_n5NcCZvWLOXDLPCtkb6Qqpo7kcwRBzZoyl_h83cEZWVE2afqoNAP_Xx-5sPqJU_MQ75bM6ehIg-x14LK329vrl6OKZ8WbSZlNR7b9ztvwJDKIEJmpNxffZ4v4g-SJoF__sLfFz

The nice thing about this example is that it gives you an easy way to reference partial applications without any change to egglog. You do have to write conversion rules that go from a partial application to a normal one, but they are easily generated.

saulshanabrook commented 5 months ago

A few comments I got yesterday from folks on this PR:

saulshanabrook commented 4 months ago

At the meeting today, the consensus was this seems good to go to experiment on in the Python bindings. I prefaced all the public-facing functionality with unstable so that it's clear this could change.

If this is merged, I will work on adding a higher-level semantics to the Python bindings on top of this, for anonymous functions.

saulshanabrook commented 4 months ago

Hey @oflatt, thanks for taking a look at this!

I responded to your points, but overall I am hesitant to make any changes to much outside of the fn.rs sort definition unless they are really required since this is unstable and I wouldn't want to tie us to this implementation if we do want to move.

Also, even if we keep this, we might not want to expose it directly to users as we talked about, so if we made a higher level API, we could make sure that had all the proper affordances, like not representing functions as strings but as symbols.

I am excited to get this in so I can start experimenting with it in the Python bindings!