alt-romes / hegg

Fast equality saturation in Haskell
https://hackage.haskell.org/package/hegg
BSD 3-Clause "New" or "Revised" License
75 stars 8 forks source link

Direct rewrite rules #29

Open jameshaydon opened 11 months ago

jameshaydon commented 11 months ago

At the moment rewrite rules have to specified using a pattern language. For example:

addCommutes = pat (BinOp Add "a" "b") := pat (BinOp Add "b" "a")

This rewrite rule could also be specified with a function:

addCommutes = \case
  Fix (BinOp Add a b) -> Just (Fix (BinOp Add b a))
  _                   -> Nothing

I'll refer to this as a "direct" rewrite rule.

I imagine there are advantages to the pattern matching approach, but I was wondering if it would also be possible to specify rewrite rules using the "direct" approach too? The sort of situation I would like to handle is for example when your language functor has a case like:

data F a = Foo Int a | ...

and you want to match Foo n x only when n satisfies some property, and use n to rewrite the term in some non-trivial way.

And thanks for the great package!

alt-romes commented 11 months ago

Interesting idea! I think it would be lovely to be able to write "direct" rewrite rules as you've presented them.

I can't tell right away whether it's possible, but we might be able to pull it off.

In essence, equality saturation works by

  1. E-matching the LHS against the e-graph (discovering the e-class in which the LHS is represented, and the e-classes corresponding to the variable bits of the pattern that matched)
  2. For every LHS that matched, represent the RHS pattern (thereby discovering the e-class where the RHS is represented)
  3. Merge the e-class of the LHS with the e-class of the RHS, making them equivalent in the e-graph.

The e-matching's is done through the Data.Equality.Matching.ematch function, where all the magic e-matching happens (currently implemented according to the "Relational E-matching" paper)

So the pattern := pattern :| condition form lends itself well to this kind of workflow. However, I think the hegg implementation is a great playground to explore expressiveness of the interface to equality saturation and e-graphs.

(Small aside on conditional rewrite rule, which come from section 4.2 of the egg paper: I don't think they are sufficient to implement the kind of conditional matching you want, as they seem to be conditions based on analysis data of the variable parts of the pattern. In that sense, what you envision seems more appropriate and much more direct)

It is probably best to keep the e-matching main work function as is - receiving a pattern. Ideally, we should be able to go back and forth between those two implementations as they seem roughly equivalent...

We should probably start from the "direct" rewrite rule and try to reconstruct the pattern based one. Then we can consider the conditions on top of the direct rewrite rule, which likely filter more matches at step 1.

I'll have more time to investigate this in October, but I'll be happy to participate in a discussion if you decide to pursue this line of thought.

Thanks!

jameshaydon commented 11 months ago

I don't think they are sufficient to implement the kind of conditional matching you want, as they seem to be conditions based on analysis data of the variable parts of the pattern. In that sense, what you envision seems more appropriate and much more direct

Yes, this is what led me to the question: the direct approach allows much more control.

We should probably start from the "direct" rewrite rule and try to reconstruct the pattern based one.

I have only very vaguely looked at the code and papers you linked, but I'm not sure this is the right approach, I think the direct approach likely has significant disadvantages (it can't efficiently express non-linear patterns for example), so you wouldn't want to build on it, I was thinking of it more as an escape hatch.

Tarmean commented 6 months ago

Minor note: Currently languages are grouped by f () so Foo 1 () would be considered a distinct table from Foo 2 () and generic rules don't work. One workaround is to use Foo (Const 1) (), then you can use a variable in the pattern.

One approach I like is to separate matching and rewriting. Here with OverloadedRecordDot for some sugar:

forMatches ("a" .@ pat (BinOp Add "b" "c") ) $ \r-> do
    r.a .==  S (BinOp Add r.c r.b)

This is basically a == (b+c) -> a == (c + b) written using haskell-like at patterns so that you can name subpatterns. Smart constructors or pattern synonyms could make the rewrites a bit nicer to look at. With some pretty arbitrary rewrite monad with helper functions such as

data Subst f = S (f Subst) | C ClassId (.==) :: (MonadEGraph lang ana cid) => cid -> Subst lang -> m () (.==) ident expr = addEquality ident =<< injectEgraph expr injectEGraph :: (MonadEGraph lang ana cid) => Subst lang -> m cid

This is pretty powerful: