Discussion: the design of Analysis

aspiwack commented 2 years ago

(I promise that I'm done after this one for tonight :slightly_smiling_face: )

I would like to discuss the design of the Analysis abstraction.

Let me start with a distraction. In an ideal word, there would be a type class MeetSemiLattice, and Analysis would look more like

class (Eq (Domain l), MeetSemiLattice (Domain l)) => Analysis l where
    type Domain l

    makeA :: ENode l -> EGraph l -> Domain l
    modifyA :: ClassId -> EGraph l -> EGraph l

There is a well-maintained lattices. Unfortunately it doesn't have semi-lattices. And I don't think it's worth it to introduce a bespoke type class here, especially since the indirection creates some boilerplate. But I'm still a little sad

Ok, so, to the real point, here. Something that users are likely to want to do is to combine analyses. This is a rather standard thing in abstract interpretation, so let's take this example (it can be used for optimisation, e.g. to prove that certain numbers are not 0 so that we can do some mathematical simplification, or that some branches can't be reached so that we can eliminate them, etc… so it's rather relevant to the whole Egg thing, kind of).

So I can write, say, an interval analysis to approximate values of my number variable (this lets me answer questions like “can this expression evaluate to 0”). In my problem domain, parity happens to matter a lot (this is fictional, I personally have never needed parity), so I also write a parity analysis.

Then, using some kind of Cartesian product, I can combine these two analyses into one, and then run equality saturation using the combined analysis.

Some of my rewrite rules use the interval analysis exclusively, they can be developed independently of the parity analysis. And conversely the other rewrite rules use the parity analysis exclusively. (there may also be rules that use both at the same time, of course). The prerequisite, here, is that I can add new analyses later without changing my rules.

I contend that with the current tool, such a workflow is not supported.

The first impediment is that, due to how the type class is structured, there is a single possible analysis for a given L. This can be worked around by using newtype wrappers over L, like

newtype Parity a = Parity (L a)

But then every rewrite rule will look like (borrowing an example from the README)

pat (Parity (pat (Parity ("a" :*: "b")) :/: "c")) := pat (Parity ("a" :*: pat (Parity ("b" :/: "c"))))

this honestly doesn't look nice.

Then there are the members of the class

Domain is easy: it's the product (A, B) of the domains of the two analyses
join is easy: it's join *** join
make is tricky because it's such a general function. But we usually don't need such a general function: generally, we just look at the analysis value of the children and we use that to compute our new analysis value. If we give make the type
```
make :: l a -> a
```
then the Cartesian product of make is \as -> (make (fmap fst as), make (fmap snd as)). So at the price of a slightly more restricted function, we can combine make
modify is quite a bit more difficult. First this is a very lightly specified function in the Egg paper. It's too general for combination to make sense. But even if we stick to the only example that I'm aware of – adding nodes – it is not clear what a combination would look like. Because the modify for A could add a node, which triggers the modify from B to add a node, which in turn triggers the modify for A to add another node, etc… There is no guarantee that this process would converge (and it's probably not particularly fast either). There I don't really have a way forward.

Anyway, this is the lesson that I've learnt from abstract interpretation: make small analyses and combine them. I think it's very relevant for Egg (hence hegg!). But, while I have some rough idea of what this would need to do, I don't really have a concrete suggestion that would improve the interface. I do understand the reason why the Analysis type class is the way it is, it is really not obvious to avoid.

Maybe you could have

data EGraph analysis language
class Analysis analysis language

I'm worried this may yield an unhealthy amount of ambiguous variables, but it may be worth trying.

Though understanding how the modify function is necessary (as I'm writing this I'm questioning whether the modify function is really useful. At least in the case of constant folding, presented in some detail in the Egg paper, we could implement it as an analysis-guided rewrite rule).

To conclude: I don't really know what to think. I see this thread as an open discussion.

aspiwack commented 2 years ago

Addendum about modify.

It's been pointed out to me that modify is probably critical for performance, because without modify we would have to traverse all the equivalence classes frequently to check which ones have changed, whereas modify is triggered directly by, and on, the classes that have been modified. Much more expedient.

But there is a solution: if we take two analysis, where modify are called m1 and m2 respectively, then if m1 ∘ m2 ∘ m1 = m2 ∘ m1 ∘ m2 (or m1 ∘ m2 = m2 ∘ m1) then m1 ∘ m2 ∘ m1 is idempotent (in the case where m1 ∘ m2 = m2 ∘ m1, then this is simply m1 ∘ m2, so we can just do one of them). So, in this story, we can't combine any two analysis, but if they are sufficiently well-behaved then we have a solution. It's not clear to me how to reason about more than two functions in these terms, the combinatorics makes my head hurt.

Here is a simple criterion that should suffice though. If

The modify function only depends on the analysis value, and
The modify function doesn't change the analysis value Then any two such functions commute. I don't know if there are useful modify functions which are not of this form.

alt-romes commented 2 years ago

I really like the design idea of being able to combine analysis.

It might even allow for reusable analysis across use cases (and it definitely goes with the Haskell way of things).

I'll need to think some more about this but I'm convinced this would be a good improvement to the usability/user-facing side of the library. I'll collect my thoughts and let you know later.

Thank you for thinking this through!

aspiwack commented 2 years ago

I went back to the Rust implementation, and the EGraph type, there, is parametrised over both the language and analysis, as I suggest above. It it works for them, it should work in Haskell as well. So, this has now officially become my favoured solution.

ocramz commented 2 years ago

@aspiwack just linking to a nice 'semilattices' library https://hackage.haskell.org/package/semilattices-0.0.0.4/docs/Data-Semilattice-Meet.html which already provides a number of interesting instances.

alt-romes commented 2 years ago

Nice, I'm in agreement with adding the analysis as a type parameter too

aspiwack commented 2 years ago

@ocramz

@aspiwack just linking to a nice 'semilattices' library https://hackage.haskell.org/package/semilattices-0.0.0.4/docs/Data-Semilattice-Meet.html which already provides a number of interesting instances.

Nice! (though it gives an Ord instance to partial orders, which violates the Ord laws). It's not on Stackage, it may not be very used besides its original author. But I'm glad to know it exists.

alt-romes commented 1 year ago

I've reached and implemented what I think is a good design taking this all into consideration at #16

alt-romes / hegg

Discussion: the design of Analysis #4