Closed aspiwack closed 1 year ago
Addendum about modify
.
It's been pointed out to me that modify
is probably critical for performance, because without modify we would have to traverse all the equivalence classes frequently to check which ones have changed, whereas modify
is triggered directly by, and on, the classes that have been modified. Much more expedient.
But there is a solution: if we take two analysis, where modify
are called m1
and m2
respectively, then if m1 ∘ m2 ∘ m1 = m2 ∘ m1 ∘ m2
(or m1 ∘ m2 = m2 ∘ m1
) then m1 ∘ m2 ∘ m1
is idempotent (in the case where m1 ∘ m2 = m2 ∘ m1
, then this is simply m1 ∘ m2
, so we can just do one of them). So, in this story, we can't combine any two analysis, but if they are sufficiently well-behaved then we have a solution. It's not clear to me how to reason about more than two functions in these terms, the combinatorics makes my head hurt.
Here is a simple criterion that should suffice though. If
modify
function only depends on the analysis value, andmodify
function doesn't change the analysis value
Then any two such functions commute. I don't know if there are useful modify
functions which are not of this form.I really like the design idea of being able to combine analysis.
It might even allow for reusable analysis across use cases (and it definitely goes with the Haskell way of things).
I'll need to think some more about this but I'm convinced this would be a good improvement to the usability/user-facing side of the library. I'll collect my thoughts and let you know later.
Thank you for thinking this through!
I went back to the Rust implementation, and the EGraph type, there, is parametrised over both the language and analysis, as I suggest above. It it works for them, it should work in Haskell as well. So, this has now officially become my favoured solution.
@aspiwack just linking to a nice 'semilattices' library https://hackage.haskell.org/package/semilattices-0.0.0.4/docs/Data-Semilattice-Meet.html which already provides a number of interesting instances.
Nice, I'm in agreement with adding the analysis as a type parameter too
@ocramz
@aspiwack just linking to a nice 'semilattices' library https://hackage.haskell.org/package/semilattices-0.0.0.4/docs/Data-Semilattice-Meet.html which already provides a number of interesting instances.
Nice! (though it gives an Ord
instance to partial orders, which violates the Ord
laws). It's not on Stackage, it may not be very used besides its original author. But I'm glad to know it exists.
I've reached and implemented what I think is a good design taking this all into consideration at #16
(I promise that I'm done after this one for tonight :slightly_smiling_face: )
I would like to discuss the design of the
Analysis
abstraction.Let me start with a distraction. In an ideal word, there would be a type class
MeetSemiLattice
, andAnalysis
would look more likeThere is a well-maintained
lattices
. Unfortunately it doesn't have semi-lattices. And I don't think it's worth it to introduce a bespoke type class here, especially since the indirection creates some boilerplate. But I'm still a little sadOk, so, to the real point, here. Something that users are likely to want to do is to combine analyses. This is a rather standard thing in abstract interpretation, so let's take this example (it can be used for optimisation, e.g. to prove that certain numbers are not 0 so that we can do some mathematical simplification, or that some branches can't be reached so that we can eliminate them, etc… so it's rather relevant to the whole Egg thing, kind of).
So I can write, say, an interval analysis to approximate values of my number variable (this lets me answer questions like “can this expression evaluate to 0”). In my problem domain, parity happens to matter a lot (this is fictional, I personally have never needed parity), so I also write a parity analysis.
Then, using some kind of Cartesian product, I can combine these two analyses into one, and then run equality saturation using the combined analysis.
Some of my rewrite rules use the interval analysis exclusively, they can be developed independently of the parity analysis. And conversely the other rewrite rules use the parity analysis exclusively. (there may also be rules that use both at the same time, of course). The prerequisite, here, is that I can add new analyses later without changing my rules.
I contend that with the current tool, such a workflow is not supported.
The first impediment is that, due to how the type class is structured, there is a single possible analysis for a given
L
. This can be worked around by usingnewtype
wrappers overL
, likeBut then every rewrite rule will look like (borrowing an example from the README)
this honestly doesn't look nice.
Then there are the members of the class
Domain
is easy: it's the product(A, B)
of the domains of the two analysesjoin
is easy: it'sjoin *** join
make
is tricky because it's such a general function. But we usually don't need such a general function: generally, we just look at the analysis value of the children and we use that to compute our new analysis value. If we givemake
the typethen the Cartesian product of
make
is\as -> (make (fmap fst as), make (fmap snd as))
. So at the price of a slightly more restricted function, we can combinemake
modify
is quite a bit more difficult. First this is a very lightly specified function in the Egg paper. It's too general for combination to make sense. But even if we stick to the only example that I'm aware of – adding nodes – it is not clear what a combination would look like. Because themodify
forA
could add a node, which triggers themodify
fromB
to add a node, which in turn triggers themodify
forA
to add another node, etc… There is no guarantee that this process would converge (and it's probably not particularly fast either). There I don't really have a way forward.Anyway, this is the lesson that I've learnt from abstract interpretation: make small analyses and combine them. I think it's very relevant for Egg (hence
hegg
!). But, while I have some rough idea of what this would need to do, I don't really have a concrete suggestion that would improve the interface. I do understand the reason why theAnalysis
type class is the way it is, it is really not obvious to avoid.Maybe you could have
I'm worried this may yield an unhealthy amount of ambiguous variables, but it may be worth trying.
Though understanding how the
modify
function is necessary (as I'm writing this I'm questioning whether themodify
function is really useful. At least in the case of constant folding, presented in some detail in the Egg paper, we could implement it as an analysis-guided rewrite rule).To conclude: I don't really know what to think. I see this thread as an open discussion.