Analysis design - Githubissues

alt-romes commented 1 year ago

Closes #4

This isn't going too smoothly yet, it's a complicated endeavor. We need definitely more use cases before pulling something like this, I think.

Things that probably can't remain as I've left it in this branch is modifyA, since I do think one might want to add recursive nodes, and that because just manually adding them to the class seems to make the test suite fail. I thought about an alternative in which

modifyA :: EClass a l -> [ModifyActions]

data ModifyActions l = AddNode (ENode l) | AddExpr (Fix l) | DeleteNode (ENode l)

but unsure if that looks good enough, is flexible enough, etc...

So I mean to continue the discussion with this PR.

If anyone implementing analysis could show us their use case I'd be grateful 👍

Thanks

alt-romes commented 1 year ago

Okay, I think I've mostly nailed this down @aspiwack, it'd be great to get some input.

I've done a big re-design of the analysis, and made it (I think) much more friendly, at the cost of some flexibility.

The driving change is the Analysis class definition.

First of all, we add a new parameter to the e-graph to indicate which analysis is to be used (as decided in #4) Secondly, the methods were redesigned so knowledge of the e-graph isn't even necessary.

Both these together allow for combining together analysis. For example, an instance for (Analysis a l, Analysis b l) => Analysis (a, b) l is provided in Data.Equality.Analysis.Instances, and with it we can combine any two (law-abiding) analysis that depend on different domains into one combined analysis that depends on the product of the domains.

The analysis class now is (omitting some examples):

class Eq (Domain a l) => Analysis a (l :: Type -> Type) where

    -- | Domain of data stored in e-class according to e-class analysis
    type Domain a l

    -- | When a new e-node is added into a new, singleton e-class, construct a
    -- new value of the domain to be associated with the new e-class, typically
    -- (always?) by accessing the associated data of n's children
    --
    -- The argument is the e-node term populated with its children data
    makeA :: l (Domain a l) -> Domain a l

    -- | When e-classes c1 c2 are being merged into c, join d_c1 and
    -- d_c2 into a new value d_c to be associated with the new
    -- e-class c
    joinA :: Domain a l -> Domain a l -> Domain a l

    -- | Optionally modify the e-class c (based on d_c), typically by adding an
    -- e-node to c. Modify should be idempotent if no other changes occur to
    -- the e-class, i.e., modify(modify(c)) = modify(c)
    --
    -- The return value of the modify function is both the modified class and
    -- the expressions (in their fixed-point form) to add to this class. We
    -- can't manually add them because not only would it skip some of the
    -- internal steps of representing + merging, but also because it's
    -- impossible to add any expression with depth > 0 without access to the
    -- e-graph (since we must represent every sub-expression in the e-graph
    -- first).
    --
    -- That's why we must return the modified class and the expressions to add
    -- to this class.
    modifyA :: EClass a l -> (EClass a l, [Fix l])

Unsolved/undecided bits:

As you predicted, this does create a lot of ambiguous variables. I think that's okay; I would guess the user who understands data fix points enough to use this library is also comfortable with type applications.

However, I wrote down a possible alternative:

-- | TODO: If the domain is used in the class definition we would no longer have
-- ambiguous types neither the type family. That is, we'd have @Analysis domain
-- language@ instead of @Analysis analysis language@. Would that be better?

Haven't thought it through further than that.

Secondly, I didn't try to get the lambda tests to work in this model. This lambda test used by egg depends on the e-class id of the e-node children to manage free variables. Initially I thought we might be limited since we can no longer do this (depend on the class-ids for the analysis data), but then I thought we might just be able to emulate the behavior or even come up with another way to work the free variables.

Concluding,

I believe this analysis design is a much nicer cleaner/functional/compositional abstraction and I don't think yet it sacrifices too much. It's one good advantage the haskell implementation has over others!

I also think we can revise some of these changes if this more constrained flexibility turns out to be a problematic.

aspiwack commented 1 year ago

Can you explain why you have both the type parameter a and the associated family Domain a l in the type class? I would intuitively expect them to be one and the same (I'm guessing this would cut down on ambiguous variables too).

alt-romes commented 1 year ago

Yes! Almost. That's an alternative design I mentioned above. I'm starting to think it might be better altogether.

Right now, the a variable stands for the type of the analysis. For most cases it will be identical to the Domain a l, but what if we want to have two different analysis whose domain is the same (say Maybe Int).

Drawbacks: lots of ambiguous variables, often redundant type family instances

Pros: multiple instances using the same domain don't require newtype wrappers, we could use type level string literals to "name" the analysis.

I think changing it might be better; writing newtypes to wrap the same domain for use in different analysis doesn't sound bad

alt-romes commented 1 year ago

In writing this reply I see now no reason not to change it.

What currently is came to be historically, and seems now unnecessarily complex

alt-romes commented 1 year ago

@aspiwack I've pushed the design using domain as a type variable.

It allowed us to get rid of an import cycle (great!)
It's overall simpler, the domain is a type variable, not a type instance of what basically was a type label
We still have ambiguity at call sites like equalitySaturation, but there's no way around it (no domain is needed to call that)

I think it's pretty much ready to merge, and we'll have to understand the expressiveness problems when someone bumps into one.

alt-romes commented 1 year ago

@aspiwack I've revised the comments, updated the tutorial and changelog and revived the lambda test on top of the analysis design.

The lambda test is now much much better than before because we do in fact make use of composition of analysis! The lambda language analysis is really a combination of constant folding and free-variable tracking. It worked great and it freed me from the burden of combining them manually. I was also able to implement the analysis without loss of expressiveness. (Note that the lambda test still doesn't implement all tests from the paper, but it definitely does more so on this patch than it used to -- for completion I think we're still missing a redesign of rewrite conditions + dynamic rewrites)

Let's merge this!

alt-romes commented 1 year ago

Thanks for all the help @aspiwack, I'll release this soon in a v0.3

alt-romes / hegg

Analysis design #16