EGraph should be abstract

aspiwack commented 2 years ago

I don't think that there is any reason to export

The constructor and fields from the EGraph type
The type Memo
The type Worklist

from the Data.Equality.Graph module, is there?

All of these should be implementation details that are not exposed to the user. Exposing them only serves to give ways to break the data structure's invariant.

I wanted to contribute a PR for this directly. But it's a bit more tricky than I have time for right now, because of the Data.Equality.Graph.Lens module, which, for similar reasons, I believe shouldn't be exposed.

What do you think?

alt-romes commented 2 years ago

Sounds good.

I haven't made it abstract yet since I wanted to avoid having *.Internal everywhere (the modules internally do make extensive use of the e-graph and can't do with it being abstract)

The user-facing API can definitely still be improved. I decided to publish and see some interest before continuing to perfect things further. That is, the API is more of a "good sketch" than a finalized drawing

alt-romes commented 2 years ago

On second thought, the lenses might be the solution rather than an impediment after all.

The reason why we must export the constructors of the e-graph is not only for internal use, but also for implementing e-class analysis.

For example, to implement constant folding, if we find a constant value in a node, we might want to remove all other nodes from the e-class except for the constant one. To do so, one needs to modify the e-class in the e-graph (this is done in test/Sym.hs)

However, I don't think we need to ever access some parts of the e-graph (like the worklist) anywhere other than in the EGraph module (and tentatively you'd never need it in an analysis either)

That is, It could be best to provide only the lenses as a way into the structure of the e-graph, for use by other internal modules (I'd have to double check this) and users defining analysis (basically saying those lenses to be the one true way of structural modifications)

aspiwack commented 2 years ago

For example, to implement constant folding, if we find a constant value in a node, we might want to remove all other nodes from the e-class except for the constant one. To do so, one needs to modify the e-class in the e-graph (this is done in test/Sym.hs)

I don't know why you would ever want to be doing this. The constant will be the canonical value, and removing the rest of the class means that you lose the information that the other members are equal to the constant.

As far as I can tell, though, the original egg implementation doesn't allow for such a thing. Graphs are abstract, and the only modifications that you can do on a graph are, essentially, union, and add. And so modify being implemented as a graph-mutating function, can only do that as well. Or did I misunderstand?

Having lenses to every field is functionally equivalent as having the type EGraph be public. It does give you a little bit more manoeuvrability on the implementation side, but not much. It also exposes internal data types that you may want to change in the future for performance tuning.

It's a step in the right direction, in that you can incrementally remove some of the lenses. But in my ideal world there would be none left in the public interface.

alt-romes commented 2 years ago

As far as I can tell, though, the original egg implementation doesn't allow for such a thing.

I think it might, they do so in their testsuite examples.

I don't know why you would ever want to be doing this. The constant will be the canonical value, and removing the rest of the class means that you lose the information that the other members are equal to the constant.

This can be true in some situations, but also unnecessary in others. When rewriting a term we (1) represent it and get an e-class id, (2) we saturate the e-graph, (3) we extract the best node from the id we have -- if there's a constant value, that will be the one (and if there are two, something went very wrong). As an end result, it doesn't matter if we remove all nodes except for the leaf ones as a modification, but for the runtime I would expect it would

But funnily enough, removing the pruning line from modifyA in the testsuite barely makes a difference. Perhaps the cost of pruning is the same as the cost of applying rewrites to more nodes. But can we assume this to be true across use cases?

Are we sure an analysis can rely solely on the abstract e-graphs methods? With more use cases we'd come to know.

We don't provide lenses to every field, and indeed we could incrementally even remove some of them. In particular, right now the worklist is already completely hidden.

Perhaps we should also reconsider modifyA's type signature (also taking into consideration #4)

Optionally modify the e-class 𝑐 based on 𝑑𝑐 , typically by adding an e-node to 𝑐. Modify should be idempotent if no other changes occur to the e-class, i.e., modify(modify(𝑐)) = modify(𝑐)

I don't think we quite capture this in the type, but we do need the whole e-graph if we are e.g. to represent new things

aspiwack commented 2 years ago

I think it might, they do so in their testsuite examples.

You are quite right! I had missed Index/IndexMut. Interesting.

Are we sure an analysis can rely solely on the abstract e-graphs methods? With more use cases we'd come to know.

I'm sure of nothing :slightly_smiling_face: . I don't understand this part of the design very well, to be honest.

I don't think we quite capture this in the type, but we do need the whole e-graph if we are e.g. to represent new things

Yes, ideally it could be represented as a function Class -> Class where Class would be some abstract type with set-like operations. But it's a tad tricky to get right as modification affect the work list and what not. So it may not be very trivial.

alt-romes / hegg

EGraph should be abstract #3