Rename {find,canonicalize} to unsafe{Find,Canonicalize}

aspiwack commented 1 year ago

Hello, and thanks for the good work.

This is a design proposal

Both find and canonicalize, as defined, are abstraction breaking, because the data structure is not in a sound state if you don't call rebuild previously. My proposal, in keeping with the spirit of Haskell of correctness first is that:

the functions named find and canonicalize should call rebuild before querying the data structure (note: in the case where the data structure is already rebuilt, then the work list is empty, therefore rebuild will be very fast anyway)
for the sake of performance tuning, direct queries to the data structure, called unsafeFind and unsafeCanonicalize would be introduced. With the documentation explaining that the onus is on the programmer to only call them on a “rebuilt” E-graph.

alt-romes commented 1 year ago

Thank you!

I haven't given much thought to how one might end up using find and canonicalize in practice.

One thing that doesn't feel right with your proposal is that merge is also unsafe and rebuild is exported.

The Data.Equality.Graph module allows one to create, modify, and choose when to rebuild e-graphs. So a consumer of that module (such as Data.Equality.Saturation) understands the e-graph is only guaranteed to have its invariants maintained after rebuild.

Perhaps for more correctness we could have this be enforced with an ST-like monadic interface, in which the analogue for runST would be rebuild 🙂 -- meaning functions like merge could only be run inside it, rebuild wouldn't exist, and we wouldn't need any unsafe, just a difference between the functions that can be run inside the invariant breaking computation and the others. I don't know at what point we're complicating it beyond need, but we could also just have this in additional modules.

So going back to find and canonicalize: I wouldn't call them "unsafe" (but do convince me otherwise). Both will work correctly when called on any e-graph, that is, they will find the representative in the e-graph. The contract of having a library in which you can choose when to rebuild the invariants is to understand that until rebuilding it, the invariants aren't maintained. Meaning that if you called merge on the e-graph a couple of times, and then try to find an id, you'll find the current representative for it -- which is not necessarily the same as if you had called it after rebuilt.

aspiwack commented 1 year ago

One thing that doesn't feel right with your proposal is that merge is also unsafe and rebuild is exported.

So going back to find and canonicalize: I wouldn't call them "unsafe" (but do convince me otherwise). Both will work correctly when called on any e-graph, that is, they will find the representative in the e-graph.

In Haskell tradition “unsafe” means: when calling this function, there is a proof obligation that the type system can't discharge, and the programmer will have to prove themself. This is not the case of merge: it's always safe to call merge. But it's the case of the current find and canonicalize, which only make sense on rebuilt e-graphs.

You are arguing otherwise, but I believe the “right” abstraction is to think of the egg data structure as a lazy e-graph (where the laziness is embodied by the actions deferred to the worklist). When you call find or canonicalize you force the data structure. You also have a rebuild, which is kind of like seq in Haskell: something that doesn't change the semantics (divergence notwithstanding), but can be used for performance reason.

Perhaps for more correctness we could have this be enforced with an ST-like monadic interface, in which the analogue for runST would be rebuild slightly_smiling_face

It is likely that you could so something like this. Also something with linear types ( :blush: ). Both sound more painful than they are worth (I mean, I certainly intend to push linear types until it's not unpleasant to do this sort of abstraction, but I wouldn't advise doing so today unless it has a lot of value). But honestly, I think that the simple way outlined in the original issue is really fine.

alt-romes commented 1 year ago

I quite like the lazy e-graph framing.

Under that light I must say it does make sense.

I agree we can then have an unsafeFind and unsafeCanonicalize that doesn't force the e-graph to be rebuilt!

I also do agree that both the monadic and linear types thing would be too much here, and that the proposed solution under that light is good.

I'll leave the issue open and close it when we change the interface in a MR.

Re: linear types: I'm a fan of linear types and of your work on Linear Haskell, and wanted to mention I wrote an undergraduate thesis on synthesis from linear types and more recently wrote a GHC plugin implementing the synthesis using GHC's 9.0 linear types :) (which I got to show to Mathieu at ZuriHac!)

aspiwack commented 1 year ago

(This is getting very off topic, but this is very neat! I'd love to see you demonstrate it to me some time)

alt-romes / hegg

Rename {find,canonicalize} to unsafe{Find,Canonicalize} #1