Open · aspiwack opened this issue 1 year ago
Thank you!
I haven't given much thought to how one might end up using find and canonicalize in practice.
One thing that doesn't feel right with your proposal is that merge is also unsafe and rebuild is exported.
The Data.Equality.Graph module allows one to create, modify, and choose when to rebuild e-graphs. So a consumer of that module (such as Data.Equality.Saturation) understands that the e-graph is only guaranteed to have its invariants maintained after rebuild.
Perhaps for more correctness we could have this be enforced with an ST-like monadic interface, in which the analogue for runST would be rebuild 🙂, meaning functions like merge could only be run inside it, rebuild wouldn't exist as a standalone function, and we wouldn't need any unsafe functions, just a distinction between the functions that can be run inside the invariant-breaking computation and the others. I don't know at what point we'd be complicating it beyond need, but we could also just put this in additional modules.
So going back to find and canonicalize: I wouldn't call them "unsafe" (but do convince me otherwise). Both will work correctly when called on any e-graph, that is, they will find the representative in the e-graph. The contract of a library in which you can choose when to rebuild is that you understand the invariants aren't maintained until you rebuild. This means that if you call merge on the e-graph a couple of times and then try to find an id, you'll get its current representative, which is not necessarily the same one you would get after rebuilding.
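The difference between the current representative and the post-rebuild one shows up on f(a) and f(b). This is a minimal sketch on a hypothetical toy e-graph (leaves and unary App nodes, naive fixpoint rebuild); none of the names or signatures are hegg's. After merge a b, the union-find already knows a and b are equal, but the congruence f(a) = f(b) is only discovered by rebuild.

```haskell
import qualified Data.Map.Strict as M

-- HYPOTHETICAL toy e-graph; not the Data.Equality.Graph API.
type Id = Int
data Node = Leaf String | App String Id deriving (Eq, Ord, Show)

data EGraph = EGraph
  { parents  :: M.Map Id Id   -- union-find parent pointers
  , hashcons :: M.Map Node Id -- node -> e-class
  , worklist :: [Id]          -- pending congruence repairs
  , nextId   :: Id
  }

emptyEGraph :: EGraph
emptyEGraph = EGraph M.empty M.empty [] 0

-- Returns the *current* representative; no rebuild is performed.
find :: EGraph -> Id -> Id
find g x = case M.lookup x (parents g) of
  Just p | p /= x -> find g p
  _               -> x

canonicalize :: EGraph -> Node -> Node
canonicalize g (App f x) = App f (find g x)
canonicalize _ n         = n

add :: Node -> EGraph -> (Id, EGraph)
add n g = case M.lookup n' (hashcons g) of
  Just c  -> (find g c, g)
  Nothing -> (i, g { parents = M.insert i i (parents g)
                   , hashcons = M.insert n' i (hashcons g), nextId = i + 1 })
  where n' = canonicalize g n
        i  = nextId g

merge :: Id -> Id -> EGraph -> EGraph
merge a b g
  | ra == rb  = g
  | otherwise = g { parents = M.insert ra rb (parents g), worklist = rb : worklist g }
  where (ra, rb) = (find g a, find g b)

-- Naive congruence repair iterated to a fixpoint.
rebuild :: EGraph -> EGraph
rebuild g
  | null (worklist g) = g
  | otherwise = rebuild (M.foldlWithKey visit reset (hashcons g))
  where
    reset = g { hashcons = M.empty, worklist = [] }
    visit acc n c =
      let n' = canonicalize acc n
          c' = find acc c
      in case M.lookup n' (hashcons acc) of
           Just d  -> merge c' d acc  -- congruent: merge the two classes
           Nothing -> acc { hashcons = M.insert n' c' (hashcons acc) }

-- Build f(a) and f(b), then merge a and b without rebuilding.
example :: (Id, Id, EGraph)
example =
  let (a, g1)  = add (Leaf "a") emptyEGraph
      (b, g2)  = add (Leaf "b") g1
      (fa, g3) = add (App "f" a) g2
      (fb, g4) = add (App "f" b) g3
  in (fa, fb, merge a b g4)
```

Taking (fa, fb, g) = example, find g fa /= find g fb, while find (rebuild g) fa == find (rebuild g) fb: before rebuilding, find returns a correct current representative, just not the one it returns afterwards.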
> One thing that doesn't feel right with your proposal is that merge is also unsafe and rebuild is exported.
> So going back to find and canonicalize: I wouldn't call them "unsafe" (but do convince me otherwise). Both will work correctly when called on any e-graph, that is, they will find the representative in the e-graph.
In Haskell tradition, “unsafe” means: when calling this function, there is a proof obligation that the type system can't discharge, and the programmer will have to prove it themself. This is not the case for merge: it's always safe to call merge. But it is the case for the current find and canonicalize, which only make sense on rebuilt e-graphs.
You are arguing otherwise, but I believe the “right” abstraction is to think of the egg data structure as a lazy e-graph (where the laziness is embodied by the actions deferred to the worklist). When you call find or canonicalize, you force the data structure. You also have rebuild, which is kind of like seq in Haskell: something that doesn't change the semantics (divergence notwithstanding), but can be used for performance reasons.
> Perhaps for more correctness we could have this be enforced with an ST-like monadic interface, in which the analogue for runST would be rebuild 🙂
It is likely that you could do something like this. Also something with linear types (😊). Both sound more painful than they are worth (I mean, I certainly intend to push linear types until it's not unpleasant to do this sort of abstraction, but I wouldn't advise doing so today unless it has a lot of value). But honestly, I think that the simple approach outlined in the original issue is really fine.
I quite like the lazy e-graph framing.
Under that light I must say it does make sense.
I agree we can then have an unsafeFind and unsafeCanonicalize that don't force the e-graph to be rebuilt!
I also agree that both the monadic and the linear types approaches would be too much here, and that, in that light, the proposed solution is good.
I'll leave the issue open and close it when we change the interface in an MR.
Re: linear types: I'm a fan of linear types and of your work on Linear Haskell, and wanted to mention that I wrote an undergraduate thesis on synthesis from linear types and, more recently, a GHC plugin implementing that synthesis using GHC 9.0's linear types :) (which I got to show to Mathieu at ZuriHac!)
(This is getting very off topic, but this is very neat! I'd love to see you demonstrate it to me some time)
Hello, and thanks for the good work.
This is a design proposal.
Both find and canonicalize, as defined, are abstraction-breaking, because the data structure is not in a sound state if you haven't called rebuild first. My proposal, in keeping with Haskell's spirit of correctness first, is that:
- find and canonicalize should call rebuild before querying the data structure (note: if the data structure is already rebuilt, the worklist is empty, so rebuild will be very fast anyway);
- unsafeFind and unsafeCanonicalize would be introduced, with documentation explaining that the onus is on the programmer to only call them on a “rebuilt” e-graph.
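The proposal can be sketched on a hypothetical toy e-graph (leaves and unary App nodes, naive fixpoint rebuild; the types and signatures are assumptions, not hegg's). Only the names find, canonicalize, rebuild, merge, unsafeFind and unsafeCanonicalize come from this discussion. The unsafe variants read the structure as-is, the safe ones force the worklist first, and rebuild returns immediately when the worklist is empty.

```haskell
import qualified Data.Map.Strict as M

-- HYPOTHETICAL toy e-graph; types and signatures are not hegg's.
type Id = Int
data Node = Leaf String | App String Id deriving (Eq, Ord, Show)

data EGraph = EGraph
  { parents  :: M.Map Id Id   -- union-find parent pointers
  , hashcons :: M.Map Node Id -- node -> e-class
  , worklist :: [Id]          -- deferred work: the "laziness" of the e-graph
  , nextId   :: Id
  }

emptyEGraph :: EGraph
emptyEGraph = EGraph M.empty M.empty [] 0

-- | Precondition (NOT checked): the e-graph has been rebuilt.
-- On an un-rebuilt e-graph this returns the current, possibly stale, representative.
unsafeFind :: EGraph -> Id -> Id
unsafeFind g x = case M.lookup x (parents g) of
  Just p | p /= x -> unsafeFind g p
  _               -> x

-- | Same precondition as 'unsafeFind'.
unsafeCanonicalize :: EGraph -> Node -> Node
unsafeCanonicalize g (App f x) = App f (unsafeFind g x)
unsafeCanonicalize _ n         = n

add :: Node -> EGraph -> (Id, EGraph)
add n g = case M.lookup n' (hashcons g) of
  Just c  -> (unsafeFind g c, g)
  Nothing -> (i, g { parents = M.insert i i (parents g)
                   , hashcons = M.insert n' i (hashcons g), nextId = i + 1 })
  where n' = unsafeCanonicalize g n
        i  = nextId g

-- Always safe to call: congruence repair is deferred to the worklist.
merge :: Id -> Id -> EGraph -> EGraph
merge a b g
  | ra == rb  = g
  | otherwise = g { parents = M.insert ra rb (parents g), worklist = rb : worklist g }
  where (ra, rb) = (unsafeFind g a, unsafeFind g b)

-- Naive congruence repair to a fixpoint; returns at once when nothing is pending.
rebuild :: EGraph -> EGraph
rebuild g
  | null (worklist g) = g
  | otherwise = rebuild (M.foldlWithKey visit reset (hashcons g))
  where
    reset = g { hashcons = M.empty, worklist = [] }
    visit acc n c =
      let n' = unsafeCanonicalize acc n
          c' = unsafeFind acc c
      in case M.lookup n' (hashcons acc) of
           Just d  -> merge c' d acc  -- congruent: merge the two classes
           Nothing -> acc { hashcons = M.insert n' c' (hashcons acc) }

-- Safe queries: force the deferred work, then read.
find :: EGraph -> Id -> Id
find g = unsafeFind (rebuild g)

canonicalize :: EGraph -> Node -> Node
canonicalize g = unsafeCanonicalize (rebuild g)
```

In this pure setting find returns only an Id, so the rebuilt e-graph it computes is discarded; a caller making many queries right after a batch of merges would do better to call rebuild once and query the result, where every later rebuild takes the empty-worklist fast path. Under the lazy e-graph reading, find g x == find (rebuild g) x for every x, which is the seq-like property of rebuild discussed in the thread.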