alt-romes / hegg

Fast equality saturation in Haskell
https://hackage.haskell.org/package/hegg
BSD 3-Clause "New" or "Revised" License
75 stars 8 forks source link

Draft: {find,canonicalize} vs unsafe{Find,Canonicalize} #12

Open alt-romes opened 2 years ago

alt-romes commented 2 years ago

Closes #1

Work in progress (documentation missing)

I realize I'm still not 101% convinced. I do realize the price to pay for the safe find if the e-graph is already rebuilt is minimal, but perhaps I'm still hung up on that.

Currently, our only consumer of find is in Extraction here and here

It feels a bit weird that the e-graph is being rebuilt everytime find is called since it's being done so many times.

The performance difference isn't noticeable though.

alt-romes commented 2 years ago

@aspiwack

This looks good to me now. It's an improvement and probably even more so for just-e-graph users.

There's one thing against the lazy-e-graph-forced-on-rebuild point of view that I'm wondering if we can fix, but which is not really a problem.

In the lazy point of view, calling find on an un-rebuilt e-graph forces the e-graph to be rebuilt.

That would make me expect that

egr = ...    -- (1) un-rebuilt e-graph
find 2 egr -- (2) forces e-graph to be rebuilt
find 3 egr -- (3) e-graph has been forced, so this should be now instantaneous

My case is with (3). Calling find a second time will still rebuild the e-graph completely, which makes the lazy argument a bit weaker.

Just wondering if we can make this fit in the lazy idea, or if it's better not to consider it 100% as a lazy e-graph (and not advertise it as such)

It would be very cool if we could use haskell's laziness to make (3) a reality though!

alt-romes commented 2 years ago

@aspiwack Friendly reminder regarding this PR to close #1

This looks good to me now. It's an improvement and probably even more so for just-e-graph users.

I can otherwise just merge it, but since you brought up the issue I thought you might have something to say about the resolution 🙂

aspiwack commented 2 years ago

Ouch. I'm sorry. Your question requires more than a 5min answer, that how it ended up slipping my mind. I'll give my thoughts by tomorrow (Saturday) night.

aspiwack commented 2 years ago

Ok, I'm back.

Yes, your point is a very good one. I was, ironically, thinking in terms of mutable data structures, in which case there is no problem of going back in time.

I think that there are 3 possible attitudes that we can take. Well, kind of 4, but the fourth one is quite a bit of a stretch.

  1. The first point of view it to not care. This data structure is meant to be used in an imperative fashion, the pure implementation is a distraction. It's quite possible to take this stance, but I don't think that it's super tenable anymore.
  2. We can cache rebuild. The trick is to add an extra field to graphs, this one is not a strict field
    rebuilt_graph :: ReprUnionFind

    Whenever creating a graph, rebuilt_graph is set to rebuilt_graph = internalRebuild <the rest of the graph> Then

    rebuild gr = gr { unionFind = reuilt_graph gr }

    This would work, at the cost of a little more complexity each time you modify a graph.

    • It's worth taking into account that this method prevents you from having to replay a rebuild step . However, if you do an insertion then a rebuild, if you go back before the insertion and rebuild again, you still have to replay the rebuild (sans insertion). A “solution” to that, is to make an Okasaki-style data structure for the e-graph. But this sounds like a research problem, so let's not go there.
  3. We can consider that all this is just a little too fragile. Maybe rebuild should always be called explicitly after all. To avoid calling find on a bad graph we can make find partial. I think that they do something like this in the Rust library. The idea is to add a field to the e-graph
    rebuilt :: Bool

    The rebuilt field is set to True by rebuild, and to False by any mutation. Then find fails if the rebuilt field is False (this makes find partial).

I can't ansewr for you which attitude you should have to this. It's a design question, and depends on where you want to bring this library. It's up to you to choose what serves your purpose best.

alt-romes commented 2 years ago

@aspiwack

Small update:

I've taken a detour into reading Okasaki's book whose ideas are sounding really interesting. I'll attempt to make the e-graph emulate laziness and use some of its ideas in it. Some variation of (2) right now is the most appealing to me.

I'm trying to implement https://dl.acm.org/doi/pdf/10.1145/3485496 at the moment, and intend on trying some other related projects. I hope to experiment a little bit more with the user side of the library to help me in making these decisions a little bit better.

Thanks again!

Note from #1

(This is getting very off topic, but this is very neat! I'd love to see you demonstrate it to me some time)

I sent out an e-mail regarding this!