alt-romes / hegg

Fast equality saturation in Haskell
https://hackage.haskell.org/package/hegg
BSD 3-Clause "New" or "Revised" License
75 stars 8 forks source link

Parent nodes are not deduplicated #21

Open Tarmean opened 1 year ago

Tarmean commented 1 year ago

The node list is a set, so the semigroup instance automatically deduplicates. But the parents are stored as a list with cached size and after merges the list often contains duplicates. The worklist does get nubbed, so it is not catastrophic for performance, but I figured I'd mention it.

This discussion mentions nubbing the parent list as well, so it probably wasn't intentional? https://github.com/alt-romes/hegg/pull/13

Here is the relevant line:

https://github.com/alt-romes/hegg/blob/52c0143b217059f5686e45a2a4da03a8546bf205/src/Data/Equality/Graph.hs#L184

Edit: looked it up and seems like egg doesn't do it either. https://github.com/egraphs-good/egg/issues/113 Probably not crucial then, though it is a tad annoying that the graphviz graph shows them as duplicate nodes.

alt-romes commented 1 year ago

@Tarmean thanks for your interest in the library! The graphviz module was broken at some point, and I never got around to making it work correctly despite being a useful tool.

Hmhm, I see what you mean, though I'm not too sure if it's worth it to do something about it. I don't recall but I believe I did try having sets for the parents rather than lists and it performed worse. It might be worth accounting for it in the graphviz visualisation to avoid showing duplicates.

Regarding performance, the last time I benchmarked I believe the bottleneck was the find operation on the union find. It was recently suggested to me that we might freeze/thaw the union find on rebuild which would keep the purely functional semantics of the data structure, but might fix the bottleneck. However, performance is currently quite good -- a use case in which performance is not good enough would be more compelling to attempt such a change.