Closed aspiwack closed 2 years ago
The worklist is currently defined as
type Worklist l = NodeMap l ClassId
Because we actually skip some steps and add the parents+their ids instead of the actual class id (which I got from the rust implementation).
I was trying to understand better what you meant with the IntSet
and was re-reading the code, and I realize I might have made some mistakes regarding the deduplication of the worklist when I moved from [(ENode l, ClassId)]
to NodeMap l ClassId
.
So I'll look into it taking what you said into consideration
Oh yes, I wrote IntSet
, I meant NodeMap
. Well, I meant IntSet
, because I thought it was an IntSet
for some reason, but I was mistaken.
Actually NodeMap
is backed by a Map
from the containers
library, for which fromList
is more efficient than a sequence of insert
. So it may even be a bit better than what I was claiming. Worth checking out. Do you have any benchmark to test it on?
(it's possible that it's actually difficult to not use a list for correctness as you point out in your comment, in which case a comparison is moot, obviously)
I see, that's a nice performance insight.
I don't have micro-benchmarks, but rather a symbolic maths testsuite which I use as benchmark material since it's long enough and some tests still take over 0.1 seconds (and some property tests over 3 seconds!) and performance improvements are usually noticeable there.
The unit tests are mostly related to equality saturation while the property tests are related to the e-graphs and their invariants (so don't expect a performance improvement on the property tests when you modified some part of equality saturation or e-matching)
That said, rebuild
is tested with the longer property tests so we should be able to see whether the performance improves.
As for correctness, the deduplication shouldn't affect correctness, it just means more work (which we still don't want to do!)
I would need to be tested (maybe you've done so already), but I think that it may be be more efficient for the worklist to be a list. It probably is the right choice to have an
IntSet
for the loop, but within the datastructure, I would probably keep a list.Then use (pseudo-code)
IntSet.fromList (map canonicalize worklist)
to generate the set.Because of laziness, this isn't very different (an unevaluated sequence of
insert
is basically a list). But a list gives the opportunity of keeping the list in normal form, which I expect to be more compact. I don't know, it's just a thought.