Closed ezrosent closed 10 months ago
Thanks for the detailed explanation! Nice find, this is very subtle!
Furthermore, we union y and (F y y)
Is this an immediate union or a deferred merge?
I think if it was immediate, y
and (F y y)
would be equal, then later (F y (F y y)) = y
and this problem wouldn't occur.
Is that correct?
Edit: I think I'm getting at your idea of looking up y
instead of (F y y)
when creating (F y (F y y))
As for merging my test case, I think that's a great idea! Ideally we would make it a bit smaller (I think the encoded program doesn't use let bindings so it's big)
That's right: this is a deferred merge. I'm actually not sure where the two get unioned. I thought it would be in apply_merges
but that just calls insert
on the result.
If we change apply_merges
to union the previous value with the output of the merge function, that also fixes the bug. Do you prefer that code? It's making the same assumption about merge functions, but now we always populate the merge function table if we do a merge.
Do you mean only when the merge function is MergeFn::Union
? I also would have expected that we either do the union right away in rebuild or apply_merges
would be doing it.
It doesn't seem right otherwise
Yeah I think I agree. Take a look at the latest commit?
I think we already perform the union on line 404
in function/mod.rs
?
It's just that the thing we insert is not updated (canonical)?
That's just when we have MergeFn::Union
, this code fires when we have MergeFn::Expr
, so line 404 doesn't run
Oh hmm, I'm not so sure I'm down for this change- maybe I still don't quite understand the problem.
My current guess at what is happening-
(F y y)
and y
are getting unioned already- so this change doesn't change the problem you mentioned in this PR.
The actual bug is that we are never using the union-find to look up the canonical version of the newly merged value.
If you wrap to_insert
in a lookup does this solve the problem?
Wrapping to_insert
in a lookup does not solve the problem, sadly. But I don't think canonicalizing upon insert is enough to guarantee that the value will be canonical by the time you actually get to rebuilding F
. I think the main thing you want may just be the code from the initial commit: just don't try and merge two tuples if they are already the same. Thoughts?
Interesting- what about looking up the two values that are being used by the merge function? I think I'm still confused as to how this bug is coming about if it's not a forgotten lookup somewhere.
what about looking up the two values that are being used by the merge function?
That's what the initial code in this PR does, I think?
Okay let's go with that then! Sorry for the waffling, this one is tricky
Alright, I reverted to the initial code and DAG-ified the initial example. Lets see if CI is happy...
egglog runs rebuilding until a fixed point: while egglog programs do not always terminate, rebuilding really always should.
@oflatt wrote an egglog program in which rebuilding didn't terminate. The idea was:
(function F (X X) X)
F
the merge function of some other function(function G (Y) X :merge (F old new))
(G arg1) => x
,x
's canonical member changes toy
, andarg1
's canonical member changes toarg2
. and we also have(G arg2) => y
.During rebuilding, we notice that these two entries are equal (not just their keys, but also their values). In the current egglog code we still run the merge function, creating a new entry
(G arg2) => (F y y)
. Furthermore, we uniony
and(F y y)
. Then:F
, potentially changing the canonical id fory
.G
, creating(F y (F y y))
.F
again...This process continues ad infinitum.
This PR implements a short-term fix, which is to never run the merge function if we are replacing a tuple with itself: merge functions already do not behave well if they aren't idempotent. In this case
(F x x)
always equaledx
eventually, just not until we got a chance to union it. I think this behavior should be correct for any useful merge function, but it feels like a hack. I'm open to other fixes (maybe we can initialize(F y y)
toy
and never callmake_set
?)cc @yihozhang
@oflatt do you mind if I add your test file to the egglog tests? It's a large file but it still takes less time than some of the math benchmarks.