egraphs-good / egglog

egraphs + datalog!
https://egraphs-good.github.io/egglog/
MIT License
418 stars 46 forks source link

Short-circuit calls to merge functions with the same argument. #287

Closed ezrosent closed 10 months ago

ezrosent commented 10 months ago

egglog runs rebuilding until a fixed point: while egglog programs do not always terminate, rebuilding really always should.

@oflatt wrote an egglog program in which rebuilding didn't terminate. The idea was:

During rebuilding, we notice that these two entries are equal (not just their keys, but also their values). In the current egglog code we still run the merge function, creating a new entry (G arg2) => (F y y). Furthermore, we union y and (F y y). Then:

This process continues ad infinitum.

This PR implements a short-term fix, which is to never run the merge function if we are replacing a tuple with itself: merge functions already do not behave well if they aren't idempotent. In this case (F x x) always equaled x eventually, just not until we got a chance to union it. I think this behavior should be correct for any useful merge function, but it feels like a hack. I'm open to other fixes (maybe we can initialize (F y y) to y and never call make_set?)

cc @yihozhang

@oflatt do you mind if I add your test file to the egglog tests? It's a large file but it still takes less time than some of the math benchmarks.

oflatt commented 10 months ago

Thanks for the detailed explanation! Nice find, this is very subtle!

Furthermore, we union y and (F y y)

Is this an immediate union or a deferred merge? I think if it was immediate, y and (F y y) would be equal, then later (F y (F y y)) = y and this problem wouldn't occur.

Is that correct?

Edit: I think I'm getting at your idea of looking up y instead of (F y y) when creating (F y (F y y))

oflatt commented 10 months ago

As for merging my test case, I think that's a great idea! Ideally we would make it a bit smaller (I think the encoded program doesn't use let bindings so it's big)

ezrosent commented 10 months ago

That's right: this is a deferred merge. I'm actually not sure where the two get unioned. I thought it would be in apply_merges but that just calls insert on the result.

If we change apply_merges to union the previous value with the output of the merge function, that also fixes the bug. Do you prefer that code? It's making the same assumption about merge functions, but now we always populate the merge function table if we do a merge.

oflatt commented 10 months ago

Do you mean only when the merge function is MergeFn::Union? I also would have expected that we either do the union right away in rebuild or apply_merges would be doing it. It doesn't seem right otherwise

ezrosent commented 10 months ago

Yeah I think I agree. Take a look at the latest commit?

oflatt commented 10 months ago

I think we already perform the union on line 404 in function/mod.rs? It's just that the thing we insert is not updated (canonical)?

ezrosent commented 10 months ago

That's just when we have MergeFn::Union, this code fires when we have MergeFn::Expr, so line 404 doesn't run

oflatt commented 10 months ago

Oh hmm, I'm not so sure I'm down for this change- maybe I still don't quite understand the problem. My current guess at what is happening- (F y y) and y are getting unioned already- so this change doesn't change the problem you mentioned in this PR. The actual bug is that we are never using the union-find to look up the canonical version of the newly merged value.

If you wrap to_insert in a lookup does this solve the problem?

ezrosent commented 10 months ago

Wrapping to_insert in a lookup does not solve the problem, sadly. But I don't think canonicalizing upon insert is enough to guarantee that the value will be canonical by the time you actually get to rebuilding F. I think the main thing you want may just be the code from the initial commit: just don't try and merge two tuples if they are already the same. Thoughts?

oflatt commented 10 months ago

Interesting- what about looking up the two values that are being used by the merge function? I think I'm still confused as to how this bug is coming about if it's not a forgotten lookup somewhere.

ezrosent commented 10 months ago

what about looking up the two values that are being used by the merge function?

That's what the initial code in this PR does, I think?

oflatt commented 10 months ago

Okay let's go with that then! Sorry for the waffling, this one is tricky

ezrosent commented 10 months ago

Alright, I reverted to the initial code and DAG-ified the initial example. Lets see if CI is happy...