KarelPeeters opened this issue 2 months ago
Yes, associativity (especially together with commutativity) is known to blow up the e-graph. There are various strategies to control this, but they basically amount to artificial limitations on the equality saturation process.
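The most basic of those limitations are the caps you can put on egg's `Runner` itself. A minimal sketch, using only the standard builder methods (the numbers are arbitrary):

// Fragment: the Runner stops (and records a StopReason) as soon as any of
// these limits is hit, whether or not the e-graph has saturated.
use std::time::Duration;
use egg::{Runner, SymbolLang};

let runner: Runner<SymbolLang, ()> = Runner::default()
    .with_iter_limit(10)                       // at most 10 iterations
    .with_node_limit(10_000)                   // at most ~10k e-nodes
    .with_time_limit(Duration::from_secs(1));  // at most 1 second of wall time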
The problem is that equality saturation not only reasons about the (sub)terms that you give it, but it creates many new terms along the way (terms that are not equivalent to anything in the input) and reasons about them too! Associativity is a good example: once you rewrite `a + (b + c)` to `(a + b) + c`, you've made a new term `a + b` that is not equivalent to anything in the input!
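To see this concretely, here is a small standalone sketch (not code from this thread): with only commutativity and associativity of `+`, the term `(+ a b)` ends up in the e-graph even though it never appears in the input.

use egg::*;

fn main() {
    // Only commutativity and (both directions of) associativity.
    let rules: Vec<Rewrite<SymbolLang, ()>> = vec![
        rewrite!("commute-add"; "(+ ?x ?y)" => "(+ ?y ?x)"),
        rewrite!("assoc-add-1"; "(+ (+ ?x ?y) ?z)" => "(+ ?x (+ ?y ?z))"),
        rewrite!("assoc-add-2"; "(+ ?x (+ ?y ?z))" => "(+ (+ ?x ?y) ?z)"),
    ];

    let expr: RecExpr<SymbolLang> = "(+ a (+ b c))".parse().unwrap();
    let runner = Runner::default()
        .with_expr(&expr)
        .with_iter_limit(5)
        .run(&rules);

    // `(+ a b)` was never part of the input, but re-associating created it.
    let ab: RecExpr<SymbolLang> = "(+ a b)".parse().unwrap();
    assert!(runner.egraph.lookup_expr(&ab).is_some());

    println!(
        "{} e-nodes in {} e-classes",
        runner.egraph.total_number_of_nodes(),
        runner.egraph.number_of_classes()
    );
}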
This problem is particularly bad with associativity, because you're discovering a million new ways to rewrite `0`. So one approach people take is to prune `0`'s e-class:
use egg::*;

#[test]
fn foo() {
    // A "do nothing" e-class analysis whose only job is to prune e-classes
    // once they contain a leaf (a constant or a variable).
    #[derive(Default)]
    struct MyAnalysis;

    type EGraph = egg::EGraph<SymbolLang, MyAnalysis>;

    impl Analysis<SymbolLang> for MyAnalysis {
        type Data = ();

        fn make(egraph: &mut EGraph, enode: &SymbolLang) -> Self::Data {
            ()
        }

        fn merge(&mut self, a: &mut Self::Data, b: Self::Data) -> DidMerge {
            DidMerge(false, false)
        }

        // As soon as a class contains a leaf e-node, drop all of its non-leaf
        // e-nodes, so nothing proven equal to a constant/variable keeps growing.
        fn modify(egraph: &mut EGraph, id: Id) {
            let nodes = &mut egraph[id].nodes;
            if nodes.iter().any(|n| n.is_leaf()) {
                nodes.retain(|n| n.is_leaf());
            }
        }
    }

    let mut rules: Vec<Rewrite<SymbolLang, MyAnalysis>> = vec![];
    rules.push(rewrite!("commute-add"; "(+ ?x ?y)" => "(+ ?y ?x)"));
    rules.extend(rewrite!("assoc-add"; "(+ (+ ?x ?y) ?z)" <=> "(+ ?x (+ ?y ?z))"));
    rules.push(rewrite!("cancel-add-neg"; "(+ ?x (- ?x))" => "0"));
    rules.push(rewrite!("add-0"; "(+ ?x 0)" => "?x"));

    let expr = "(+ a (+ b (- a)))";
    let runner = Runner::default()
        .with_explanations_enabled()
        .with_expr(&expr.parse().unwrap())
        .with_node_limit(64)
        .run(&rules);
    runner.print_report();

    let mut egraph = runner.egraph;
    println!("Generated classes:");
    for class in egraph.classes() {
        println!("  {}", egraph.id_to_expr(class.id));
    }

    println!("Explanation for final generated class:");
    let newest_id = egraph.classes().map(|c| c.id).max().unwrap();
    let newest_expr = egraph.id_to_expr(newest_id);
    println!(
        "{}",
        egraph.explain_existance(&newest_expr).get_flat_string()
    );
}
All I have done here is add a trivial e-class analysis that does nothing, except use the `modify` hook to prune any e-class that contains leaf e-nodes. This works on the assumption that once you've proven something equivalent to a constant or variable, you don't need to rewrite it any further.
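For example, reusing the `egraph` from the test above, you can check that the class `0` ends up in really has been pruned down to its leaf:

// `0` was derived via cancel-add-neg, so add_expr just finds the existing
// class; the pruning in `modify` should have left only leaf e-nodes in it.
let zero_id = egraph.add_expr(&"0".parse().unwrap());
assert!(egraph[zero_id].nodes.iter().all(|n| n.is_leaf()));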
Thanks for the quick response!
> The problem is that equality saturation not only reasons about the (sub)terms that you give it, but it creates many new terms along the way (terms that are not equivalent to anything in the input) and reasons about them too! Associativity is a good example: once you rewrite `a + (b + c)` to `(a + b) + c`, you've made a new term `a + b` that is not equivalent to anything in the input!
What's still surprising to me is that all of the rewrite rules I included only reduce or keep the number of operands of an expression, so how is it possible that the number of nodes keeps growing indefinitely?
> This problem is particularly bad with associativity, because you're discovering a million new ways to rewrite `0`. So one approach people take is to prune `0`'s e-class: [...]
Thanks, that does work, and the underlying idea makes sense: if something is already proven to be a constant or a variable it's no longer possible to simplify it further (or in my case to put it into a form where it's easier to derive range information).
It stops working when I expand the ruleset slightly, though, in particular with distributivity between addition and multiplication. This rewrite rule definitely needs to be able to temporarily increase the size of terms to enable certain simplifications, so maybe it's too much to ever expect saturation here.
// keep previous rules, add new rules below
rules.push(rewrite!("commute-mul"; "(* ?x ?y)" => "(* ?y ?x)"));
rules.extend(rewrite!("assoc-mul"; "(* (* ?x ?y) ?z)" <=> "(* ?x (* ?y ?z))"));
rules.extend(rewrite!("distr-add-mul"; "(+ (* ?x ?z) (* ?y ?z))" <=> "(* (+ ?x ?y) ?z)"));
rules.push(rewrite!("mul-0"; "(* ?x 0)" => "0"));
rules.push(rewrite!("mul-1"; "(* ?x 1)" => "?x"));
// new test expression
let expr = "(+ (* a (+ a b)) (- (* a b)))";
One pruning idea I'm thinking of is to prevent the AST size from increasing beyond, e.g., twice the size of the largest original expression. That should at least make the search space finite, and hopefully does not prevent too many useful simplifications.
In general, is saturation expected to be unreachable? Or will there always be some neat set of tricks to prune the space to prevent endless superfluous expansion?
Saturation will be unreachable in many use cases that involve algebraic rules like that. Most use cases are just about getting something "useful" done in the time you're willing to spend.
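For example, even when the `Runner` stops on a node or time limit rather than saturating, you can still pull out the smallest equivalent term found so far, right after `.run(&rules)` (a sketch using the standard `Extractor`):

use egg::{AstSize, Extractor};

// Extract the cheapest (here: smallest) term found so far for the root,
// regardless of whether the e-graph actually saturated.
let root = runner.roots[0];
let extractor = Extractor::new(&runner.egraph, AstSize);
let (best_size, best_expr) = extractor.find_best(root);
println!("best so far: {} (size {})", best_expr, best_size);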
I see, thanks for the info!
I've tried my approach of limiting the AST size instead of using some time or node limit, and it seems to be working well. I'm sharing it here in case anyone ever wants to do something similar.
The approach is to limit the AST size of the terms in the e-graph, computed with `AstSize`, since it nicely works incrementally (I think). Some example code implementing this is available here, or also in the spoiler below.
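A simplified sketch of what such an analysis could look like (this is not the exact code linked above; the `SizeLimit` name and the hard-coded limit are just for illustration, and on older egg versions `Analysis::make` takes `&EGraph` instead of `&mut EGraph`):

use egg::*;

// Track, per e-class, the smallest AST size of any term it represents, and
// prune e-nodes whose smallest represented term exceeds a fixed limit.
struct SizeLimit {
    limit: usize,
}

type EGraphSL = egg::EGraph<SymbolLang, SizeLimit>;

impl Analysis<SymbolLang> for SizeLimit {
    // Smallest AST size represented by the class.
    type Data = usize;

    fn make(egraph: &mut EGraphSL, enode: &SymbolLang) -> Self::Data {
        // AstSize is 1 + the sum of the children's (minimal) sizes.
        AstSize.cost(enode, |child| egraph[child].data)
    }

    fn merge(&mut self, a: &mut Self::Data, b: Self::Data) -> DidMerge {
        // When two classes turn out to be equal, keep the smaller size.
        merge_min(a, b)
    }

    fn modify(egraph: &mut EGraphSL, id: Id) {
        // Drop e-nodes whose smallest represented term is over the limit,
        // as long as at least one e-node within the limit remains.
        let limit = egraph.analysis.limit;
        let sizes: Vec<usize> = egraph[id]
            .nodes
            .iter()
            .map(|n| AstSize.cost(n, |child| egraph[child].data))
            .collect();
        if sizes.iter().any(|&s| s <= limit) {
            let mut i = 0;
            egraph[id].nodes.retain(|_| {
                let keep = sizes[i] <= limit;
                i += 1;
                keep
            });
        }
    }
}

// The Runner is then seeded with the chosen limit, e.g.
// Runner::new(SizeLimit { limit: 20 }).with_expr(...).run(&rules),
// with the rules typed as Rewrite<SymbolLang, SizeLimit>.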
Good idea! I like that idea of limiting the AST size. Thanks for sharing!
This is very cool! I had the same idea in my EGRAPHS talk (starting page 29). I did some experiments but found it didn't work well for me, because this is still doubly exponential. But maybe I only tested extreme cases (e.g., I added `x + 0 = x`, so the e-graph is very cyclic). I'm excited to see how this would work for you!

edit: actually, I only tested it with a depth limit, and maybe limiting the depth is a dumb idea and limiting AST size works way better
Both of you may also be interested in this un-released feature to forbid returning certain cycles from e-matching: https://github.com/egraphs-good/egg/blob/main/src/language.rs#L781
Hey! Thanks for doing the equivalence graph research and building this crate as an implementation.
I'm investigating how easy it is to use `egg` as part of the integer range type system for an experimental language I'm working on. Right now I'm trying to see how fast saturation is reached for simple arithmetic expressions, only involving addition and negation. This is the full test code I'm running:

This already explodes: without a node or time limit the graph keeps growing forever (as far as I can tell). The set of rules I included here is minimal: removing any single rule fixes the "exploding graph" problem, at the cost of severely reducing expressiveness. This is the shortened output of running this program with all 4 rules:
To me it looks like it keeps generating new classes that are equivalent to classes it already knows about, faster than it can prove that they are indeed equivalent. For example, the final class that is printed, `(+ (+ (+ a b) a) (+ (- a) (+ (- a) (- a))))`, is really just `(+ b (- a))` again, but they have not yet been proven to be the same.

I'm surprised by steps like `(Rewrite<= add-0 (+ b 0))` being included in the existence reason. I defined the one-directional rewrite `rewrite!("add-0"; "(+ ?x 0)" => "?x")`, so why is it being used in the opposite direction too?

Is it expected behavior that simple sets of rules already blow up the graph forever?