egraphs-good / egglog

egraphs + datalog!
https://egraphs-good.github.io/egglog/
MIT License
400 stars 45 forks source link

Add action to set custom cost #355

Open saulshanabrook opened 5 months ago

saulshanabrook commented 5 months ago

Alternative to https://github.com/egraphs-good/egglog/pull/353 that uses a new action instead of specifying a new table.

A few differences over the other approach:

This PR API:

(datatype E
  (Foo String))

(union (Foo "x") (Foo "y"))
(union (Foo "y") (Foo "z"))

(unstable-cost (Foo "x") 17)
(unstable-cost (Foo "y") 11)
(unstable-cost (Foo "z") 15)

(extract (Foo "y"))

Other PR API:

(datatype E
  (Foo String :cost FooCost))

(union (Foo "x") (Foo "y"))
(union (Foo "y") (Foo "z"))

(set (FooCost "x") 17)
(set (FooCost "y") 11)
(set (FooCost "z") 15)

(extract (Foo "y"))
Python API comparisons

This PR: ```python class E(Expr): pass @function def Foo(x: StringLike) -> E: ... egraph = EGraph() egraph.register( union(Foo("x")).with_(Foo("y")), union(Foo("y")).with_(Foo("z")), cost(Foo("x")).to(i64(17)), cost(Foo("y")).to(i64(11)), cost(Foo("z")).to(i64(15)), ) ``` Other PR: ```python class E(Expr): pass def FooCost(x: StringLike) -> i64: ... @function(cost=FooCost) def Foo(x: StringLike) -> E: ... egraph = EGraph() egraph.register( union(Foo("x")).with_(Foo("y")), union(Foo("y")).with_(Foo("z")), set_(FooCost("x")).to(i64(17)), set_(FooCost("y")).to(i64(11)), set_(FooCost("z")).to(i64(15)), ) ```

mwillsey commented 5 months ago

I like this approach in general. Where does the min occur? It'd be nice to see an egglog example as well as python. In my version, you'd write the cost function with :merge (min old new), so you didn't have to worry about "manually" min-ing things. Here it seems like you'd have to account for that. You could bake the min into update_cost, but then it's unclear how you "reset" the cost if that's ever something you want to do. Thoughts?

saulshanabrook commented 5 months ago

Thanks for taking a look Max!

I updated the description to include the egglog comparison of APIs as well.

Regarding min, I believe currently it's similar to a on merge function that just takes the new value. It does it here.

We could add a way to customize this, or have a different default. I could imagine adding the ability to do something like :cost-merge (max old new) to functions as well. We could also add a reset-cost action I suppose if we find that to be needed.

For both of these, I wonder about waiting to add them until/if they are needed for a particular use case? Or maybe there is a way to add them that is easy enough, just hesitant to add extra complexity until it's needed.

mwillsey commented 5 months ago

Hmm now I'm questioning whether either this or my PR provide the functionality needed from the zulip thread. We need to write out a full example that actually calculates the cost. So in your world, it would look something like this right?

(datatype Expr
  (Mul Expr Expr)
  ...)

(rule ((Mul a b))
      ((set-cost (Mul a b) (+ 10 (+ (cost?? a) (cost?? b)))))

I think we need a way to read the cost of any expression of that sort, otherwise I'm not how the rules to set cost should work.

oflatt commented 5 months ago

Here's how you might do it by defining a table to store the minimum cost for an eclass

(function Cost (Expr) i64 :merge (min old new))
(rule ((Mul a b)
         (MulCost a b c))
       ((set (Cost (Mul a b)) c)))

(rule ((Mul a b)
         (Cost a a-cost)
         (Cost b b-cost))
      ((set (MulCost a b) (+ 1 (+ a-cost b-cost))))) 
mwillsey commented 5 months ago

Right, okay that makes sense. It's quite verbose, and the first rule is pure boilerplate, it should never really look different than that. You'd also need to make rules for every other constructor (even if you were still using AST size as their cost). Maybe I'm missing something?

oflatt commented 5 months ago

Yeah, it's pretty annoying I agree. We could generate these rules or bake them in somehow

mwillsey commented 5 months ago

Ok, I propose that we close/hold both this and #353 because it seems like a realistic use case requires something like @oflatt's code sample above, and I think neither PR actually makes a big dent in the boilerplate that you'd need. We could do something with more code generation, but given that I think we don't know what parts should be generated vs left alone, we should maybe just hold off.

We do however need someway of actually doing the extraction w.r.t. different costs. I think @yihozhang proposed something like passing in pairs of (constructor costfn) to the extract command? Or maybe something more imperative like a top-level command to set the costfn for a fn? What do you y'all think?

oflatt commented 5 months ago

Right I'm, not sure how much should be generated.

I'm against extending the language in a way that is limited or a hack Alternatively, we could expose the tables as read-only and allow people to write extractors in rust. Or we can figure out a pure egglog solution that we are happy with

saulshanabrook commented 5 months ago

I would support waiting on an implementation until we have the use cases more clearly defined.

One possibly way to do that would be some sort of needs/design doc that at least articulates the different needs, and from there we could then see how different options achieve them most elegently.

saulshanabrook commented 4 months ago

Closing this for now, since we might want to refine the design space here a bit more.

AzizZayed commented 2 weeks ago

How stable is this PR to use? I'm working on a project that needs this feature but I cannot wait for the redesigns and brainstorming (although that's good, thank you for your efforts and high standards).

saulshanabrook commented 2 weeks ago

@AzizZayed I will merge in the latest commits to this branch so that it is suitable for you to try working with.

We have started merging in some things with an "unstable" prefix, so that might be an option here, to give folks a chance to play with this before committing to a final design.

saulshanabrook commented 2 weeks ago

This PR is now up to date and the action is now named unstable-cost. If you try using it and have any feedback (positive or negative) that would be useful!

I will try seeing if @tylerhou's original example (which motivated this work) could be made workable using this PR https://egraphs.zulipchat.com/#narrow/stream/328972-general/topic/How.20can.20I.20find.20the.20tree.20associated.20with.20an.20extraction.3F

saulshanabrook commented 2 weeks ago

I added the ability to set costs to be expressions, in order to support the tree join example. This is how you would implement it with this PR:

(datatype JoinTree
          (Rel String)
          (Join JoinTree JoinTree))

(function size (JoinTree) i64 :merge (min old new))

(let ra (Rel "a"))
(let rb (Rel "b"))
(let rc (Rel "c"))
(let rd (Rel "d"))
(let re (Rel "e"))
(let rf (Rel "f"))

(let query (Join (Join (Join (Join (Join ra rb) rc) rd) re) rf))

; Size
(set (size ra) 50)
(set (size rb) 200)
(set (size rc) 10)
(set (size rd) 123)
(set (size re) 10000)
(set (size rf) 1)

;; cost of relation is its size minus 1, since the string arg will have a cost of 1 as well
(rule ((= (size (Rel a)) asize))
      ((unstable-cost (Rel a) (- asize 1))))
;; cost/size of join is product of sizes
(rule ((Join a b)
       (= (size a) asize)
       (= (size b) bsize))
      ((set (size (Join a b)) (* asize bsize))
       (unstable-cost (Join a b) (* asize bsize))))

; Associativity
(rewrite (Join a b) (Join b a))

; Commutativity
(rewrite (Join (Join a b) c) (Join a (Join b c)))

(run 10000)

(extract query)
(extract (size query))

Compared with the original example:

(datatype JoinTree (Rel String) (Join JoinTree JoinTree))

(function cost (JoinTree) i64 :merge (min old new))
(function size (JoinTree) i64 :merge (min old new))

(let ra (Rel "a"))
(let rb (Rel "b"))
(let rc (Rel "c"))
(let rd (Rel "d"))
(let re (Rel "e"))
(let rf (Rel "f"))

(let query (Join (Join (Join (Join (Join ra rb) rc) rd) re) rf))

; Size
(set (size ra) 50)
(set (size rb) 200)
(set (size rc) 10)
(set (size rd) 123)
(set (size re) 10000)
(set (size rf) 1)
(rule ((Join a b) (= (size a) asize) (= (size b) bsize))
      ((set (size (Join a b)) (* asize bsize))))

; Cost
(rule ((Rel a)) ((set (cost (Rel a)) (size (Rel a)))))
(rule ((Join a b) (= (size a) asize) 
                  (= (cost a) acost)
                  (= (size b) bsize)
                  (= (cost b) bcost))
      ((set (cost (Join a b)) (+ (+ (* asize bsize) acost) bcost))))

; Associativity
(rewrite (Join a b) (Join b a))

; Commutativity
(rewrite (Join (Join a b) c) (Join a (Join b c)))

(run 10000)

(extract (cost query))
(extract (size query))

The computed cost for both is the same, 123000757624.

It doesn't allow for setting costs or defining custom merge functions for costs, but as you can see here you can use a companion table to do that. The cost will always just be set as the most recent value.

AzizZayed commented 2 weeks ago

Will I be able to query a unstable-cost table, like I can do with any other function? Or is this designed differently? Like: (print-function unstable-cost 10)

saulshanabrook commented 2 weeks ago

@AzizZayed No you are not able to query it like a regular function, it only shows up in the extraction calculation or when serializing the nodes. What do you need that for?

AzizZayed commented 2 weeks ago

@AzizZayed No you are not able to query it like a regular function, it only shows up in the extraction calculation or when serializing the nodes. What do you need that for?

@saulshanabrook Just for debugging purposes. It's not very important. Can this PR be used on the demo website BTW?

saulshanabrook commented 2 weeks ago

@AzizZayed There is some concern that this won't work with an updated backend that is planning to be integrated soon, so it seems like we might hold off on merging this feature in.

I will still try to update this PR to keep it in good status.

The web UI won't work with this feature.