egraphs-good / egglog

egraphs + datalog!
https://egraphs-good.github.io/egglog/
MIT License
400 stars 45 forks source link

Add custom extraction cost functions #353

Closed mwillsey closed 4 months ago

mwillsey commented 5 months ago

This add support for custom cost function. You can write code like this now:


(datatype E
  (Foo String :cost FooCost))

(union (Foo "x") (Foo "y"))
(union (Foo "y") (Foo "z"))

(set (FooCost "x") 17)
(set (FooCost "y") 11)
(set (FooCost "z") 15)

(extract (Foo "y"))

I'm not 100% if what I did with typechecking is okay, but at least this simple example works.

mwillsey commented 5 months ago

This will solve Tyler Hou's issue on Zulip.

mwillsey commented 5 months ago

I like your idea Saul, but the reason that it is like (FooCost args) rather than (cost (Foo args)) is because the latter "loses precision" as e-class gen unioned, i.e., you can't maintain a handle on a specific e-node. The first approach builds the constructor in, so to speak, so it's always clear which e-node a cost is referring to. Perhaps a new node-based architecture would make this possible, but for now I think it's necessary to be the FooCost way.

saulshanabrook commented 5 months ago

Ah I see what you mean!

If I am understanding it correctly, the current implementation stores a cost associated with the tuple of e classes representing the args of the function. With my proposed change, it would store the cost with the eclass of the function return value instead.

It makes me think it's a bit similar to subsumption or deletion of a node? Which is based on identity of the args not the output.

What if we added a "cost" action that takes a call and an i64? It would be similar to subsume. When I implemented subsume, I made an enum with delete as two row level actions, where cost setting could be a third.

Would that have the same expressivity do you think?

yihozhang commented 5 months ago

On another note, it might be better to make :cost FooCost a parameter of extract than to datatype/constructor, so that extract is parametrized by the cost model used. This will be even nicer if we have a module system in the future, and we will be able to write something like (extract e :cost-model CostModelModuleA). The current downside is that since datatypes are open, it is not clear what parameters we should provide to extract. We can simply say if something is not in extract's parameters, it's by default unextractable.

On Sat, Mar 9, 2024 at 4:33 PM Saul Shanabrook @.***> wrote:

Ah I see what you mean!

If I am understanding it correctly, the current implementation stores a cost associated with the tuple of e classes representing the args of the function. With my proposed change, it would store the cost with the eclass of the function return value instead.

It makes me think it's a bit similar to subsumption or deletion of a node? Which is based on identity of the args not the output.

What if we added a "cost" action that takes a call and an i64? It would be similar to subsume. When I implemented subsume, I made an enum with delete as two row level actions, where cost setting could be a third.

Would that have the same expressivity do you think?

— Reply to this email directly, view it on GitHub https://github.com/egraphs-good/egglog/pull/353#issuecomment-1987024338, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADEKPFBXTEHKSP54HY3U2CLYXOS33AVCNFSM6AAAAABENUI6LCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSOBXGAZDIMZTHA . You are receiving this because you are subscribed to this thread.Message ID: @.***>

mwillsey commented 5 months ago

I thought of that, and I agree. The difficulty is that it’s hard to do for datatypes, since the cost-table would have to be defined after the datatype defn to mention the sort, but then that’s after its use in the variant as well!

On Mon, Mar 11, 2024 at 3:59 PM Oliver Flatt @.***> wrote:

@.**** approved this pull request.

Great work on this PR! I approve. Would be great to have a more complex test though

In src/ast/parse.lalrpop https://github.com/egraphs-good/egglog/pull/353#discussion_r1520519279:

@@ -101,9 +101,12 @@ Schedule: Schedule = {

=> Schedule::Run(RunConfig { ruleset: ident, until: None }), }

-Cost: Option = {

  • ":cost" => Some(<>),
  • => None, +Cost: CostFn = {
  • ":cost" => CostFn::Constant(<>),
  • ":cost" => CostFn::Table(false, <>),

Perhaps it would be simpler to force the user to declare the table themself, then typecheck it properly? This is pretty good too but may be confusing.

— Reply to this email directly, view it on GitHub https://github.com/egraphs-good/egglog/pull/353#pullrequestreview-1929715972, or unsubscribe https://github.com/notifications/unsubscribe-auth/AANTPTF6KAYBYSXNO4EUOM3YXZAP7AVCNFSM6AAAAABENUI6LCVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMYTSMRZG4YTKOJXGI . You are receiving this because you authored the thread.Message ID: @.***>

oflatt commented 5 months ago

Yeah, that makes sense

saulshanabrook commented 5 months ago

What if we added a "cost" action that takes a call and an i64? It would be similar to subsume. When I implemented subsume, I made an enum with delete as two row level actions, where cost setting could be a third.

I had a go at implementing it this way: https://github.com/egraphs-good/egglog/pull/355

Let me know what you think. It seems a bit cleaner to me and I like how it let's users to override the cost for any functions, not just those with a custom cost fn defined.

saulshanabrook commented 4 months ago

Closing this for now as well, since we wanted to refine the design space a bit to understand the constraints before committing to a solution I believe.

Feel free to re-open when desired.