egraphs-good / egglog

egraphs + datalog!
https://egraphs-good.github.io/egglog/
MIT License
459 stars 54 forks source link

"subsume" doesn't fail but also doesn't work #357

Closed rix0rrr closed 1 month ago

rix0rrr commented 8 months ago

According to the documentation of (rewrite ... :subsume) it desugars into a rule containing a subsume command, explaining that subsume prevents the same form from triggering any other rules.

However, when I try the following, it doesn't do what I expect:

(datatype Player
  (player i64))

(relation match ())
(relation contender (Player))
(relation winner (Player))

(match)
(contender (player 1))
(contender (player 2))
(contender (player 3))

; Match an arbitrary contender, declare them winner.
; There is only one (match), (subsume) should prevent this from matching more
; than once.
(rule ((match)
       (contender P))
      ((winner P)
       (subsume (match))))

(run 10)
(print-function winner 10)

Prints:

(
   (winner (player 2))
   (winner (player 1))
   (winner (player 3))
)

I would have expected (subsume (match)) to make sure the (match) cannot trigger any more rules after the first; yet apparently it does.

In most examples, this is written as (rule ((= lhs (...form...))), but rewriting the rule to this doesn't make a difference:

(rule ((= lhs (match))
       (contender P))
      ((winner P)
       (subsume (match))))

What am I misunderstanding about subsume ?

saulshanabrook commented 8 months ago

AFAIK you are understanding this correctly! It seems like a bug to me. I can look into it.

Thanks for the detailed write-up and explanation.

saulshanabrook commented 8 months ago

Ah, I see now the issue. The problem is that in each round, egglog will first find all matches, and then execute all the rules bodies. So when (contender P) is matched, (match) hasn't been subsumed yet, for all contenders. They will all be matched in parallel.

So then subsume will be executed multiple times. You can see this if you change it to just (run 1), it will have the same result. You can also see it if you add the other values after running, and it will work as expected:

(datatype Player
  (player i64))

(relation match ())
(relation contender (Player))
(relation winner (Player))

(match)
(contender (player 1))

; Match an arbitrary contender, declare them winner.
; There is only one (match), (subsume) should prevent this from matching more
; than once.
(rule ((match)
       (contender P))
      ((winner P)
       (subsume (match))))

(run 1)
(contender (player 2))
(contender (player 3))
(run 10)
(print-function winner 10)

This isn't just a problem with subsumption, but any action that is dependent on when it fires, like using set or functions with custom merge functions.

I came up with a workaround for this sort of issue when generating imperative code from an egraph, let me see if I can come up with a similar one for your use case here.

rix0rrr commented 8 months ago

Is it desirable behavior for subsume to behave this way though?

It feels there's some nondeterminism in there that feels undesirable. The documentation should say something like "prevent this form from matching, unless it has already matched in parallel with the current rule", which is going to depend on a lot of factors that aren't necessarily under your control.

oflatt commented 8 months ago

This is desirable behavior for subsume, since it should only be used when a particular term has been proven to subsume another. However this is a good point, the documentation should emphasize that it is easy to use subsume in a bad way. It should also give a more precise definition of when it is safe to use it.

rix0rrr commented 8 months ago

Is a correct summarization that subsume is only intended to be used with union, and only as a performance optimization?

saulshanabrook commented 8 months ago

Yeah, my usage for subsume is to be able to do a rewrite that goes to a more complicated term and make sure that more complicated term is extracted instead of the simpler one, since subsuming a term also makes it unextractable. I plan to add subsume to these rewrites

I am not sure how others are using it and it's definitely in the experimentation phase of the best ways to use it.