Open tustvold opened 2 years ago
On a related note it is unable to simplify expressions of the form
A ^ (!A v B)
Related to this, I think the PoC Tokomak optimizer I created is especially suited for these kind of optimizations. https://github.com/Dandandan/datafusion-tokomak
@pjmore extended this optimizer to be able to optimize both expressions and logical plan nodes. See here for more: https://github.com/apache/arrow-datafusion/pull/1485
I'm not at all familiar with e-graphs, but I would perhaps naively expect something specialised to boolean algebra to significantly out-perform a method based on arbitrary expression rewriting? I guess I'm thinking something like BDDs, or even just recursively applying shannon expansion and relying on const propagation to fix the mess...
This isn't an area I'm hugely familiar with, but I can't help feeling it should just be the case of picking an off-the-shelf boolean expression minimizer and plugging it in :sweat_smile:
FWIW a quick google search found this - https://docs.rs/boolean_expression/latest/boolean_expression/index.html
I do wonder if https://docs.rs/boolean_expression/latest/boolean_expression/index.html handles Nulls
🤔 which is an important niche case for database / sql
FWIW (A ^ !A)
is NOT always false
because of .... 🥁 NULL
alamb=# select null AND (not null);
?column?
----------
(1 row)
We can do this rewrite
A ^ !A = false (iff A is not nullable)
Does something similar also hold for things like A AND false
? Just wondering if the existing logic handles this correctly... 🤔
Does something similar also hold for things like A AND false? Just wondering if the existing logic handles this correctly... 🤔
Yes. In fact @NGA-TRAN fixed a bug related to that in https://github.com/apache/arrow-datafusion/pull/1245 -- I think we have it sufficiently tested now.
See some of the rules here https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/optimizer/simplify_expressions.rs#L568
It actually turn out that A AND false
is always false (even if A
is NULL)
However, A OR false
could be NULL
or false
🤯
Aah yes, the SQL interpretation of NULL as "unknown" vs perhaps more intuitive notions of it... :sweat_smile:
null == null --> null
will never cease to make no sense to me...
It's depend on #2254
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I noticed Datafusion is not able to simplify the expression
Whilst it is unlikely for a user to write such an expression, it is not uncommon for boolean expressions to reduce to something unsatisfiable. Where this is the case, substituting in
false
may allow for further simplification.Describe the solution you'd like
It would be awesome if Datafusion was able to detect and eliminate unsatisfiable boolean expressions. I believe this is an NP-hard problem in the general case, but should be tractable for a low number of variables.