apache / datafusion-python

Apache DataFusion Python Bindings
https://datafusion.apache.org/python
Apache License 2.0

Boolean operators in expressions are ignored #667

Closed timsaucer closed 1 month ago

timsaucer commented 1 month ago

Describe the bug

When I try to create an expression using the Python operators `and` and `or`, no error is reported, but the resulting expression does not behave as expected. It appears that only the first expression is evaluated and the others are silently ignored.

To Reproduce

This minimal code reproduces the behavior:

import pyarrow as pa
from datafusion import SessionContext, col, lit

ctx = SessionContext()

batch = pa.RecordBatch.from_arrays(
    [pa.array([1, 2, 3])],
    names=["a"],
)

df = ctx.create_dataframe([[batch]])

df.with_column("b", col("a") == lit(1) or col("a") == lit(3)).show()
df.with_column("b", col("a") == lit(3) or col("a") == lit(1)).show()

This generates the following results:

DataFrame()
+---+-------+
| a | b     |
+---+-------+
| 1 | true  |
| 2 | false |
| 3 | false |
+---+-------+
DataFrame()
+---+-------+
| a | b     |
+---+-------+
| 1 | false |
| 2 | false |
| 3 | true  |
+---+-------+

Expected behavior

If these operations are not supported, an error should be raised. Even better would be to support them fully, since that would mean a great deal for adoption across the Python community.
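
A minimal sketch of the first option, assuming a library wants `and`/`or` to fail loudly: `and` and `or` call `bool()` on their left operand, so an expression type whose `__bool__` raises turns the silent misuse into an immediate error. The `Expr` class below is a hypothetical toy, not DataFusion's actual `Expr`; pandas and NumPy apply the same idea to Series and arrays, raising a ValueError about an ambiguous truth value.

```python
class Expr:
    """Hypothetical toy expression node (not DataFusion's Expr)."""

    def __init__(self, text: str) -> None:
        self.text = text

    def __or__(self, other: "Expr") -> "Expr":
        # `|` is overloadable, so it can build a combined expression.
        return Expr(f"({self.text} OR {other.text})")

    def __and__(self, other: "Expr") -> "Expr":
        # `&` is overloadable too.
        return Expr(f"({self.text} AND {other.text})")

    def __bool__(self) -> bool:
        # `and`, `or`, `not`, and `if` all call bool(); refusing here turns
        # silent misuse into an explicit error.
        raise TypeError(
            "Expr has no truth value; use & and | instead of 'and' and 'or'"
        )


a_eq_1 = Expr("a = 1")
a_eq_3 = Expr("a = 3")

print((a_eq_1 | a_eq_3).text)  # (a = 1 OR a = 3)

try:
    combined = a_eq_1 or a_eq_3  # bool(a_eq_1) raises before `or` can pick an operand
except TypeError as err:
    print("error:", err)
```

With this in place, `a_eq_1 or a_eq_3` raises instead of quietly returning `a_eq_1`, while `|` and `&` keep working.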

Michael-J-Ward commented 1 month ago

TLDR: Use the bitwise operators & and |, which get mapped to the magic methods __and__ and __or__.

This is a Python quirk; see this table.

Basically, `and` and `or` cannot be overloaded (there are no magic methods for them), so `x or y` can never create the combined expression you're looking for; it always evaluates to one of its two operands.

from datafusion import column, literal

a_eq_1 = column("a") == literal(1)
a_eq_3 = column("a") == literal(3)

print("using `or`:",  a_eq_1 or a_eq_3)
print("using `and`:", a_eq_1 and a_eq_3)
print("using `|`:",  a_eq_1 | a_eq_3)
print("using `&`:", a_eq_1 & a_eq_3)

using `or`: Expr(a = Int64(1))
using `and`: Expr(a = Int64(3))
using `|`: Expr(a = Int64(1) OR a = Int64(3))
using `&`: Expr(a = Int64(1) AND a = Int64(3))
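
The same behavior can be reproduced without DataFusion. `x or y` evaluates to `x` when `bool(x)` is true and to `y` otherwise; `x and y` evaluates to `y` when `bool(x)` is true and to `x` otherwise. An object with no `__bool__` or `__len__` is always truthy, which matches the output above. The sketch uses a hypothetical toy `Expr` class that only records SQL-like text (it is not DataFusion's implementation). It also shows a pitfall when writing the comparisons inline: `|` and `&` bind more tightly than `==`, so each comparison needs its own parentheses. That parsing rule is Python's, so it applies to DataFusion expressions as well.

```python
class Expr:
    """Hypothetical toy expression node; it only records SQL-like text."""

    def __init__(self, text: str) -> None:
        self.text = text

    def __eq__(self, other: "Expr") -> "Expr":  # type: ignore[override]
        # Overload == to build an expression, as expression libraries do.
        return Expr(f"{self.text} = {other.text}")

    def __or__(self, other: "Expr") -> "Expr":
        return Expr(f"({self.text} OR {other.text})")

    def __and__(self, other: "Expr") -> "Expr":
        return Expr(f"({self.text} AND {other.text})")

    def __repr__(self) -> str:
        return f"Expr({self.text})"


a = Expr("a")
one = Expr("1")
three = Expr("3")

a_eq_1 = a == one
a_eq_3 = a == three

# `or` / `and` never combine: Python returns one operand unchanged.
print(a_eq_1 or a_eq_3)   # Expr(a = 1)
print(a_eq_1 and a_eq_3)  # Expr(a = 3)

# `|` / `&` dispatch to __or__ / __and__ and build a new expression.
print(a_eq_1 | a_eq_3)    # Expr((a = 1 OR a = 3))
print(a_eq_1 & a_eq_3)    # Expr((a = 1 AND a = 3))

# Pitfall: `|` binds tighter than `==`, so parenthesize each comparison.
print((a == one) | (a == three))  # Expr((a = 1 OR a = 3))
print(a == one | a == three)      # Expr((1 OR a) = 3), silently wrong
```

The last line shows why the parentheses matter: the unparenthesized form parses as the chained comparison `a == (one | a) == three`, and because the first half is truthy, Python silently returns only the second half.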

timsaucer commented 1 month ago

Thank you! I tested it, and your answer works as expected. I'll put up a PR this morning to expand the documentation so others don't run into the same question. I appreciate the rapid response.