apache / datafusion-python

Apache DataFusion Python Bindings
https://datafusion.apache.org/python
Apache License 2.0

Coalesce seems broken when converting to pyarrow #534

Open cpcloud opened 7 months ago


Describe the bug

A SQL query using `coalesce` runs fine, but converting its result to a pyarrow table with `to_arrow_table()` fails with `ArrowInvalid`.

To Reproduce

In [1]: import datafusion as df

In [2]: ctx = df.SessionContext()

In [3]: ctx.sql
Out[3]: <function SessionContext.sql(query)>

In [4]: ctx.sql("select coalesce(null, 5)")
Out[4]:
DataFrame()
+-------------------------+
| coalesce(NULL,Int64(5)) |
+-------------------------+
| 5                       |
+-------------------------+

In [5]: ctx.sql("select coalesce(null, 5)").to_arrow_table()
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
Cell In[5], line 1
----> 1 ctx.sql("select coalesce(null, 5)").to_arrow_table()

File /nix/store/nqcbgqab0slp4kx3ixk8225nwrzy5mbd-python3-3.10.13-env/lib/python3.10/site-packages/pyarrow/table.pxi:4057, in pyarrow.lib.Table.from_batches()

File /nix/store/nqcbgqab0slp4kx3ixk8225nwrzy5mbd-python3-3.10.13-env/lib/python3.10/site-packages/pyarrow/error.pxi:154, in pyarrow.lib.pyarrow_internal_check_status()

File /nix/store/nqcbgqab0slp4kx3ixk8225nwrzy5mbd-python3-3.10.13-env/lib/python3.10/site-packages/pyarrow/error.pxi:91, in pyarrow.lib.check_status()

ArrowInvalid: Schema at index 0 was different:
coalesce(NULL,Int64(5)): int64
vs
coalesce(NULL,Int64(5)): int64 not null

Expected behavior

I would expect the result to be a pyarrow table with a single row containing 5.