apache / datafusion-python

Apache DataFusion Python Bindings
https://datafusion.apache.org/python
Apache License 2.0

Panic when reading empty pyarrow.Table #575

Closed jwimberl closed 2 months ago

jwimberl commented 5 months ago

Describe the bug

When trying to create a DataFrame from a pyarrow.Table object with a nonzero number of columns, but zero rows, I encounter a panic at src/context.rs:294.

To Reproduce

>>> import datafusion as df
>>> import pyarrow as pa
>>> ctx = df.SessionContext()
>>> import pandas as pd
>>> empty_pdf = pd.DataFrame({'col': []})  # renamed so it does not shadow the datafusion alias `df`
>>> emptyTable = pa.Table.from_pandas(empty_pdf)
>>> emptyTable
pyarrow.Table
col: double
----
col: [[]]
>>> ctx.from_arrow_table(emptyTable)
thread '<unnamed>' panicked at src/context.rs:294:37:
index out of bounds: the len is 0 but the index is 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
pyo3_runtime.PanicException: index out of bounds: the len is 0 but the index is 0

Expected behavior

I expect this to create a DataFrame with zero rows, such as the following (created via .limit(0) from a non-empty DataFrame):

>>> empty
DataFrame()
++
++
>>> empty.describe()
DataFrame()
+------------+-----+
| describe   | col |
+------------+-----+
| count      | 0.0 |
| null_count | 0.0 |
| mean       |     |
| std        |     |
| min        |     |
| max        |     |
| median     |     |
+------------+-----+

Additional context