**Describe the bug**
pip recently switched to installing datafusion with version string `35.0.0`. Compared to a previous installation of version `34.0.0`, creating an external table from hive-partitioned parquet data following the [documented instructions](https://arrow.apache.org/datafusion/user-guide/sql/ddl.html) does not work. While all the partition columns show up as columns of the table, the columns from the parquet data itself do not appear.
**To Reproduce**
```python
# prepare fake data
import os

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

data = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
table = pa.Table.from_pandas(data)
os.mkdir("fake=0")
pq.write_table(table, "./fake=0/data.parquet")

# load into datafusion
import datafusion as df

ctx = df.SessionContext()
ctx.sql("""
CREATE EXTERNAL TABLE data
STORED AS PARQUET
PARTITIONED BY (fake)
LOCATION './*/data.parquet'
""")
```
The loaded data is missing `col1` and `col2`:

**Expected behavior**
The same steps with DataFusion `34.0.0` produce the following output:

**Additional context**
Operating system: Rocky 8
Python version: `3.10.11`
DataFusion version: `35.0.0`, recently installed via pip
pyarrow version: `15.0.0`
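For completeness, the installed versions listed above can be confirmed with a short check; a sketch using the standard library's `importlib.metadata`:

```python
from importlib.metadata import version

# Print the installed versions of the packages involved in this report.
for pkg in ("datafusion", "pyarrow"):
    print(pkg, version(pkg))
```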