Closed ldacey closed 1 month ago
A more minimal example of the issue:
import polars as pl
from deltalake import DeltaTable

df = pl.DataFrame(
    {
        "id": [1, 2],
        "date": [1, 2],
    },
    schema={
        # setting data types to be equal fixes the error, i.e. int & int or date & date
        "id": pl.Int64,
        "date": pl.Date,
    },
)
table = df.to_arrow()

dt = DeltaTable.create(
    table_uri="union_error",
    schema=table.schema,
    mode="overwrite",
    partition_by=["id"],  # taking out partitioning fixes the error
    configuration={
        "delta.enableChangeDataFeed": "true",  # false fixes the error
    },
)

dt.merge(
    source=table,
    predicate="s.id = t.id",
    source_alias="s",
    target_alias="t",
).when_not_matched_insert_all().execute()
Running this gives the error:
Traceback (most recent call last):
  File "/workspaces/codespaces-blank/setup.py", line 31, in <module>
    ).when_not_matched_insert_all().execute()
                                    ^^^^^^^^^
  File "/usr/local/python/3.12.1/lib/python3.12/site-packages/deltalake/table.py", line 1793, in execute
    metrics = self._table.merge_execute(self._builder)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_internal.DeltaError: Generic DeltaTable error: Error during planning: UNION Column id (type: Int64) is not compatible with column date (type: Date32)
Any one of these modifications to the script allows it to run without error:
- Setting the data types of id and date to be the same: both pl.Int64 or both pl.Date.
- Removing partition_by=["id"].
- Setting "delta.enableChangeDataFeed" to "false" instead of "true".

The script also works fine if you have two pl.String columns, or one pl.String and one pl.Int64. But if you have pl.String and pl.Date columns, you then get the following kind of error:
Traceback (most recent call last):
  File "/workspaces/codespaces-blank/setup.py", line 31, in <module>
    ).when_not_matched_insert_all().execute()
                                    ^^^^^^^^^
  File "/usr/local/python/3.12.1/lib/python3.12/site-packages/deltalake/table.py", line 1793, in execute
    metrics = self._table.merge_execute(self._builder)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Exception: Cast error: Cannot cast string 'a' to value of Date32 type
So the issue seems to be some weird mix of Date, partitioning, and CDF. 🤔
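To narrow down that mix, one option is to run the same merge over every combination of the three factors. This is only a rough sketch, not something I have run against 0.20.0: `cases`, `run_case`, `main`, and `SAMPLE_VALUES` are hypothetical names, the sample values are made up (real `date` objects instead of the integers used above), and each table is written to a temporary directory rather than `union_error`. The `polars` and `deltalake` calls are the same ones used in the script above.

```python
import itertools
import tempfile
from datetime import date

# Sample values per polars dtype name. These are illustrative assumptions,
# not the values from the original report.
SAMPLE_VALUES = {
    "Int64": [1, 2],
    "String": ["a", "b"],
    "Date": [date(2024, 1, 1), date(2024, 1, 2)],
}


def cases():
    """Return every ((id dtype, date dtype), partitioned, cdf_enabled) combination."""
    dtype_pairs = itertools.product(SAMPLE_VALUES, repeat=2)
    return list(itertools.product(dtype_pairs, [True, False], [True, False]))


def run_case(dtypes, partitioned, cdf_enabled):
    """Run the merge once; return None on success, else the error text."""
    # Imported lazily so the case grid can be inspected without the libraries.
    import polars as pl
    from deltalake import DeltaTable

    id_name, date_name = dtypes
    table = pl.DataFrame(
        {"id": SAMPLE_VALUES[id_name], "date": SAMPLE_VALUES[date_name]},
        schema={"id": getattr(pl, id_name), "date": getattr(pl, date_name)},
    ).to_arrow()

    with tempfile.TemporaryDirectory() as uri:
        try:
            dt = DeltaTable.create(
                table_uri=uri,
                schema=table.schema,
                mode="overwrite",
                partition_by=["id"] if partitioned else None,
                configuration={
                    "delta.enableChangeDataFeed": str(cdf_enabled).lower(),
                },
            )
            dt.merge(
                source=table,
                predicate="s.id = t.id",
                source_alias="s",
                target_alias="t",
            ).when_not_matched_insert_all().execute()
        except Exception as exc:  # record the failure instead of stopping the sweep
            return str(exc)
    return None


def main():
    """Print one line per combination with either 'ok' or the error message."""
    for dtypes, partitioned, cdf_enabled in cases():
        error = run_case(dtypes, partitioned, cdf_enabled)
        print(dtypes, f"partitioned={partitioned}", f"cdf={cdf_enabled}", "->", error or "ok")
```

Calling `main()` prints one line per combination. If the observations above hold, the failures should be limited to the rows where the two dtypes differ, one of them is `Date`, and both partitioning and CDF are on.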
Looks like this is the same issue as #2832
Environment
Delta-rs version: 0.20.0
Binding: Python
Bug
What happened:
Enabling CDF results in
_internal.DeltaError: Generic DeltaTable error: Error during planning: UNION
What you expected to happen:
How to reproduce it:
More details:
Turning off the delta.enableChangeDataFeed configuration makes the merge succeed, for reasons that are not yet clear.