EDIT: added some new details. I've attempted to reproduce this locally (with the pyarrow write first, then the schema changes, and a rust merge write), but so far it has worked locally as expected.
Environment
Delta-rs version: v0.16.4
Binding: python
Environment:
Bug
What happened:
We had a delta table with a schema that looked something like this (some names omitted due to data privacy; also, excuse the poor indenting):
Note that I highlighted two fields as new; these fields were added, and we attempted to reconcile them via a merge. The error discussed below occurred when calling `write_deltalake` with the `rust` engine and `schema_mode=merge`.

Due to concerns around memory usage, we use the `pyarrow` engine on writes (we're still looking to switch this over to the `rust` engine entirely). However, if we get an error saying the schema of the data does not match what's in the table, we fall back to a `rust` write with `schema_mode=merge`. So maybe that info is useful: we write with pyarrow first and only then with the rust engine. But we've merged several times without issues in the past.

There were two fields added, as mentioned above. After this update, the following error occurred:
`Cast error: Cannot cast string 'resultId value' to value of Int64 type`

The confusing part is that this error was happening on the `resultId` field, which already existed.

What you expected to happen:
I wouldn't have expected an error at all, but if one did occur, I'd expect it on one of the two new fields that were added, not on an existing field whose type didn't change.
How to reproduce it:
TBD, going to try and reproduce this locally.
EDIT: no luck yet 🤷
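For context while reproducing, the pyarrow-first / rust-merge-fallback write path described above can be sketched like this (a minimal sketch with stand-in writer callables and hypothetical names; the real code calls `deltalake.write_deltalake` with `engine="pyarrow"` and, on failure, `engine="rust"` plus `schema_mode="merge"`):

```python
# Sketch of the write path: the two callables stand in for
# deltalake.write_deltalake calls with engine="pyarrow" and
# engine="rust"/schema_mode="merge" respectively.
from typing import Any, Callable

def write_with_schema_fallback(pyarrow_write: Callable[[Any], None],
                               rust_merge_write: Callable[[Any], None],
                               batch: Any) -> str:
    """Try the memory-friendly pyarrow write first; on a schema-mismatch
    error, retry with the rust engine and schema_mode=merge."""
    try:
        pyarrow_write(batch)
        return "pyarrow"
    except ValueError:
        # The pyarrow engine rejected the batch because its schema differs
        # from the table's; let the rust engine merge the schemas instead.
        rust_merge_write(batch)
        return "rust-merge"

def mismatching_write(_batch: Any) -> None:
    # Simulates the schema-mismatch error from the pyarrow engine.
    raise ValueError("schema of data does not match table schema")

print(write_with_schema_fallback(mismatching_write, lambda b: None,
                                 {"resultId": 1}))
# prints "rust-merge"
```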
More details:
I was able to work around this by opening a PySpark session and running an `ALTER TABLE` to add the columns.
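The workaround was roughly this DDL, run from the PySpark session (the table path and column names/types here are hypothetical, since the real schema is omitted above):

```sql
-- Spark SQL against the same Delta table
ALTER TABLE delta.`/path/to/table`
  ADD COLUMNS (new_field_1 STRING, new_field_2 STRING);
```

After the columns existed in the table schema, subsequent writes went through without the cast error.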