Closed sugibuchi closed 4 months ago
Same issue: https://github.com/delta-io/delta-rs/issues/1598
I have now experienced the same issue trying to read a delta-rs generated table via SQL Server 2022 PolyBase. PolyBase's Delta reader expects timestamps to carry a timezone in order to be readable.
Can Polybase not read non-tz timestamps?
No, it doesn't work. I have also checked the other direction: a SQL Server CTAS using a datetime2 column is always written as timestamp[us] with a UTC timezone.
Environment
Delta-rs version:
Binding:
Environment:
Bug
We cannot append data to an existing Delta Lake table if the schema of the data to write includes timestamp columns with a timezone.
What happened: The first write succeeds, but subsequent append writes fail.
What you expected to happen:
Appending data whose schema includes timezone-aware timestamp columns should succeed.
How to reproduce it:
pa.timestamp(unit="us", tz=timezone.utc)
looks compliant with the timestamp data type in Delta Lake. But the second
write_deltalake(..., mode="append")
fails with the following error. More details:
One possible workaround is to remove the timezone from the timestamp column definitions.
However, we are strongly concerned about this workaround, because it removes timezone information from the timestamp statistics in the transaction logs.
We are currently investigating inconsistent behaviour of Spark Delta Lake with one of our Delta Lake tables. Since this table was written using this workaround, and the inconsistency occurs only when the Spark session is set to a timezone other than UTC, we suspect that statistics without timezone information in the transaction logs are the root cause.