Closed Dammi87 closed 1 year ago
Right now we expect users to cast their data types to ones Delta Lake supports. We may support automatic casting in the future. That's tracked by https://github.com/delta-io/delta-rs/issues/686
Gotcha thanks!
I was aware of the limitation but the only unsupported data-type I was encountering was this damn timestamp, so I hoped that the file_options would save me the work :)
Should I close the issue then?
Yeah sorry those truncation options don't work for that. I think we'd like to fold this into the general issue for mapping data types though, rather than treat timestamps specially.
No worries, you guys are doing awesome work, much appreciated
I am getting the same error, but I did not follow what the fix is, can you please clarify? thanks!
original_value = "2024-03-11T14:31:32.804589Z"
I converted it to datetime.fromisoformat(original_value)
I am using this as a column in a pandas dataframe, and when I print the datatype it shows datetime64[ns, UTC]
Also, I am building a pyarrow schema from this pandas dataframe and passing it to the write_deltalake function. When I print the datatype from pyarrow it shows timestamp[ns, tz=UTC]
I have tried truncating the seconds altogether before creating the pandas dataframe, but to no avail.
Environment
Windows 10 Python 3.10.11
Delta-rs version: deltalake 0.9.0 pyarrow 12.0.0 numpy 1.24.3
Bug
What happened: I'm receiving JSON data from a service that uses nanosecond-resolution timestamps, and I need to store it in Delta format. Truncated timestamps are acceptable, so I intended to simply allow that and coerce the timestamps to microsecond resolution. However, I end up with this error:
PyDeltaTableError: Schema error: Invalid data type for Delta Lake: Timestamp(Nanosecond, Some("UTC"))
What you expected to happen: I expected the timestamp to be truncated and converted to microseconds.
How to reproduce it:
More details: This is a minimal reproducible example from the pipeline I'm creating, which receives a stream of JSON arrays