apache / iceberg-python

Apache PyIceberg
https://py.iceberg.apache.org/
Apache License 2.0

Cannot cast a datetime type with a timezone into a timestamptz type. #863

Open · dongsupkim-onepredict opened this issue 6 days ago

dongsupkim-onepredict commented 6 days ago

Apache Iceberg version

0.6.1 (latest release)

Please describe the bug 🐞

It seems that when a PyArrow table containing timezone-aware (UTC) timestamps created with the pendulum library is written to an Iceberg table, an error occurs because the timestamp column cannot be converted to the Iceberg timestamptz type. The snippet below reproduces it with create_table.

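import numpy as np
import pendulum
import pyarrow as pa
from pyiceberg.catalog import load_catalog

nums = 10  # number of rows; left undefined in the original snippet
catalog = load_catalog("default")  # assumes a catalog named "default" is configured

test_data_with_tz = {
    "date_time": [pendulum.now() for _ in range(nums)],  # timezone-aware datetimes
    "int_data": list(range(nums)),
    "str_data": ["v" for _ in range(nums)],
    "struct_data": [{"v_a": np.random.rand(1000), "v_b": np.random.rand(1000)} for _ in range(nums)],
}
pa_table_with_tz = pa.Table.from_pydict(test_data_with_tz)

# Raises the TypeError below on PyIceberg 0.6.1
catalog.create_table(identifier=("test", "test"), schema=pa_table_with_tz.schema)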

TypeError: Unsupported type: timestamp[us, tz=Etc/UTC]

kevinjqliu commented 5 days ago

Thanks for reporting! There's a related thread at #541, and a WIP fix at #848. cc @syun64, who's working on this.

syun64 commented 5 days ago

I'm excited to take a look at this, and I'll hopefully find some time to work on it tomorrow. Thanks @dongsupkim-onepredict for reporting the issue!