Describe the enhancement requested
In https://github.com/apache/arrow/issues/41162 it was reported that PyArrow's `to_pandas` method silently drops timezone information from nested Timestamp arrays. For example:
```python
import pandas as pd
import pyarrow as pa

ts = pd.Timestamp('2024-01-01 12:00:00+0000', tz='Europe/Paris')
# unnested, we get a timezone-aware result
pa.Array.from_pandas([ts]).to_pandas()[0]
# => Timestamp('2024-01-01 13:00:00+0100', tz='Europe/Paris')
# nested, we get a timezone-naive result holding the UTC wall-clock time
pa.Array.from_pandas([[ts]]).to_pandas()[0][0]
# => numpy.datetime64('2024-01-01T12:00:00.000000')
```
The reason for this is explained in the comments of https://github.com/apache/arrow/issues/41162, and the upshot is that the behavior cannot be changed at the moment. Therefore, I think it would be good to at least document the current behavior, including what workarounds may exist.
Component(s)
Documentation, Python