apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[Python] Can't convert datetime.date to pa.timestamp in pa.array #36277

Open danepitkin opened 1 year ago

danepitkin commented 1 year ago

Describe the bug, including details regarding any error messages, version, and platform.

pa.array does not allow converting datetime.date values to a timestamp type, but pa.table does.

>>> import pyarrow as pa
>>> import datetime

>>> pa.table([pa.array([datetime.date(2000, 1, 1)])], schema=pa.schema([pa.field('date', pa.timestamp('s'))]))
pyarrow.Table
date: timestamp[s]
----
date: [[2000-01-01 00:00:00]]

>>> pa.array([datetime.date(2000, 1, 1)], type=pa.timestamp('s'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/array.pxi", line 327, in pyarrow.lib.array
    result = _sequence_to_array(obj, mask, size, type, pool, c_from_pandas)
  File "pyarrow/array.pxi", line 39, in pyarrow.lib._sequence_to_array
    chunked = GetResultValue(
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
    return check_status(status)
  File "pyarrow/error.pxi", line 123, in pyarrow.lib.check_status
    raise ArrowTypeError(message)
pyarrow.lib.ArrowTypeError: object of type <class 'datetime.date'> cannot be converted to int

Component(s)

Python

danepitkin commented 1 year ago

datetime.datetime does work though:

>>> pa.array([datetime.datetime(2000, 1, 1)], type=pa.timestamp('s'))
<pyarrow.lib.TimestampArray object at 0x13999bee0>
[
  2000-01-01 00:00:00
]
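
Since datetime.datetime values are accepted, one possible workaround is to promote each date to a midnight datetime before building the array. The sketch below uses only the standard library. The helper name to_midnight_datetimes is hypothetical and not part of pyarrow.

```python
import datetime


def to_midnight_datetimes(values):
    """Hypothetical helper: promote datetime.date values to midnight datetimes."""
    out = []
    for v in values:
        if v is None or isinstance(v, datetime.datetime):
            # None and full datetimes pass through unchanged.
            # datetime.datetime subclasses datetime.date, so it is checked first.
            out.append(v)
        elif isinstance(v, datetime.date):
            # Plain dates become naive datetimes at 00:00:00.
            out.append(datetime.datetime.combine(v, datetime.time.min))
        else:
            raise TypeError(f"unsupported value: {v!r}")
    return out


converted = to_midnight_datetimes([datetime.date(2000, 1, 1), None])
```

The resulting list should then be accepted by pa.array(converted, type=pa.timestamp('s')), in the same way as the datetime.datetime call above.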

danepitkin commented 1 year ago

Using the cast compute kernel works.

>>> pa.array([datetime.date(2000, 1, 1)]).cast(pa.timestamp('s'))
<pyarrow.lib.TimestampArray object at 0x1356cfe80>
[
  2000-01-01 00:00:00
]

This is probably a duplicate of an existing issue.

NMAC427 commented 1 year ago

This can also be reproduced with pandas (here dt is the datetime module and pd is pandas):

pd.Series([dt.date(1970, 1, 1)], dtype=pd.ArrowDtype(pa.timestamp("ms")))

---------------------------------------------------------------------------
ArrowTypeError                            Traceback (most recent call last)
Cell In[14], line 1
----> 1 pd.Series([dt.date(1970, 1, 1)], dtype=pd.ArrowDtype(pa.timestamp("ms")))

File ~/Library/Caches/pypoetry/virtualenvs/pydiverse-pipedag-JBY4b-V4-py3.11/lib/python3.11/site-packages/pandas/core/series.py:509, in Series.__init__(self, data, index, dtype, name, copy, fastpath)
    507         data = data.copy()
    508 else:
--> 509     data = sanitize_array(data, index, dtype, copy)
    511     manager = get_option("mode.data_manager")
    512     if manager == "block":

File ~/Library/Caches/pypoetry/virtualenvs/pydiverse-pipedag-JBY4b-V4-py3.11/lib/python3.11/site-packages/pandas/core/construction.py:559, in sanitize_array(data, index, dtype, copy, allow_2d)
    557     _sanitize_non_ordered(data)
    558     cls = dtype.construct_array_type()
--> 559     subarr = cls._from_sequence(data, dtype=dtype, copy=copy)
    561 # GH#846
    562 elif isinstance(data, np.ndarray):

File ~/Library/Caches/pypoetry/virtualenvs/pydiverse-pipedag-JBY4b-V4-py3.11/lib/python3.11/site-packages/pandas/core/arrays/arrow/array.py:270, in ArrowExtensionArray._from_sequence(cls, scalars, dtype, copy)
    268     scalars = deepcopy(scalars)
    269 try:
--> 270     scalars = pa.array(scalars, type=pa_dtype, from_pandas=True)
    271 except pa.ArrowInvalid:
    272     # GH50430: let pyarrow infer type, then cast
    273     scalars = pa.array(scalars, from_pandas=True)

File ~/Library/Caches/pypoetry/virtualenvs/pydiverse-pipedag-JBY4b-V4-py3.11/lib/python3.11/site-packages/pyarrow/array.pxi:327, in pyarrow.lib.array()

File ~/Library/Caches/pypoetry/virtualenvs/pydiverse-pipedag-JBY4b-V4-py3.11/lib/python3.11/site-packages/pyarrow/array.pxi:39, in pyarrow.lib._sequence_to_array()

File ~/Library/Caches/pypoetry/virtualenvs/pydiverse-pipedag-JBY4b-V4-py3.11/lib/python3.11/site-packages/pyarrow/error.pxi:144, in pyarrow.lib.pyarrow_internal_check_status()

File ~/Library/Caches/pypoetry/virtualenvs/pydiverse-pipedag-JBY4b-V4-py3.11/lib/python3.11/site-packages/pyarrow/error.pxi:123, in pyarrow.lib.check_status()

ArrowTypeError: object of type <class 'datetime.date'> cannot be converted to int

However, initializing the Series as a date32 and then casting to timestamp works:

pd.Series([dt.date(1970, 1, 1)], dtype=pd.ArrowDtype(pa.date32())).astype(pd.ArrowDtype(pa.timestamp("ms")))