data-apis / dataframe-api

RFC document, tooling and other content related to the dataframe API standard
https://data-apis.org/dataframe-api/draft/index.html
MIT License

Duration/timedelta not supported by dataframe interchange protocol? #329

Open MarcoGorelli opened 9 months ago

MarcoGorelli commented 9 months ago

Looks like timedeltas are currently not supported by the dataframe interchange protocol:

In [1]: pd.api.interchange.from_dataframe(pl.DataFrame({'a': [timedelta(1)]}))
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[1], line 1
----> 1 pd.api.interchange.from_dataframe(pl.DataFrame({'a': [timedelta(1)]}))

File ~/tmp/.venv/lib/python3.10/site-packages/pandas/core/interchange/from_dataframe.py:71, in from_dataframe(df, allow_copy)
     68 if not hasattr(df, "__dataframe__"):
     69     raise ValueError("`df` does not support __dataframe__")
---> 71 return _from_dataframe(
     72     df.__dataframe__(allow_copy=allow_copy), allow_copy=allow_copy
     73 )

File ~/tmp/.venv/lib/python3.10/site-packages/pandas/core/interchange/from_dataframe.py:94, in _from_dataframe(df, allow_copy)
     92 pandas_dfs = []
     93 for chunk in df.get_chunks():
---> 94     pandas_df = protocol_df_chunk_to_pandas(chunk)
     95     pandas_dfs.append(pandas_df)
     97 if not allow_copy and len(pandas_dfs) > 1:

File ~/tmp/.venv/lib/python3.10/site-packages/pandas/core/interchange/from_dataframe.py:150, in protocol_df_chunk_to_pandas(df)
    148     columns[name], buf = string_column_to_ndarray(col)
    149 elif dtype == DtypeKind.DATETIME:
--> 150     columns[name], buf = datetime_column_to_ndarray(col)
    151 else:
    152     raise NotImplementedError(f"Data type {dtype} not handled yet")

File ~/tmp/.venv/lib/python3.10/site-packages/pandas/core/interchange/from_dataframe.py:395, in datetime_column_to_ndarray(col)
    381 # Consider dtype being `uint` to get number of units passed since the 01.01.1970
    383 data = buffer_to_ndarray(
    384     dbuf,
    385     (
   (...)
    392     length=col.size(),
    393 )
--> 395 data = parse_datetime_format_str(format_str, data)  # type: ignore[assignment]
    396 data = set_nulls(data, col, buffers["validity"])
    397 return data, buffers

File ~/tmp/.venv/lib/python3.10/site-packages/pandas/core/interchange/from_dataframe.py:360, in parse_datetime_format_str(format_str, data)
    357         raise NotImplementedError(f"Date unit is not supported: {unit}")
    358     return data
--> 360 raise NotImplementedError(f"DateTime kind is not supported: {format_str}")

NotImplementedError: DateTime kind is not supported: tDu

Should they be?
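
For context, `tDu` in the error message is the Arrow C data interface format string for a duration in microseconds (`tD` plus a unit code `s`, `m`, `u` or `n`), whereas timestamps use `ts`, a unit code, and `:timezone`. The traceback shows `parse_datetime_format_str` giving up on the duration string. Below is a minimal sketch of how both families of strings could be decoded; `parse_temporal_format` and `TemporalFormat` are hypothetical names for illustration, not pandas or protocol APIs.

```python
from typing import NamedTuple, Optional

# Unit codes used by Arrow C data interface timestamp/duration format strings.
_UNITS = {"s": "second", "m": "millisecond", "u": "microsecond", "n": "nanosecond"}


class TemporalFormat(NamedTuple):
    kind: str                 # "timestamp" or "duration"
    unit: str                 # e.g. "microsecond"
    timezone: Optional[str]   # only meaningful for timestamps


def parse_temporal_format(format_str: str) -> TemporalFormat:
    """Hypothetical helper: decode an Arrow timestamp or duration format string."""
    if len(format_str) >= 4 and format_str.startswith("ts") and format_str[3] == ":":
        # Timestamp: "ts" + unit + ":" + optional timezone, e.g. "tsn:UTC".
        kind, unit, tz = "timestamp", format_str[2], format_str[4:] or None
    elif len(format_str) == 3 and format_str.startswith("tD"):
        # Duration: "tD" + unit, e.g. "tDu" for microseconds.
        kind, unit, tz = "duration", format_str[2], None
    else:
        raise NotImplementedError(f"Format string is not supported: {format_str}")
    if unit not in _UNITS:
        raise NotImplementedError(f"Unit is not supported: {unit}")
    return TemporalFormat(kind, _UNITS[unit], tz)


print(parse_temporal_format("tDu"))
print(parse_temporal_format("tsn:UTC"))
```

Interval format strings (for example `tiM` for months) are deliberately left unhandled in this sketch and raise `NotImplementedError`.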

kkraus14 commented 9 months ago

I'm +1 on supporting them.

WillAyd commented 7 months ago

There is some nuance to what these represent, at least between pandas and pyarrow. pandas has only a timedelta type, whereas pyarrow has a duration type (second or finer precision) and an interval type (for calendar-based shifting).
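
To illustrate the duration/interval distinction in general terms, here is a small sketch using only the Python standard library rather than pandas or pyarrow. A duration is a fixed amount of elapsed time, while a calendar-based shift such as "one month" covers a different amount of elapsed time depending on the starting date. `add_months` is a hypothetical helper written for this example, and clamping to the end of the month is an assumption about how such a shift should behave.

```python
import calendar
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Hypothetical helper: shift a date by calendar months, clamping the day."""
    month_index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(month_index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


start_a = date(2023, 1, 15)
start_b = date(2023, 2, 15)

# A fixed duration adds the same elapsed time regardless of the start date.
fixed = timedelta(days=30)
print(start_a + fixed, start_b + fixed)

# A one-month calendar shift spans a different number of days each time.
elapsed_a = add_months(start_a, 1) - start_a
elapsed_b = add_months(start_b, 1) - start_b
print(elapsed_a.days, elapsed_b.days)  # 31 and 28
```

Because the same "one month" shift maps to 31 days in one case and 28 in the other, it cannot be stored as a single fixed-width duration value, which is why the two concepts get separate types.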