davidsteinar opened this issue 1 month ago
Hello @davidsteinar, thanks for the report. PyArrow doesn't support converting `uuid.UUID` objects at the moment, but there's already an issue tracking that work: https://github.com/apache/arrow/issues/43855.
As a workaround until then, you can call `.bytes` on your `uuid.UUID` objects, and PyArrow will then infer the column type as binary:
```python
In [1]: import pandas as pd
   ...: import uuid
   ...: import pyarrow as pa
   ...:
   ...: # Create a DataFrame with UUIDs stored as raw 16-byte values
   ...: data = {'MUID': [uuid.uuid4().bytes for _ in range(5)],  # Note: .bytes called on each UUID
   ...:         'Data': range(5)}
   ...:
   ...: df = pd.DataFrame(data)
   ...:
   ...: # Convert the DataFrame to an Arrow table
   ...: pa.Table.from_pandas(df)
Out[1]:
pyarrow.Table
MUID: binary
Data: int64
----
MUID: [[D3C9E28D1AF14833A765F3389F6E9CEF,0F75A9DAFECF438692840042DEDD4B7F,C25544BA12DD4EBC8701FD1178A502B7,CE2CDBF58BA4454CAED6BF0F54886BC4,E4866E83863240EF81B68129B5BB186D]]
Data: [[0,1,2,3,4]]
```
Describe the bug, including details regarding any error messages, version, and platform.
See reproducible example:
Component(s)
Python