aosingh closed this 2 weeks ago
I'm glad this is useful!
We're about to release the current bindings and unfortunately I don't think we can get out-of-the-box support for this before the release (there will probably be another release in early July).
Getting the details right for the full matrix of Arrow date/time/datetime vs. Python date/time/datetime objects is hard; however, if you have full control over the datetime objects you are producing, the workaround is fairly compact (below).
import datetime
import nanoarrow as na

def gen_name():
    for _ in range(10):
        yield "John Doe"

def gen_age():
    for _ in range(10):
        yield 34

def gen_timestamp():
    for _ in range(10):
        # Seconds since the Unix epoch, as a float
        yield datetime.datetime.now().timestamp()

def to_arrow():
    # Declare schema with the actual arrow type (timestamp)
    schema = na.struct(
        {"name": na.string(), "age": na.int64(), "timestamp": na.timestamp("ms")}
    )
    # Create column as an int64 array with storage values
    columns = [
        na.c_array(gen_name(), na.string()),
        na.c_array(gen_age(), na.int64()),
        # Convert seconds to integer milliseconds to match timestamp("ms")
        na.c_array((int(t * 1e3) for t in gen_timestamp()), na.int64()),
    ]
    # Skip validation when creating from buffers
    return na.c_array_from_buffers(
        schema,
        length=columns[0].length,
        buffers=[None],
        children=columns,
        validation_level="none",
    )

na.Array(to_arrow())
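If the generators yield datetime objects rather than float seconds, the storage values can be computed with integer arithmetic only, which avoids float rounding for finer units. The following is a minimal sketch, not part of nanoarrow: the names `to_storage`, `UNITS_PER_SECOND`, and `EPOCH` are hypothetical, and it assumes naive datetimes should be read as UTC (note that `datetime.timestamp()` in the workaround above reads naive values as local time instead).

```python
import datetime

# Ticks per second for each Arrow timestamp unit
UNITS_PER_SECOND = {"s": 1, "ms": 1_000, "us": 1_000_000, "ns": 1_000_000_000}

EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def to_storage(dt, unit="ms"):
    """Convert a datetime to an integer count of `unit` since the Unix epoch."""
    if dt.tzinfo is None:
        # Assumption of this sketch: naive datetimes are treated as UTC
        dt = dt.replace(tzinfo=datetime.timezone.utc)
    delta = dt - EPOCH
    # Integer arithmetic only, so no float rounding is involved
    total_us = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    # Floor division rounds toward negative infinity for coarser units
    return total_us * UNITS_PER_SECOND[unit] // 1_000_000
```

With a helper like this, the timestamp column in the workaround could be built from `(to_storage(dt, "ms") for dt in some_datetimes)`, where `some_datetimes` is whatever iterable of datetime objects you control.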
If you'd like to help add support, one way would be to add a method to the ArrayFromIterableBuilder, then add a line in the mapping that you linked with a mapping from CArrowType.TIMESTAMP to the name of the method you added.
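To make the shape of that change concrete, here is a toy illustration of mapping-based dispatch. It is not nanoarrow's actual code: `ToyBuilder`, `METHOD_BY_TYPE`, and the method names are all hypothetical, the type keys are plain strings rather than CArrowType values, and the timestamp method assumes timezone-aware datetimes.

```python
import datetime

# Hypothetical stand-in for a type -> method-name mapping
METHOD_BY_TYPE = {
    "int64": "_append_ints",
    "timestamp": "_append_timestamps",  # the new line being added
}


class ToyBuilder:
    """Toy builder that collects storage values for one column."""

    def __init__(self, type_name, unit="ms"):
        self._units_per_second = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}[unit]
        # Look up the append method by name, as a mapping-based dispatch would
        self._append = getattr(self, METHOD_BY_TYPE[type_name])
        self.values = []

    def _append_ints(self, iterable):
        self.values.extend(int(v) for v in iterable)

    def _append_timestamps(self, iterable):
        # New method: convert aware datetimes to integer storage values.
        # Naive datetimes would raise TypeError on the subtraction below.
        epoch = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
        one_us = datetime.timedelta(microseconds=1)
        for dt in iterable:
            micros = (dt - epoch) // one_us
            self.values.append(micros * self._units_per_second // 10**6)

    def build(self, iterable):
        self._append(iterable)
        return self.values
```

The point of the pattern is that supporting a new type touches only two places: one new method and one new mapping entry.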
Getting the details right with respect to timezones and units is hard, but is essentially a reverse-engineered version of the conversion in the other direction.
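For orientation, the Arrow-to-Python direction can be sketched in plain Python as follows. This is an illustration under stated assumptions, not nanoarrow's implementation: `from_storage` and the constants are hypothetical names, the result is always timezone-aware, and nanosecond values are truncated to microseconds because Python datetimes cannot hold finer precision.

```python
import datetime

UNITS_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}
EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
ONE_US = datetime.timedelta(microseconds=1)


def from_storage(value, unit, tz=datetime.timezone.utc):
    """Storage integer -> aware datetime (sub-microsecond precision is dropped)."""
    # Rescale the storage value to whole microseconds since the epoch
    micros = value * 10**6 // UNITS_PER_SECOND[unit]
    # Build the instant in UTC, then express it in the requested timezone
    return (EPOCH + micros * ONE_US).astimezone(tz)
```

A builder method for timestamps has to invert each of these steps: normalize the datetime to UTC, measure the offset from the epoch, and rescale to the schema's unit.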
Thank you, this is helpful.
Let me think through the details for Python datetime/date support and the test cases.
Closed in #478!
Thanks to the Arrow community for developing this lightweight wrapper.
I am planning to add support for Apache Arrow in one of the projects I am working on. The aim is to leverage nanoarrow to support exporting tabular data in arrow format.
Users will have access to a function to_arrow().
Users of the library can optionally install pyarrow and pandas to work with the exported data. And the export works fine!
Adding a third field, timestamp, to the above list raises an error.
I understand the source of the error is the mapping maintained for each data type.
How can I add support for incrementally building arrays for more data types?