apache / arrow-nanoarrow

Helpers for Arrow C Data & Arrow C Stream interfaces
https://arrow.apache.org/nanoarrow
Apache License 2.0
149 stars 34 forks source link

feat(python) : Added support to create timestamp/date32/date64 Array from iterable #502

Closed aosingh closed 3 weeks ago

aosingh commented 1 month ago

Addresses https://github.com/apache/arrow-nanoarrow/issues/478

Currently, if you create a c_array from an iterable of timestamps an error is raised

Code:

from datetime import datetime

import nanoarrow as na
import pyarrow as pa

def gen_timestamp():
    for i in range(100):
        yield int(round(datetime.now().timestamp()))

timestamp_array = na.c_array(gen_timestamp(), na.timestamp("s"))

parray = pa.Table.from_arrays([timestamp_array], names=["timestamp"])

print(parray.to_pandas())

Error:

Traceback (most recent call last):
  File "/Users/as/nanoarrow_poc/.venv/lib/python3.12/site-packages/nanoarrow/c_array.py", line 128, in c_array
    builder = builder_cls(schema)
              ^^^^^^^^^^^^^^^^^^^
  File "/Users/as/nanoarrow_poc/.venv/lib/python3.12/site-packages/nanoarrow/c_array.py", line 457, in __init__
    raise ValueError(
ValueError: Can't build array of type timestamp from iterable

Expected output:

             timestamp
0  2024-05-31 18:43:10
1  2024-05-31 18:43:10
2  2024-05-31 18:43:10
3  2024-05-31 18:43:10
4  2024-05-31 18:43:10
..                 ...
95 2024-05-31 18:43:10
96 2024-05-31 18:43:10
97 2024-05-31 18:43:10
98 2024-05-31 18:43:10
99 2024-05-31 18:43:10

[100 rows x 1 columns]

Benefits

aosingh commented 1 month ago

Thank you, I will go through the review comments and also check why some unit tests failed

aosingh commented 1 month ago

@paleolimbot

Could you please help with the GitHub CI workflow ? I wanted to verify the latest changes and unit tests. Thanks.

paleolimbot commented 1 month ago

Thanks for the additional changes! I was out-of-office today but will take a look tomorrow morning 🙂

aosingh commented 1 month ago

There are a few style guide check failures and 1 unit test failed. I am checking this.

paleolimbot commented 1 month ago

Did you know about pre-commit run --all-files? (It should take care of the style checks for you)

aosingh commented 4 weeks ago

Thanks, pre-commit run --all-files formatted the code as per style guide. I have also modified a unit testcase to use UTC timezone in timestamps.

aosingh commented 3 weeks ago

@paleolimbot

Please review the updates. I have simplified unit tests. Hopefully, the test cases are clear now. Let me know your feedback.

aosingh commented 3 weeks ago

I have added CSchemaView.storage_buffer_format and updated unit tests. I will further review it tomorrow.

aosingh commented 3 weeks ago

@paleolimbot

I have updated the PR based on the review comments.

To summarize:

Please review and let me know your feedback

aosingh commented 3 weeks ago

@paleolimbot

Sure, no problem, thank you for your patience.

na.c_buffer([d1, d2, d3], na.timestamp("ms")) raises ValueError. The unit tests are modified to verify that.

with pytest.raises(ValueError):
     na.c_buffer([d1, d2, d3], na.timestamp("ms"))

is this not expected ?

paleolimbot commented 3 weeks ago

Sorry, I missed that! Thank you!

aosingh commented 3 weeks ago

@paleolimbot

I see some unit tests for Windows has failed after merge. Let me try to understand the reason