apache / arrow-nanoarrow

Helpers for Arrow C Data & Arrow C Stream interfaces
https://arrow.apache.org/nanoarrow
Apache License 2.0

feat(python): Support Decimal types in convert to Python #425

Closed paleolimbot closed 2 months ago

paleolimbot commented 3 months ago

I experimented with a few different methods here. I think this one strikes a good balance between speed and not touching the global decimal context precision.

import pyarrow as pa
import decimal
from nanoarrow.iterator import iter_py

items = [decimal.Decimal("12.3450"), None, decimal.Decimal("1234567.3456")]
array = pa.array(items, pa.decimal128(11, 4))
list(iter_py(array))
#> [Decimal('12.3450'), None, Decimal('1234567.3456')]
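
To illustrate the "not messing with the global precision" point, here is a minimal sketch of one way to build a Decimal from raw decimal storage using a private decimal.Context. It is not necessarily what this PR implements. The helper name decimal_from_storage and its parameters are hypothetical, and it assumes values are stored as little-endian two's complement integers.

```python
import decimal


def decimal_from_storage(raw: bytes, precision: int, scale: int) -> decimal.Decimal:
    """Hypothetical helper: decode one fixed-width decimal value."""
    # Interpret the storage bytes as a signed little-endian integer (assumption).
    unscaled = int.from_bytes(raw, "little", signed=True)
    # Private context sized to the column's precision; the global context
    # returned by decimal.getcontext() is never modified.
    ctx = decimal.Context(prec=precision)
    # Shift the decimal point left by `scale` digits.
    return decimal.Decimal(unscaled).scaleb(-scale, context=ctx)


# 16 bytes of storage per value, as for decimal128(11, 4)
raw = (123450).to_bytes(16, "little", signed=True)
value = decimal_from_storage(raw, precision=11, scale=4)
print(value)  # 12.3450
```

Passing the context explicitly means a value with more digits than the default global precision of 28 is not rounded, and other code that relies on decimal.getcontext() is unaffected.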

This seems to be at least on par with pyarrow's conversion (array.to_pylist()):

import pyarrow as pa
import decimal
import numpy as np
from nanoarrow.iterator import iter_py

floats = np.random.random(int(1e6))
items = [decimal.Decimal(item) for item in floats]
array = pa.array(items)

%timeit array.to_pylist()
#> 799 ms ± 6.24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit list(iter_py(array))
#> 431 ms ± 8.65 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

jorisvandenbossche commented 2 months ago

It's not clear whether this started with this PR, but on main, the merged commit from this PR is failing during wheel building: for Python 3.12, the test run segfaults (https://github.com/apache/arrow-nanoarrow/actions/runs/8689330033/job/23826851854)

paleolimbot commented 2 months ago

I think I have seen that once before, but I was confused because the backtrace seems to suggest that it happens while collecting the tests. (Or is that backtrace generally known to be unreliable?)