apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

[Python] Error thrown when multiplying decimal numbers #43252

Open dhirschfeld opened 3 months ago

dhirschfeld commented 3 months ago

Describe the bug, including details regarding any error messages, version, and platform.

MCVE

>>> import pyarrow as pa
>>> import pyarrow.compute
>>> import numpy as np
>>> pi = pa.compute.cast(np.pi, pa.decimal128(38, 10))
>>> e = pa.compute.cast(np.e, pa.decimal128(38, 10))
>>> pa.compute.multiply(pi, e)

---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)

      3 pi = pa.compute.cast(np.pi, pa.decimal128(38, 10))
      4 e = pa.compute.cast(np.e, pa.decimal128(38, 10))
----> 5 pa.compute.multiply(pi, e)

File /opt/python/envs/dev310/lib/python3.10/site-packages/pyarrow/compute.py:246, in _make_generic_wrapper.<locals>.wrapper(memory_pool, *args)
    244 if args and isinstance(args[0], Expression):
    245     return Expression._call(func_name, list(args))
--> 246 return func.call(args, None, memory_pool)

File /opt/python/envs/dev310/lib/python3.10/site-packages/pyarrow/_compute.pyx:385, in pyarrow._compute.Function.call()

File /opt/python/envs/dev310/lib/python3.10/site-packages/pyarrow/error.pxi:154, in pyarrow.lib.pyarrow_internal_check_status()

File /opt/python/envs/dev310/lib/python3.10/site-packages/pyarrow/error.pxi:91, in pyarrow.lib.check_status()

ArrowInvalid: Decimal precision out of range [1, 38]: 77

python: 3.10.14
pyarrow: 16.1.0
platform: linux-64 (ubuntu)

Component(s)

Python

dhirschfeld commented 3 months ago

This error was observed using pandas with pyarrow dtypes.

Data was returned from a Databricks database with columns of type decimal128(38, 10). Trying to multiply two of those columns raised the above ArrowInvalid error.

The above MCVE demonstrates the issue using just pyarrow.

dhirschfeld commented 3 months ago

Possibly related:

khwilson commented 2 months ago

I'm not a maintainer, but I was curious about this issue. The product of a decimal128(38, 10) and a decimal128(38, 10) should need, at worst, a decimal256(76, 20). It does seem like a bug that the computed precision is 77 and not 76.
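
For reference, the 77 in the error message is consistent with the decimal promotion rule that, as far as I can tell, the Arrow compute documentation lists for multiplication: result precision p1 + p2 + 1 and result scale s1 + s2. The sketch below reproduces that arithmetic in plain Python, without pyarrow. The helper names (`multiply_result_type`, `max_equal_input_precision`) are hypothetical and only illustrate the rule; they are not part of any pyarrow API.

```python
# Maximum precisions of the two Arrow decimal widths.
DECIMAL128_MAX_PRECISION = 38
DECIMAL256_MAX_PRECISION = 76


def multiply_result_type(p1, s1, p2, s2):
    """Return (precision, scale) of a decimal product.

    Assumes the rule precision = p1 + p2 + 1, scale = s1 + s2,
    as listed in the Arrow compute docs for multiplication.
    """
    return p1 + p2 + 1, s1 + s2


def max_equal_input_precision(limit):
    """Largest p such that a (p, s) * (p, s) product stays within `limit`."""
    # Solve 2 * p + 1 <= limit for integer p.
    return (limit - 1) // 2


# The case from the MCVE: decimal128(38, 10) * decimal128(38, 10).
precision, scale = multiply_result_type(38, 10, 38, 10)
print(precision, scale)  # 77 20

# 77 exceeds both the decimal128 and the decimal256 limits.
print(precision <= DECIMAL128_MAX_PRECISION)  # False
print(precision <= DECIMAL256_MAX_PRECISION)  # False

# Under this rule, inputs of precision 18 or less keep the product in decimal128.
p = max_equal_input_precision(DECIMAL128_MAX_PRECISION)
print(p, multiply_result_type(p, 10, p, 10))  # 18 (37, 20)
```

If that rule is what applies here, one possible workaround (an assumption, not something confirmed in this thread) is to cast both operands to a narrower type such as decimal128(18, 10) before multiplying, provided the data actually fits in that precision.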