Open asfimport opened 5 years ago
Antoine Pitrou / @pitrou:
ARROW-7011 will allow doing this by calling the Array.cast()
method. Is that enough for the use case?
Joris Van den Bossche / @jorisvandenbossche: That certainly solves the immediate need/functionality for being able to convert floats to decimal type in Arrow.
I would personally say that it would still be nice to be able to do this already upon conversion to Arrow in pa.array
(which would also ensure it works when converting e.g. a pandas DataFrame with a float column to a pyarrow Table with a given pyarrow schema).
But I suppose that once Decimal128::FromReal is added, it should also be possible to use it in python_to_arrow.cc? (Meaning, we could leave the issue open as a possible future enhancement, if we want this.)
Joris Van den Bossche / @jorisvandenbossche: Actually, strings are not accepted either (even though, internally, Python decimal objects are first converted to strings in order to convert them to the decimal type):
In [12]: pa.array(["0.1", "0.2"], pa.decimal128(2, 1))
...
ArrowTypeError: int or Decimal object expected, got str
(and casting strings to decimal doesn't work yet, that's probably worth another JIRA?)
So it's maybe a more general question: what types of values do we want to accept to construct a decimal array? Right now we accept Python decimal.Decimal objects, but also ints, so why not floats or strings? After ARROW-7011, I think it would be a relatively easy addition to also accept those types in DecimalFromPyObject (https://github.com/apache/arrow/blob/bcbb3e2c350b3889c19b3c3fdbb0a88d5c8f1cbd/cpp/src/arrow/python/decimal.cc#L148-L164).
One disadvantage might be that the object-by-object conversion in the DecimalConverter (involving Python) might be less efficient than a cast in case of a typed float array as input.
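To make the question of accepted input types concrete, here is a minimal pure-Python sketch of an object-by-object coercion that accepts ints, floats, strings and decimal.Decimal objects for a given precision and scale. It is not Arrow's DecimalFromPyObject: the helper name `to_decimal`, the choice to route floats through `repr()`, and the half-even rounding are all assumptions made for illustration only.

```python
from decimal import Context, Decimal, InvalidOperation, ROUND_HALF_EVEN


def to_decimal(value, precision, scale):
    """Hypothetical helper: coerce one Python object to a Decimal that fits
    a decimal(precision, scale) type. Not part of pyarrow."""
    if isinstance(value, bool):
        # bool is a subclass of int; reject it explicitly to avoid surprises.
        raise TypeError("bool is not accepted")
    if isinstance(value, (int, str, Decimal)):
        dec = Decimal(value)
    elif isinstance(value, float):
        # Assumption: use the shortest round-tripping repr, so 0.1 -> "0.1".
        dec = Decimal(repr(value))
    else:
        raise TypeError(f"unsupported type: {type(value).__name__}")

    quantum = Decimal(1).scaleb(-scale)  # e.g. scale=1 -> Decimal("0.1")
    try:
        # A context with prec=precision makes quantize fail when the result
        # would need more digits than the target type allows.
        return dec.quantize(quantum, rounding=ROUND_HALF_EVEN,
                            context=Context(prec=precision))
    except InvalidOperation:
        raise ValueError(
            f"{value!r} does not fit in decimal({precision}, {scale})"
        ) from None


values = [to_decimal(v, 2, 1) for v in (0.1, "0.2", 3, Decimal("1.25"))]
print(values)
```

A per-object loop like this is exactly the kind of Python-level conversion that could be slower than a vectorized cast when the input is already a typed float array.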
Antoine Pitrou / @pitrou: That sounds reasonable to me, yes.
We currently allow constructing a decimal array from decimal.Decimal objects or from ints:
but not from floats (or strings):
Is this something we would like to support?
You will certainly run into precision issues, but if the decimal type is fully specified, it seems clear what the user wants. In general, since decimal objects in pandas are not that easy to work with, many people might have plain float columns that they want to convert to decimal.
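The precision concern, and why a fully specified scale resolves it, can be illustrated with the standard library alone. This is only a sketch of the idea, not of how Arrow converts floats; the helper name `float_to_scale` is hypothetical.

```python
from decimal import Decimal


def float_to_scale(x, scale):
    """Hypothetical helper: round the exact binary value of a float
    to the given number of decimal places."""
    quantum = Decimal(1).scaleb(-scale)  # scale=1 -> Decimal("0.1")
    return Decimal(x).quantize(quantum)


# The float 0.1 is not exactly one tenth, so the exact conversion differs.
exact = Decimal(0.1)
assert exact != Decimal("0.1")

# Once the scale is fixed, the intended value is unambiguous.
assert float_to_scale(0.1, 1) == Decimal("0.1")
assert float_to_scale(0.1 + 0.2, 1) == Decimal("0.3")
```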
Reporter: Joris Van den Bossche / @jorisvandenbossche
Note: This issue was originally created as ARROW-5905. Please see the migration documentation for further details.