**Open** · ryanaston opened 7 months ago
Decimals are rounded when casting between scales, so one option might be to parse using one higher scale and then cast. This might be simpler than supporting rounding whilst parsing, which is likely to be complex to achieve without regressing performance.
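The parse-at-higher-scale workaround can be sketched with plain integer arithmetic. This is a minimal, hypothetical illustration: `parse_unrounded` stands in for a truncating parser such as `parse_decimal`, and is not arrow's real API.

```rust
// Hypothetical sketch: parse at one higher scale, then rescale with
// rounding, mimicking what a decimal cast between scales would do.
// `parse_unrounded` stands in for a truncating parser; not arrow's API.
fn parse_unrounded(s: &str, scale: u32) -> i128 {
    let negative = s.starts_with('-');
    let s = s.trim_start_matches(|c| c == '-' || c == '+');
    let (int_part, frac_part) = s.split_once('.').unwrap_or((s, ""));
    // Assumes a well-formed digit string; a real parser must validate.
    let mut v: i128 = int_part.parse().unwrap_or(0);
    for i in 0..scale as usize {
        let d = frac_part.as_bytes().get(i).map_or(0, |&b| (b - b'0') as i128);
        v = v * 10 + d;
    }
    if negative { -v } else { v }
}

/// Rescale from scale s + 1 down to scale s, rounding half away from zero.
fn rescale_with_rounding(v: i128) -> i128 {
    if v >= 0 { (v + 5) / 10 } else { (v - 5) / 10 }
}

fn parse_then_round(s: &str, scale: u32) -> i128 {
    rescale_with_rounding(parse_unrounded(s, scale + 1))
}
```

For example, `parse_then_round("1.235", 2)` yields `124` (1.24), where a truncating parse at scale 2 would give `123`.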
> Instead of erroring on scientific notation the value was being treated as null
Correct: if `CastOptions::safe` is set to `true` (the default), values that fail to parse are treated as nulls.
> Decimals are rounded when casting between scales, so one option might be to parse using 1 higher scale and then cast. This might be simpler than supporting rounding whilst parsing, which is likely to be complex to achieve without regressing performance.

`parse_decimal` already looks at every character in the string, even those beyond the desired scale, to make sure they are valid characters for a decimal. Could it simply examine the character at scale + 1 and do an `add_wrapping` (for positives) or a `sub_wrapping` (for negatives)? Rounding is already present in `parse_string_to_decimal_native`, which does it this way. Why would rounding be a performance concern in one but not the other?
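The round-while-parsing idea in the question could look roughly like this. It is a hedged sketch: `parse_decimal_rounded` is hypothetical, and the standard library's `wrapping_mul`/`wrapping_add` stand in for arrow's `add_wrapping`/`sub_wrapping` (this version negates at the end, so a single wrapping add suffices for both signs).

```rust
// Sketch of rounding during parsing: accumulate digits up to the target
// scale, then inspect the first digit beyond it and bump the value by one
// when rounding half away from zero. Hypothetical, not arrow's code.
fn parse_decimal_rounded(s: &str, scale: usize) -> i128 {
    let negative = s.starts_with('-');
    let s = s.trim_start_matches(|c| c == '-' || c == '+');
    let (int_part, frac_part) = s.split_once('.').unwrap_or((s, ""));

    // Assumes well-formed digit strings; a real parser must validate.
    let mut v: i128 = 0;
    for b in int_part.bytes() {
        v = v.wrapping_mul(10).wrapping_add((b - b'0') as i128);
    }
    for i in 0..scale {
        let d = frac_part.as_bytes().get(i).map_or(0, |&b| (b - b'0') as i128);
        v = v.wrapping_mul(10).wrapping_add(d);
    }
    // Examine the digit at scale + 1 and round half away from zero.
    if frac_part.as_bytes().get(scale).map_or(false, |&b| b >= b'5') {
        v = v.wrapping_add(1);
    }
    if negative { v.wrapping_neg() } else { v }
}
```

With this shape, `parse_decimal_rounded("1.235", 2)` gives `124` where a truncating parse gives `123`.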
> Instead of erroring on scientific notation the value was being treated as null

> Correct, if `CastOptions::safe` is set to true (the default) values that fail to parse will be treated as nulls

Ah, missed that. Thank you!
> Could it simply examine the character at scale + 1 and do an `add_wrapping` (for positives) or a `sub_wrapping` (for negatives)?

My memory is hazy, but I seem to remember the current logic eliminates the need for overflow/underflow detection or precision verification. Provided we can show we aren't massively regressing the performance of decimal parsing, I don't see an issue with changing this behaviour as you describe.
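The precision concern can be made concrete: rounding can carry a value past its declared precision, so a parser that rounds still needs a precision check afterwards. An illustrative sketch, where `fits_precision` and the numbers are invented for the example:

```rust
// Rounding "9.995" at scale 2 yields 10.00 (unscaled 1000), which needs
// four digits and so overflows precision 3, while truncation keeps 9.99
// (unscaled 999), which fits. `fits_precision` is an illustrative helper,
// not arrow's validate_decimal_precision.
fn fits_precision(v: i128, precision: u32) -> bool {
    v.abs() < 10i128.pow(precision)
}

// "9.995" truncated at scale 2 -> 999  (9.99):  fits_precision(999, 3)  is true
// "9.995" rounded  at scale 2 -> 1000 (10.00): fits_precision(1000, 3) is false
```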
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

There are currently two different functions in arrow-cast which convert strings to decimals. `parse_string_to_decimal_native` is used by arrow-cast when casting an entire array from a string type to a decimal type. `parse_decimal` is used by arrow-csv and arrow-json. The inputs differ a bit, but the return types are identical. The latter accepts an additional precision argument; when the former is used, it is instead followed up by a call to `validate_decimal_precision` for the desired type. The former rounds values when input exceeds the scale, while the latter truncates. Neither function supports parsing scientific notation.

**Describe the solution you'd like**
**Describe alternatives you've considered**

N/A
**Additional context**

I stumbled upon this when using the arrow-json reader to build a record batch. I was getting errors when my decimal values contained scientific notation and noticed `parse_decimal` doesn't support it, despite scientific notation being valid for JSON numbers. When I tried forcing expanded notation, I found the excess scale was being truncated instead of rounded. I then tried building my column using a string array and casting that to a decimal type. Instead of erroring on scientific notation, the value was being treated as `null`. Expanded values did work and were rounded.
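For reference, expanding scientific notation into an unscaled decimal value is mechanical: split the mantissa and exponent, absorb the fractional digits into the exponent, then shift by powers of ten. A hedged sketch (the helper is hypothetical, and it truncates excess scale the way `parse_decimal` currently does):

```rust
// Hypothetical sketch: expand scientific notation into an unscaled i128
// at the target scale, truncating any excess fractional digits.
fn parse_scientific_to_unscaled(s: &str, scale: i32) -> Option<i128> {
    let (mantissa, exp) = match s.split_once(|c| c == 'e' || c == 'E') {
        Some((m, e)) => (m, e.parse::<i32>().ok()?),
        None => (s, 0),
    };
    let negative = mantissa.starts_with('-');
    let mantissa = mantissa.trim_start_matches(|c| c == '-' || c == '+');
    let (int_part, frac_part) = mantissa.split_once('.').unwrap_or((mantissa, ""));
    let digits = format!("{int_part}{frac_part}");
    let v: i128 = if digits.is_empty() { 0 } else { digits.parse().ok()? };
    // Net power of ten: exponent, minus digits already behind the point,
    // plus the target scale.
    let shift = exp - frac_part.len() as i32 + scale;
    let v = if shift >= 0 {
        v.checked_mul(10i128.checked_pow(shift as u32)?)?
    } else {
        v / 10i128.pow((-shift) as u32) // truncates excess scale
    };
    Some(if negative { -v } else { v })
}
```

For example, `parse_scientific_to_unscaled("1.5e2", 2)` yields `Some(15000)`, i.e. 150.00 at scale 2.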