apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0
14.17k stars 3.46k forks source link

[C++][Arrow][Parquet] Are there any plans to support decimal NaN? #43205

Open huberylee opened 1 month ago

huberylee commented 1 month ago

Describe the enhancement requested

Currently, there is no definitive standard for decimal NaN encoding in the industry, and Arrow does not support decimal NaN values. Is it possible to support decimal NaN values in Arrow, including the Arrow read/write Parquet path and the decimal cast path?

Component(s)

C++

zeroshade commented 1 month ago

Currently I believe the issue ultimately is that there's no current standard for how you could encode an NaN for decimal128/256 values. In most cases though, it's not technically necessary for Arrow to explicitly support it as the values are carried as-is throughout a system. Thus a particular system/group of systems/organization could accept a particular bit-pattern for decimal128/256 to indicate NaN and it should pass through any Arrow-based or Parquet-based libraries unmolested.

The only place where it would matter is in the functions for converting to/from float/double explicitly

huberylee commented 1 month ago

Currently I believe the issue ultimately is that there's no current standard for how you could encode an NaN for decimal128/256 values. In most cases though, it's not technically necessary for Arrow to explicitly support it as the values are carried as-is throughout a system. Thus a particular system/group of systems/organization could accept a particular bit-pattern for decimal128/256 to indicate NaN and it should pass through any Arrow-based or Parquet-based libraries unmolested.

The only place where it would matter is in the functions for converting to/from float/double explicitly

Is it possible to provide some extension interface somewhere like the Cast function, where the user can decide for themselves whether to convert a specific Decimal value to NaN, or vice versa?

mapleFU commented 1 month ago

Is there any example that decimal NaN? I've a quick check in snowflake and it doesn't support that

Values of infinity and NaN (not-a-number) result in conversion errors.

https://docs.snowflake.com/en/sql-reference/functions/to_decimal

It would be interesting to hear how other system supports this

huberylee commented 1 month ago

PostgreSQL supports decimal NaN by using a sign bit in its internal decimal format to mark decimal NaN values.

In addition to ordinary numeric values, the numeric type has several special values: Infinity -Infinity NaN

https://www.postgresql.org/docs/current/datatype-numeric.html#DATATYPE-NUMERIC-DECIMAL

We can use a similar method as with floating point types to represent NaN values by using an unused value at a specific precision. For details, refer to: https://www3.ntu.edu.sg/home/ehchua/programming/java/datarepresentation.html