Open asfimport opened 3 years ago
Eduardo Ponce / @edponce: EqualOptions has a nans_equal member to control the behavior of comparisons between NaNs. I assume this was included to satisfy the behavior of different tools.
IEEE 754 states that all logical operations with a NaN should always result in false, except for NaN != x. For a discussion on this topic, refer to the first answer of this Stack Overflow question and the Wikipedia NaN page.
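For reference, the IEEE 754 rule described above can be checked with plain Python floats, which follow IEEE 754 comparison semantics. This is only an illustrative sketch and does not involve Arrow itself; the `results` dictionary is just a name chosen for the example.

```python
# Python floats follow IEEE 754 comparison semantics, so this
# illustrates the rule above without touching Arrow.
nan = float("nan")

results = {
    # Equality and ordered comparisons involving NaN are all False ...
    "nan == nan": nan == nan,
    "nan == 5": nan == 5.0,
    "nan < 5": nan < 5.0,
    "nan > 5": nan > 5.0,
    "nan <= nan": nan <= nan,
    "nan >= nan": nan >= nan,
    # ... while inequality is the one comparison that is True.
    "nan != nan": nan != nan,
    "nan != 5": nan != 5.0,
}

for expr, value in results.items():
    print(f"{expr:<12} -> {value}")
```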
My opinion for Arrow is that all logical comparisons with a NaN value should return false except for:
NaN == NaN
This would eliminate the need for the nans_equal option.
Eduardo Ponce / @edponce: R uses NA to represent a missing value, equivalent to having a NULL bit set in Arrow.
Coercing NaN to logical or integer type gives an NA of the appropriate type, but coercion to character gives the string "NaN". NaN values are incomparable so tests of equality or collation involving NaN will result in NA.
w.r.t. R's behavior for
> NaN > 5
[1] NA
it does not seem to conform strictly to IEEE 754. My speculation is that internally the result is NaN but when coerced as a logical type becomes NA.
Joris Van den Bossche / @jorisvandenbossche:
EqualOptions has a nans_equal member to control the behavior of comparisons between NaNs. I assume this was included to satisfy the behavior of different tools.
Note this is for a different operation: "full array, data-structure equality" (arr1.equals(arr2) returns True or False), and the option is added here mainly for convenience. Often you want to regard NaNs in the same location as equal when it comes to full array equality, and writing this out manually is rather verbose, i.e. something like ((a == b) | (a.isnan() & b.isnan())).all().
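To make the distinction concrete, here is a minimal pure-Python sketch of what a whole-array equality check with a NaN toggle could look like. `arrays_equal` and its `nans_equal` parameter are hypothetical names for illustration only; this is not Arrow's implementation of EqualOptions, and it works on plain lists of floats rather than Arrow arrays.

```python
import math


def arrays_equal(a, b, nans_equal=False):
    """Whole-sequence equality returning a single bool.

    Hypothetical helper for illustration; not Arrow's API.
    """
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x == y:
            continue
        # NaN == NaN is False under IEEE 754, so NaNs in the same
        # position only count as equal when explicitly requested.
        if nans_equal and math.isnan(x) and math.isnan(y):
            continue
        return False
    return True


a = [1.0, float("nan"), 3.0]
b = [1.0, float("nan"), 3.0]

print(arrays_equal(a, b))                   # False
print(arrays_equal(a, b, nans_equal=True))  # True
```

The result is one boolean for the whole pair of sequences, which is why this kind of option does not carry over directly to element-wise comparisons.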
We don't have such an option for element-wise comparisons (which is the type of equality/comparison that is discussed in this issue)
In working on ARROW-12964 we ran into some corner behaviors with NaN that don't match our (and R's) expectations. It appears that (any?) comparison with NaN results in false, though at least in R this would result in an NA value. The current behavior does match numpy's behavior.
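The two behaviors being contrasted can be sketched in plain Python. This is an illustration under assumptions, not Arrow, R, or NumPy code: `greater_ieee` and `greater_na` are hypothetical helpers, and `None` stands in for a missing value (R's NA, or a null in Arrow). The first function gives the IEEE-style result described in this issue; the second propagates a missing value the way the R example `NaN > 5` in this thread does.

```python
import math

NA = None  # stand-in for a missing value (R's NA / an Arrow null)


def greater_ieee(xs, y):
    """Element-wise x > y with IEEE 754 semantics: NaN compares False."""
    return [x > y for x in xs]


def greater_na(xs, y):
    """Element-wise x > y where any NaN operand yields a missing value."""
    return [NA if math.isnan(x) or math.isnan(y) else x > y for x in xs]


values = [1.0, float("nan"), 7.0]

print(greater_ieee(values, 5.0))  # [False, False, True]
print(greater_na(values, 5.0))    # [False, None, True]
```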
Reporter: Jonathan Keane / @jonkeane
Note: This issue was originally created as ARROW-13364. Please see the migration documentation for further details.