Wainberg opened 7 months ago
Reprex:
library(arrow, warn.conflicts = FALSE)
arrow_array("-2147483648")$cast(int32()) |> as.vector()
#> [1] NA
This conversion happens in one of a few places, depending on whether options(arrow.use_altrep = TRUE) is set and on how the caller of the R C API consumes the ALTREP array (via INTEGER_ELT(), INTEGER_GET_REGION(), or DATAPTR_RO()).
Checking every element for that specific int32 value is potentially expensive (but safer) in the ALTREP scenario. Technically, there would be an identical problem with int64 conversions to R's integer64 class, which likewise reserves the minimum int64 value as its NA.
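The value disappears silently because R's NA_integer_ and the int32 value -2147483648 share the same bit pattern. The base R snippet below decodes raw int32 bytes (written out explicitly in little-endian order) to show that collision in isolation. It is only an illustration: it assumes that any conversion path reusing the int32 data without inspecting each value behaves this way, and it does not reproduce arrow's actual conversion code.

```r
# The four little-endian bytes of the int32 value -2147483648 (0x80000000).
int32_min_bytes <- as.raw(c(0x00, 0x00, 0x00, 0x80))

# Reading them as an R integer yields NA: R reserves this exact bit pattern
# for NA_integer_, so no value check is needed for the NA to appear.
readBin(int32_min_bytes, what = "integer", n = 1L, size = 4L, endian = "little")
#> [1] NA

# The neighbouring bit pattern 0x80000001 is an ordinary integer.
readBin(as.raw(c(0x01, 0x00, 0x00, 0x80)), what = "integer", n = 1L,
        size = 4L, endian = "little")
#> [1] -2147483647

# Conversely, NA_integer_ serialises to exactly the same four bytes.
identical(writeBin(NA_integer_, raw(), size = 4L, endian = "little"),
          int32_min_bytes)
#> [1] TRUE
```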
It looks like nanoarrow has an identical problem here:
library(arrow, warn.conflicts = FALSE)
arrow_array("-2147483648")$cast(int32()) |>
  nanoarrow::as_nanoarrow_array() |>
  as.vector()
#> [1] NA
Describe the bug, including details regarding any error messages, version, and platform.
R uses -2147483648 (int32_min) to represent missing integer values (NA). When Arrow arrays are converted to R through the C API and then to R vectors using as.vector(), arrays containing values of -2147483649 or below are converted to bit64::integer64, but if the minimum value of the array is exactly -2147483648, every -2147483648 is converted to NA. This is an edge case, but an important one, because int32_min is often used as a special sentinel value.

Arrow should update the out-of-range check that decides whether to convert to bit64::integer64 so that it uses -2147483647, rather than -2147483648, as the minimum valid int32.
Component(s)
R