Bioconductor / DelayedArray

A unified framework for working transparently with on-disk and in-memory array-like datasets
https://bioconductor.org/packages/DelayedArray

Importing DelayedArray messes up as.vector() for Arrow arrays #114

Closed. Wainberg closed this issue 6 months ago

Wainberg commented 6 months ago
> a = arrow::Array$create(c('foo'))
> as.vector(a)  # works fine
[1] "foo"
> suppressMessages(library(DelayedArray))
> as.vector(a)
Error in extract_array(x, index) :
  the first argument to extract_array() must be an array-like object
  (i.e. it must have dimensions)

The root cause is that as.vector() delegates to as.array(), and DelayedArray defines as.array() for the class Array:

> suppressMessages(library(DelayedArray))
> showMethods(as.array)
Function: as.array (package base)
x="ANY"
x="Array"  # <------------------------------ this one
x="COO_SparseArray"
x="Matrix"
x="SparseArraySeed"
x="sparseVector"
x="SVT_SparseArray"

It seems the designers of DelayedArray didn't anticipate that anyone else in the R ecosystem would define a class named "Array". As a result, once DelayedArray is loaded, calling as.vector() on any object whose class is named "Array" (such as arrow::Array) dispatches to DelayedArray's implementation.
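
The general mechanism can be reproduced in base R without loading DelayedArray or arrow. The sketch below is only an illustration of a class-name collision: as.vector.Array here is a hypothetical stand-in defined at top level, not the actual S4Arrays/DelayedArray code, and its error message is made up.

```r
# An S3 object whose class happens to be named "Array".
a <- structure(1, class = "Array")

# No method for "Array" exists yet, so as.vector() falls back to its default.
before <- as.vector(a)

# Hypothetical stand-in for a method that a package supplies for its own
# "Array" class (assumption: this is NOT the real S4Arrays implementation).
as.vector.Array <- function(x, mode = "any") {
  if (is.null(dim(x))) stop("object must have dimensions")
  as.vector(unclass(x), mode = mode)
}

# The same call now reaches the stand-in method, which rejects the object.
after <- tryCatch(as.vector(a), error = function(e) conditionMessage(e))

print(before)
print(after)
```

Dispatch here is keyed on the class name alone, so a method written for one package's "Array" is also picked up for an unrelated class that shares the name.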

Here's another example that explicitly shows that having a class named "Array" is the root cause of the problem:

> a = 1
> as.vector(a)  # works
[1] 1
> class(a) = 'Array'
> as.vector(a)  # works
[1] 1
>
> suppressMessages(library(DelayedArray))
> a = 1
> as.vector(a)  # works
[1] 1
> class(a) = 'Array'
> as.vector(a)  # doesn't work
Error in extract_array(x, index) :
  the first argument to extract_array() must be an array-like object
  (i.e. it must have dimensions)

hpages commented 6 months ago

Addressed in S4Arrays 1.2.1 (BioC 3.18, current release) and 1.3.6 (BioC 3.19, devel). See https://github.com/Bioconductor/S4Arrays/commit/59b8f4e28d2273145411f0d5429d1f31f6b79e12 and https://github.com/Bioconductor/S4Arrays/commit/e6498713c2654588d014ea2c052eef7b97a07802.
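
To check whether an installed S4Arrays already contains the fix, something like the following might help. It is a rough sketch: has_array_fix is a hypothetical helper name, and it assumes the fix is present in 1.2.1 and later on the 1.2.x release branch and in 1.3.6 and later on the devel line, as stated above.

```r
# Hypothetical helper: TRUE if version `v` should include the fix,
# assuming the version numbers quoted above (1.2.1 release, 1.3.6 devel).
has_array_fix <- function(v) {
  v <- as.package_version(v)  # accepts a string or a package_version object
  (v >= "1.2.1" && v < "1.3.0") || v >= "1.3.6"
}

# Only query the installed version if S4Arrays is actually installed.
if (nzchar(system.file(package = "S4Arrays"))) {
  print(has_array_fix(utils::packageVersion("S4Arrays")))
}
```

For example, has_array_fix("1.2.0") is FALSE, while has_array_fix("1.2.1") and has_array_fix("1.3.6") are TRUE.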

Note that as.vector(a, mode="integer"), as.vector(a, mode="complex"), and as.vector(a, mode="raw") still misbehave on Arrow arrays, but that's no longer the fault of S4Arrays or DelayedArray :wink: