Open taylorpetty opened 4 months ago
Yeah subassignment is still a work-in-progress. In particular using a multidimensional index is not ready in general. It's on the TODO list but it might take a few more weeks before I get to it.
In the meantime, the workaround is to use a linear index i.e. a numeric vector or N-col matrix where N is the number of dimensions. In this case the right value must be a vector. So:
library(SparseArray)
a <- array(1:45, dim = c(3,3,5))
s <- SparseArray(a)
a[2,,4] <- 7:9
s[cbind(2,1:3,4)] <- 7:9
identical(as.array(s), a) # TRUE
a[,,3] <- matrix(1:9,nrow=3)
s[19:27] <- as.vector(matrix(1:9,nrow=3))
identical(as.array(s), a) # TRUE
a <- array(1:10, dim = c(5,2))
s <- SparseArray(a)
a[1,1:2] <- 8:9
s[cbind(1,1:2)] <- 8:9
identical(as.array(s), a) # TRUE
Not always super easy/convenient to translate a multidimensional index [x,y,z]
into a linear index though, sorry. However this form of subassignment is well tested and quite efficient.
Hopefully that'll get you going for now.
This is helpful. Thank you for the quick response. If anyone is reading this and wants a quick and dirty workaround solution for now, here's the helper function I came up with:
mygrid <- function(...) {
as.matrix(data.table::CJ(...))
}
Then you can write things like this:
a = array(0L, dim = 5:3)
s = SparseArray(a)
a[1:2,1:3,2] = 2L
s[mygrid(1:2,1:3,2)] = 2L
identical(a, as.array(s)) # TRUE
x = poissonSparseArray(5:3, lambda=0)
y = x
x[rbind(cbind(1,1:3,1), cbind(2,1:3,1))] <- 10L
y[mygrid(1:2,1:3,1)] <- 10L
identical(x,y) # TRUE
Nice. Good to know about data.table::CJ()
.
I don't know the details of your use case but note that another way to build a big multidimensional sparse array is to construct slices of lower dimensions and abind()
them together. For example, to build a big 3D sparse array, construct the individual 2D slices with something like this:
library(SparseArray)
my_2D_slices <- lapply(1:20, function(i) poissonSparseMatrix(8000, 2500, density=0.15))
or load them from a file, then stack them together with:
my_3D_svt <- do.call(abind, c(my_2D_slices, list(along=3)))
my_3D_svt
# <8000 x 2500 x 20 SparseArray> of type "integer" [nzcount=60008425 (15%)]:
# ,,1
# [,1] [,2] [,3] [,4] ... [,2497] [,2498] [,2499] [,2500]
# [1,] 0 0 0 0 . 0 0 0 0
# [2,] 0 0 0 0 . 0 0 0 0
# ... . . . . . . . . .
# [7999,] 0 0 0 0 . 0 0 0 1
# [8000,] 0 0 0 0 . 0 0 0 1
#
# ...
#
# ,,20
# [,1] [,2] [,3] [,4] ... [,2497] [,2498] [,2499] [,2500]
# [1,] 1 0 0 0 . 0 0 0 1
# [2,] 0 0 0 0 . 0 0 0 0
# ... . . . . . . . . .
# [7999,] 1 0 0 0 . 0 0 0 0
# [8000,] 0 2 0 0 . 0 0 0 0
Note that stacking SVT_SparseArray objects of identical dimensions like this is a very cheap operation (almost instantaneous!). In particular it will be a lot more efficient than creating a big allzero array and subassigning slices to it.
See ?S4Arrays::abind
for more information.
Also be aware that the row/col summarization methods for SVT_SparseArray objects (a.k.a. matrixStats methods) can handle multidimensional objects and support the dims
argument:
cv23 <- colVars(my_3D_svt) # takes < 0.2s
dim(cv23)
# [1] 2500 20
rv12 <- rowVars(my_3D_svt, dims=2) # takes < 4s
dim(rv12)
# [1] 8000 2500
See ?SparseArray::colSums
for more information.
Hope this helps, H.
I'm unable to assign 1- or 2D-slices to vectors. I can loop through every element of an array with nested for-loops, but it is incredibly slow to do. I get a variety of error messages, demonstrated below. All errors were identical whether using
=
or<-
, on a Windows laptop and a Linux server.The errors differ whether subsetting by column or by row, and whether it's the entire column or a subset of it:
Windows sessionInfo()
``` R version 4.3.1 (2023-06-16 ucrt) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 11 x64 (build 22631) Matrix products: default locale: [1] LC_COLLATE=English_United States.utf8 [2] LC_CTYPE=English_United States.utf8 [3] LC_MONETARY=English_United States.utf8 [4] LC_NUMERIC=C [5] LC_TIME=English_United States.utf8 time zone: America/New_York tzcode source: internal attached base packages: [1] stats4 stats graphics grDevices utils datasets methods [8] base other attached packages: [1] SparseArray_1.2.4 S4Arrays_1.2.1 IRanges_2.36.0 [4] abind_1.4-5 S4Vectors_0.40.2 MatrixGenerics_1.14.0 [7] matrixStats_1.3.0 BiocGenerics_0.48.1 Matrix_1.6-5 loaded via a namespace (and not attached): [1] zlibbioc_1.48.0 lattice_0.22-5 parallel_4.3.1 XVector_0.42.0 [5] this.path_2.3.1 grid_4.3.1 compiler_4.3.1 tools_4.3.1 [9] crayon_1.5.2 ```Linux sessionInfo()
``` R version 4.3.1 (2023-06-16) Platform: x86_64-redhat-linux-gnu (64-bit) Running under: Rocky Linux 9.2 (Blue Onyx) Matrix products: default BLAS/LAPACK: FlexiBLAS OPENBLAS-OPENMP; LAPACK version 3.9.0 locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C time zone: America/New_York tzcode source: system (glibc) attached base packages: [1] stats4 stats graphics grDevices utils datasets methods [8] base other attached packages: [1] SparseArray_1.2.4 S4Arrays_1.2.1 IRanges_2.36.0 [4] abind_1.4-5 S4Vectors_0.40.2 MatrixGenerics_1.14.0 [7] matrixStats_1.2.0 BiocGenerics_0.48.1 Matrix_1.6-4 loaded via a namespace (and not attached): [1] zlibbioc_1.48.0 compiler_4.3.1 tools_4.3.1 XVector_0.42.0 [5] crayon_1.5.2 grid_4.3.1 lattice_0.22-5 ```