Bioconductor / DelayedArray

A unified framework for working transparently with on-disk and in-memory array-like datasets
https://bioconductor.org/packages/DelayedArray

Error when calling as.matrix() on DelayedMatrix with length > .Machine$integer.max #22

Closed PeteHaitch closed 6 years ago

PeteHaitch commented 6 years ago

I made a large DelayedMatrix, M0, by cbind()-ing two HDF5Matrix objects (there was also some row-subsetting). When calling as.matrix(M0), I got a rather obscure error. It looks like integer overflow, which may be related to the fact that length(M0) > .Machine$integer.max:

> dim(M0)
[1] 27660298      177
> length(M0)
[1] 4895872746
> length(M0) > .Machine$integer.max
[1] TRUE
> showtree(M0)
27660298x177 double: DelayedMatrix object
└─ 27660298x177 double: Unary iso op
   └─ 27660298x177 double: Subset
      └─ 29307078x177 double: Abind (along=2)
         ├─ 29307078x32 double: Set dimnames
         │  └─ 29307078x32 double: [seed] HDF5ArraySeed object
         └─ 29307078x145 double: Set dimnames
            └─ 29307078x145 double: [seed] HDF5ArraySeed object
> m0 <- as.matrix(M0)
Error in successiveIRanges(rep.int(width, k), from = offset + 1L) :
  'width' cannot contain NAs or negative values
In addition: Warning message:
In block_lens * dims[n, ] : NAs produced by integer overflow

I can post more details tomorrow, but I wanted to jot this down before heading home for the day. Obviously calling as.matrix() on such a large DelayedMatrix isn't generally a good idea, but I've got a big memory machine and it helps a bit when doing quick exploratory analyses.
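The arithmetic behind the warning can be reproduced in base R without any HDF5 data. This is a minimal sketch that assumes only the dimensions reported by dim(M0) above; the variable names are illustrative and not part of DelayedArray.

```r
# Dimensions reported by dim(M0), stored as R integers (32-bit).
nr <- 27660298L
nc <- 177L

# Integer arithmetic that exceeds .Machine$integer.max yields NA
# (normally with the warning "NAs produced by integer overflow").
len_int <- suppressWarnings(nr * nc)

# Promoting one operand to double avoids the overflow.
len_dbl <- as.numeric(nr) * nc

is.na(len_int)                    # the integer product is NA
len_dbl > .Machine$integer.max    # the double product exceeds the integer range
```

Any internal computation that multiplies dimensions while they are still stored as integers would hit the same NA, which is consistent with the `block_lens * dims[n, ]` warning above.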

hpages commented 6 years ago

Thanks Pete. I've narrowed this down to DelayedArray:::simple_abind() breaking when the objects to bind are longer than .Machine$integer.max:

library(DelayedArray)
m <- matrix(raw(), nrow=5e7, ncol=25)
tmp <- DelayedArray:::simple_abind(m, m, along=2L)
# Error in validObject(.Object) : 
#   invalid class “IRanges” object: 'start(x)', 'end(x)', and 'width(x)' cannot contain NAs
# In addition: Warning messages:
# 1: In .intertwine_blocks(objects, block_lens) :
#   integer overflow in 'cumsum'; use 'cumsum(as.numeric(.))'
# 2: In width(x) - 1L + start(x) : NAs produced by integer overflow

Will work on a fix. H.
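The first warning in the reproduction above points at cumsum() on integer input. The sketch below reproduces that failure mode in isolation and applies the remedy suggested by the warning text itself. It is not the change that was made in DelayedArray, and `block_lens` here is a made-up vector rather than the package's internal value.

```r
# Hypothetical block lengths whose running total exceeds .Machine$integer.max.
block_lens <- c(.Machine$integer.max, 1L)

# Integer cumsum() overflows: the running total becomes NA from that point on
# (normally with the warning "integer overflow in 'cumsum'").
offsets_int <- suppressWarnings(cumsum(block_lens))

# As the warning suggests, accumulating in double precision avoids the NA.
offsets_dbl <- cumsum(as.numeric(block_lens))

anyNA(offsets_int)   # TRUE: the integer running total overflowed
anyNA(offsets_dbl)   # FALSE: the double running total is intact
```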

hpages commented 6 years ago

This should be fixed in DelayedArray 0.7.13 (commit 88700ad478678099b674808e32ce01a7a9568011).

hpages commented 6 years ago

@PeteHaitch Hi Pete, is this something we can close? Thx!

PeteHaitch commented 6 years ago

I tried this on (roughly) the same example dataset and got the following:

Warning message:
In block_lens * dims[n, ] : NAs produced by integer overflow

It does seem to have worked, however (e.g. there are no NA elements in the final matrix). This is using v0.7.18.

hpages commented 6 years ago

arghh... I see. Should be addressed in DelayedArray 0.7.24 (see https://github.com/Bioconductor/DelayedArray/commit/7f9e7dd4514864c4b7065f8d12c8c1123868dd0b). Can you give it another try? Hopefully this time I got it right.

Thanks Pete!
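Before retrying, it may help to confirm that the installed version is at least the one named above. This is a small sketch using base R's utils functions; the two version strings are taken from this thread, and the variable names are illustrative.

```r
# Version strings quoted in this thread.
tested_version <- "0.7.18"
fixed_version  <- "0.7.24"

# compareVersion() returns -1 when the first version is older than the second.
needs_update <- utils::compareVersion(tested_version, fixed_version) < 0

# If DelayedArray is installed, run the same check against the installed version.
if (requireNamespace("DelayedArray", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("DelayedArray"))
  needs_update_installed <- utils::compareVersion(installed, fixed_version) < 0
}
```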

hpages commented 6 years ago

Closing this. Feel free to re-open if this still gives you trouble.