Closed avalcarcel9 closed 4 years ago
This is consistent with the subsetting of ordinary matrices, see Bioconductor/DelayedArray#6. If you want to preserve the original class of the matrix, use `drop=FALSE`.
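For reference, the ordinary-matrix behavior can be sketched in base R as follows. The matrix `m` and the variable names are made up for illustration and are not taken from the original post.

```r
# Hypothetical 10 x 2 matrix, for illustration only
m <- matrix(1:20, nrow = 10, ncol = 2)

# Selecting several rows keeps two dimensions, so the result is still a matrix
s <- m[1:5, ]
dim(s)                      # 5 2

# Selecting a single column drops the dimensions: the result is a plain vector
v <- m[, 2]
is.matrix(v)                # FALSE

# drop = FALSE keeps the result as a 10 x 1 matrix
k <- m[, 2, drop = FALSE]
dim(k)                      # 10 1
```

The same `drop` argument is what controls whether a one-dimensional result keeps its matrix class.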
Thanks @LTLA! This makes sense now. I agree with the discussion in the original post: at first it's confusing that the data are loaded into memory without explicitly forcing this, but I also agree that avid users will more often need the data loaded after the subset.
I was following some documentation/instructions from a workshop found here in 15.1 Overview. You'll see in that section that `da_Rle[1:10,]` is a subset call, and the data did remain a `DelayedMatrix` rather than get loaded into memory. Is this a property specific to `RleMatrix` and `DelayedMatrix`? Or maybe the user changed some profile settings to automatically use `drop = FALSE`? I've found this to be some of the more thorough documentation on using the package, and it made me think that what I was seeing was a bug.
@avalcarcel9
The result of subsetting is returned as an ordinary vector only when it is a mono-dimensional slice (e.g. `x[ , 3]`). In the example you are referring to, `da_Rle[1:10, ]` has dimensions 10 x 2 so nothing gets dropped, i.e. the result is still a DelayedMatrix object. But if you select a single row (e.g. with `da_Rle[10, ]`) or single column (e.g. `da_Rle[ , 2]`) then the result will be loaded in memory and returned as an ordinary vector.
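A minimal sketch of the behavior described above, assuming the DelayedArray package is installed. The object `da` is a small in-memory stand-in with the same 10 x 2 shape as `da_Rle`, not the workshop object itself.

```r
library(DelayedArray)

# Hypothetical stand-in for da_Rle: a 10 x 2 DelayedMatrix wrapping an ordinary matrix
da <- DelayedArray(matrix(1:20, nrow = 10, ncol = 2))

# Several rows: the result is 5 x 2, nothing is dropped, it stays a DelayedMatrix
sub <- da[1:5, ]
is(sub, "DelayedMatrix")    # TRUE

# Single row: a mono-dimensional slice, realized in memory as an ordinary vector
row <- da[10, ]
is(row, "DelayedArray")     # FALSE

# drop = FALSE keeps the single row as a 1 x 2 DelayedMatrix
one <- da[10, , drop = FALSE]
is(one, "DelayedMatrix")    # TRUE
dim(one)                    # 1 2
```

The same pattern should apply to other DelayedMatrix derivatives such as `HDF5Matrix`, since they share the `[` method, though this sketch only exercises the in-memory case.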
Note that you can find a more recent version of the "Effectively using the DelayedArray framework to support the analysis of large datasets" workshop here: http://biocworkshops2019.bioconductor.org.s3-website-us-east-1.amazonaws.com/page/DelayedArrayWorkshop__Effectively_using_the_DelayedArray_framework_for_users/
H.
Thanks @hpages for the clarification! I think this makes sense now! Also thanks for the updated documentation! I'm closing this!
When subsetting an `HDF5Matrix` the subset is loaded into memory and returned. As a simple example you can use the code below. The results returned to me are commented out. When subsetting using `[` it seems that the object is being loaded into memory rather than remaining an `HDF5Matrix`. I am using the development version of the package for Bioconductor 3.10. See session info below.