Open biobenkj opened 4 years ago
I believe, without actually checking, that the names are stored independently of the underlying data representation, and the cost is associated with adding names and hence duplicating the underlying data. If it's 'easy' to simulate the data for a reproducible example that would be great.
Hi Ben, @biobenkj
I'm glad to hear you are making use of this data representation!
The trick behind RaggedExperiment involves providing a matrix representation from a GRangesList object. In the background, the stored representation is a GRangesList, so accessing the metadata is relatively straightforward. When using assay, the GRangesList representation has to be converted to a matrix. This involves creating quite a large sparse matrix from the mcols in the original GRangesList, a costly operation.
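To make the difference concrete, here is a minimal sketch on toy data. The ranges, the `score` column, and the object names (`sample1`, `sample2`, `ragexp`, `dense`) are invented for illustration and are not taken from the reported dataset; the point is only that `assayNames()` reads the metadata column names, while `assay()` has to build a full ranges-by-samples matrix first.

```r
library(GenomicRanges)
library(RaggedExperiment)

# Two toy samples with different ranges (hypothetical data)
sample1 <- GRanges(
    c(A = "chr1:1-10:-", B = "chr1:8-14:+", C = "chr2:15-18:+"),
    score = 3:5
)
sample2 <- GRanges(
    c(D = "chr1:1-10:-", E = "chr2:11-18:+"),
    score = 1:2
)

ragexp <- RaggedExperiment(
    sample1 = sample1,
    sample2 = sample2,
    colData = DataFrame(id = 1:2)
)

# Cheap: only looks up the mcols column names of the stored GRangesList
assayNames(ragexp)

# Costly on real data: materializes one row per range and one column per
# sample, filling NA wherever a sample has no value for that range
dense <- assay(ragexp, "score")
dim(dense)

# names() on a matrix returns the "names" attribute, not the assay names,
# so this call pays for the full conversion and still does not return them
names(dense)
```

On a large object the matrix grows with (total number of ranges) x (number of samples), which would be consistent with the memory use reported in this thread, although that has not been verified against the original data.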
I agree, a minimal and reproducible example would be helpful. We'll see what we can do to increase the efficiency of this conversion. Thank you.
@biobenkj Any updates on this?
Would a dgCMatrix representation help? Have you tested this?
We can create additional functionality to return this data representation.
If you can provide a reproducible example to help this move along, that would be great. Thanks!
RaggedExperiment continues to rule for all our 'omics related work! I did notice something interesting yesterday when running compactSummarizedExperiment(): when I attempt to access the names of the assays in a large RE, it will either be near instantaneous using assayNames(), or require 100s of GB of memory with names(assay(my_RE)). Do you know why this might be the case? I'll work on getting a smaller reproducible example if there is interest.
Thanks again for all that you do and for RaggedExperiment!