ainefairbrother opened this issue 8 months ago
Hey! Yep, definitely a size issue. Are you running the function with `as_DelayedArray = TRUE`? By default it is `FALSE`, so this might help. It's hard to debug exactly where you are running into the size issue without having access to the data.
Alan.
@ainefairbrother it would also be very helpful to know what kind of machine you're running this on (storage, memory, is it a computing cluster? etc.). Could you give us more details on this?

Something that can help is to reduce the number of cores being used. Because of how R often handles parallelisation (copying the entire environment to each worker), more cores isn't always better: it can lead to an explosion of memory usage. This will take some troubleshooting on your end to figure out what your machine can handle.
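As a rough sketch of what that advice looks like in practice (a hedged example, not your exact call: `exp` and `annotLevels` stand in for your own objects, and this assumes the `no_cores` argument from the EWCE documentation), starting with a single worker and only scaling up once memory usage looks stable:

```r
# Hedged sketch: exp and annotLevels are placeholders for your existing objects.
# With one worker there is only one copy of the environment in memory.
ctd_file <- EWCE::generate_celltype_data(
  exp = exp,
  annotLevels = annotLevels,
  groupName = "myDataset",
  no_cores = 1,           # fewer workers => fewer copies of the environment
  as_DelayedArray = TRUE  # keep the data disk-backed where possible
)
```

If a single worker completes without exhausting memory, you can increase `no_cores` incrementally to find what your machine tolerates.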
This is a server with ~133 TB storage and 1 TB RAM.

Architecture and CPU info:

```
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Address sizes:       43 bits physical, 48 bits virtual
Byte Order:          Little Endian
CPU(s):              256
On-line CPU(s) list: 0-255
```

Linux kernel version: 5.15.0-73-generic
Noted, thanks. I have tried setting `as_DelayedArray` to `TRUE` and reducing the number of cores. I get a different error:

```
5 core(s) assigned as workers (251 reserved).
Error in .from_Array_to_matrix(x, ...) : unused argument ("matrix")
```
You will need to create and share a subset of your data that replicates the issue if you want us to debug this; the function works with `as_DelayedArray = TRUE` on other datasets:
```r
> # Load the single cell data
> cortex_mrna <- ewceData::cortex_mrna()
see ?ewceData and browseVignettes('ewceData') for documentation
loading from cache
> # Use only a subset to keep the example quick
> expData <- cortex_mrna$exp#[1:100, ]
> l1 <- cortex_mrna$annot$level1class
> l2 <- cortex_mrna$annot$level2class
> annotLevels <- list(l1 = l1, l2 = l2)
> fNames_ALLCELLS <- EWCE::generate_celltype_data(
+     exp = expData,
+     annotLevels = annotLevels,
+     groupName = "allKImouse",
+     as_DelayedArray = TRUE
+ )
1 core(s) assigned as workers (10 reserved).
Converting to sparse matrix.
Converting to DelayedArray.
+ Calculating normalized mean expression.
Converting to sparse matrix.
Converting to sparse matrix.
+ Calculating normalized specificity.
Converting to sparse matrix.
Converting to sparse matrix.
Converting to sparse matrix.
Converting to sparse matrix.
+ Saving results ==> /tmp/Rtmpc3QW1s/ctd_allKImouse.rda
```
Subsetting on the number of genes and cells should be relatively quick to do.
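For example (a hypothetical sketch: `exp` and `annotLevels` are placeholders for your actual objects, and the slice sizes are arbitrary starting points), a shareable subset could be built like this:

```r
# Hedged sketch: take the first 500 genes and 2,000 cells.
# Adjust the slice until the error still reproduces on the subset.
exp_sub <- exp[1:500, 1:2000]
annot_sub <- lapply(annotLevels, function(a) a[1:2000])

# Save both objects together so they can be attached to the issue.
saveRDS(list(exp = exp_sub, annotLevels = annot_sub),
        file = "ewce_repro_subset.rds")
```

The key is to keep the annotation vectors aligned with the cell columns you keep, so the subset is directly usable with `generate_celltype_data()`.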
It looks like this error is coming from somewhere within DelayedArray, but I can't quite pinpoint where or why yet.
Also, @ainefairbrother, can you confirm that your `exp` DelayedArray object is backed at the time you're running the function? i.e. saved to disk, with each chunk read into memory only as needed. That will ensure you're fully benefitting from the DelayedArray capabilities.

This can be done using this set of functions: https://rdrr.io/bioc/HDF5Array/man/saveHDF5SummarizedExperiment.html

See here for more info: https://www.bioconductor.org/packages/release/bioc/vignettes/DelayedArray/inst/doc/02-Implementing_a_backend.html
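As a minimal sketch (assuming `exp` is an in-memory matrix; the file path and dataset name are arbitrary), creating a disk-backed copy with HDF5Array could look like:

```r
library(HDF5Array)

# Hedged sketch: write the matrix to an HDF5 file on disk.
# The returned object is an HDF5-backed DelayedArray, so downstream
# operations read chunks from disk instead of holding everything in RAM.
exp_backed <- writeHDF5Array(exp, filepath = "exp_backed.h5", name = "exp")

# Quick check that the object is disk-backed: the seed should be an
# HDF5ArraySeed pointing at the file rather than an in-memory array.
seed(exp_backed)
```

A DelayedArray wrapped around an ordinary in-memory matrix gains none of the memory savings, which is why confirming the backing matters here.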
Hi Brian - yes, can confirm that it's backed at the time of running the function.
1. Bug description

When running `generate_celltype_data()` on a gene-by-cell matrix of 34,807 x 786,896, I get a large array error. My input object (`exp`) is a sparse DelayedMatrix object of type "double".

Console output

Session info