PedroMilanezAlmeida opened this issue 3 years ago (status: Open)
Hi Pedro, thanks for your interest in ALRA.
The ALRA function produces multiple copies of the matrix, which can be problematic when memory is limited. The matrices are duplicated in memory because we originally thought people would want to access the imputed matrix before scaling and thresholding. This does not seem to be the case; we are pretty much only interested in the final matrix.
Please see this branch, where I added a function called `alra.low.memory()` that should reduce the memory footprint. Can you try that function? See here.
If you are still having trouble, can you tell me at which step you actually get the error? Also, how much memory does your laptop have? Are any of these steps helpful?
Hi George, thanks for the quick feedback!
If I get it right, the change in `alra.low.memory` is in line 271 (don't return all matrices), right? However, when I tried to run `alra` step-by-step last night, memory was exhausted already at line 232 (`A_norm_rank_k` is already another (approximate) copy of `A_norm`, occupying an additional 8.6 GB in memory).
While going through `alra` step-by-step, I tried to convert `A_norm_rank_k` to a `dgCMatrix` but, probably because `A_norm_rank_k` is not sparse, the conversion also exhausted the memory. I also tried to force the matrix multiplications in line 227 to give a `dgCMatrix` as a result by converting `fastDecomp_noc$u`, `fastDecomp_noc$v` and `diag(fastDecomp_noc$d)` to `dgCMatrix`, but the matrix multiplications as `dgCMatrix` blew up memory anyway and never finished.
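A small sketch of why the rank-k matrix resists a sparse representation: a low-rank reconstruction of a mostly-zero matrix is generally dense, so storing it as a `dgCMatrix` saves nothing. This uses base R's `svd()` as a stand-in for the randomized decomposition used inside `alra`; the matrix sizes, the 5% density, and the rank `k` are assumptions chosen only for illustration.

```r
# Illustrative only: a mostly-zero matrix, stored densely in base R.
set.seed(1)
n_cells <- 200   # hypothetical sizes, far smaller than real data
n_genes <- 100
A <- matrix(0, nrow = n_cells, ncol = n_genes)
nonzero <- sample(length(A), size = round(0.05 * length(A)))
A[nonzero] <- rexp(length(nonzero))

k <- 10  # hypothetical rank, not the value ALRA would choose
s <- svd(A, nu = k, nv = k)

# Rank-k reconstruction U_k D_k V_k^T. Scaling the columns of u by d
# avoids building the k x k diagonal matrix explicitly.
A_rank_k <- sweep(s$u, 2, s$d[1:k], `*`) %*% t(s$v)

# Fraction of exact zeros before and after the low-rank approximation.
frac_zero_input  <- mean(A == 0)
frac_zero_rank_k <- mean(A_rank_k == 0)
```

Comparing `frac_zero_input` with `frac_zero_rank_k` shows that almost all of the exact zeros disappear, which is consistent with the conversion to `dgCMatrix` not helping at this step.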
My solution for the moment was to run `alra` only on 2k variable genes instead of the entire matrix, which runs pretty smoothly and fast now, but I haven't yet looked into whether the results are any good.
Btw, my laptop has 16GB mem and I have not tried to change R_MAX_VSIZE in .Renviron yet.
Hi @linqiaozhi,
I recently came across this issue, having also hit memory limits due to the large number of cells we are analysing.
I have noticed that there is a bug in the `alra.low.memory` function. One line checks whether the input `A_norm` is a matrix, but the `if` statement only tests `class(A_norm) == "matrix"`. In my case `class(A_norm)` returns a vector containing both `matrix` and `array`, which generates an error that prevents the function from completing.
I have submitted a pull request where the if statement checks if matrix is present in the vector and ignores any additional classes.
Thanks
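The class check described above can be reproduced in a few lines. Since R 4.0.0, `class()` on a matrix returns both `"matrix"` and `"array"`, so comparing it to a single string gives a length-2 logical, which `if` rejects with an error from R 4.2.0 onward (earlier versions only warn). The sketch below shows three checks that each return a single `TRUE`/`FALSE`; it is not necessarily the exact form used in the pull request.

```r
A_norm <- matrix(1:6, nrow = 2)  # toy input for illustration
class(A_norm)                    # in R >= 4.0.0: "matrix" "array"

# Fragile: class(A_norm) == "matrix" is a length-2 logical in R >= 4.0.0,
# so `if (class(A_norm) == "matrix")` fails in R >= 4.2.0.

# Robust alternatives, each yielding exactly one logical value:
is_mat_1 <- is.matrix(A_norm)
is_mat_2 <- inherits(A_norm, "matrix")
is_mat_3 <- "matrix" %in% class(A_norm)
```

Of the three, `is.matrix()` is the most conventional in base R, while the `%in%` form matches the description of checking whether `matrix` is present in the class vector.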
I am working with a matrix that has 53201 cells and 20245 genes.
Its size in memory is only 482 MB as a `dgCMatrix` but 8.62 GB after `as.matrix()`.
When I try RunALRA from Seurat, I get:
The same happens if I try to run `alra(A_norm = as.matrix(normRNA), use.mkl = FALSE)` and with `use.mkl = TRUE` (only that with `TRUE` it takes a lot longer to show the error).
Do you have any suggestions for how to run on large matrices on a laptop?
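For reference, the 8.62 GB figure follows directly from the matrix dimensions, assuming the dense matrix stores each entry as an 8-byte double (R's default numeric type). The same arithmetic suggests why a second dense copy, such as `A_norm_rank_k`, cannot fit next to the first on a 16 GB machine.

```r
n_cells <- 53201
n_genes <- 20245
bytes_per_double <- 8  # R numeric matrices use 8-byte doubles

# Size of one dense copy, in GB (1 GB = 1e9 bytes).
dense_bytes <- n_cells * n_genes * bytes_per_double
dense_gb <- dense_bytes / 1e9
round(dense_gb, 2)  # 8.62

# Two dense copies held at once already exceed 16 GB of RAM.
two_copies_gb <- 2 * dense_gb
```

This ignores R's per-object overhead and any temporaries created during the matrix multiplications, so actual peak usage would be higher.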