aertslab / AUCell

AUCell: score single cells with gene regulatory networks

Building cell rankings with very large expression matrix #11

Closed alecorr closed 3 years ago

alecorr commented 4 years ago

Hi,

I'm trying to use AUCell to evaluate the expression of various gene signatures across my dataset, but I'm running into problems because my expression matrix is really large (a dgCMatrix with ~140,000 cells and 22,000 genes). Every time I run 'AUCell_buildRankings' on the expression matrix, I get this error:

cells_rankings <- AUCell_buildRankings(exprMat, verbose = T, plotStats = F)
Error in asMethod(object) : 
  Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105
Calls: AUCell_buildRankings ... .local -> as.matrix -> as.matrix.Matrix -> as -> asMethod
Execution halted

I think I'm getting this error because one of the first steps of the buildRankings function is to convert from sparse matrix to an uncompressed base R matrix:

exprMat <- as.matrix(exprMat)

Usually, this kind of issue can be solved by adding more memory/RAM, but in this case I think my data is fundamentally too large to be converted from a sparse matrix, regardless of the memory available. I've read that the limit for this conversion is 2,147,483,647 elements (2^31 - 1), and my dense exprMat would have 140,000 x 22,000 = 3,080,000,000.
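As a quick sanity check of those numbers, here is a minimal base-R sketch. The variable names are made up for illustration, and the memory estimate assumes the dense matrix stores 8 bytes per value (double precision).

```r
# Dimensions quoted in this issue (approximate cell count)
n_cells <- 140000
n_genes <- 22000

# Numeric literals are doubles in R, so this product does not overflow
n_elements <- n_cells * n_genes

# .Machine$integer.max is 2^31 - 1 = 2147483647
exceeds_limit <- n_elements > .Machine$integer.max

# Rough size of the dense matrix, assuming 8 bytes per element
dense_size_gb <- n_elements * 8 / 1e9

print(exceeds_limit)
print(dense_size_gb)
```

Under these assumptions the dense matrix would need well over 20 GB even if the element-count limit did not apply.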

Do you think there is any way to get around this issue (maybe just subsetting the most variable genes, or running the analysis in smaller chunks) or am I just being too ambitious running AUCell on a dataset like this?

Thanks, Alex

alecorr commented 3 years ago

I've just realised that I forgot to update this issue with the solution, which, as it turns out, was pretty simple (and was mentioned in a previous issue; apologies for not checking the closed issues before submitting my own).

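AUCell has a cbind function that can be used to concatenate two cell rankings objects. This means a large expression matrix can be split in half by cells (columns) before building rankings, so that each half stays under the element limit (here 70,000 x 22,000 = 1,540,000,000), and the two results can then be merged into one overall cell rankings object.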

See below for a very simplified example of how to do this:

# Define where to split the expression matrix in half (by columns, i.e. cells)
half <- round(ncol(ExprMat) / 2)

# Split the expression matrix
ExprMat_1 <- ExprMat[, 1:half]
ExprMat_2 <- ExprMat[, (half + 1):ncol(ExprMat)]

# Build cell rankings for each half
cells_rankings_1 <- AUCell_buildRankings(ExprMat_1)
cells_rankings_2 <- AUCell_buildRankings(ExprMat_2)

# Combine the rankings into one object
cells_rankings <- AUCell::cbind(cells_rankings_1, cells_rankings_2)
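If two halves are still too large, the same idea should extend to any number of chunks. The sketch below is an untested generalisation, not something from the AUCell documentation: `chunk_columns` and `build_rankings_in_chunks` are hypothetical helper names, and the default chunk size of 50,000 cells is an arbitrary choice that keeps each chunk below 2^31 - 1 elements for 22,000 genes. It only uses the pairwise `AUCell::cbind` call shown above, applied repeatedly.

```r
# Hypothetical helper: split column indices 1..n_cols into consecutive chunks
chunk_columns <- function(n_cols, chunk_size) {
  idx <- seq_len(n_cols)
  split(idx, ceiling(idx / chunk_size))
}

# Hypothetical helper: build rankings chunk by chunk, then merge them.
# Assumes AUCell is installed and exprMat has genes as rows, cells as columns.
build_rankings_in_chunks <- function(exprMat, chunk_size = 50000) {
  chunks <- chunk_columns(ncol(exprMat), chunk_size)
  rankings <- lapply(chunks, function(cols) {
    AUCell::AUCell_buildRankings(exprMat[, cols, drop = FALSE],
                                 plotStats = FALSE, verbose = FALSE)
  })
  # Merge pairwise, exactly as in the two-half example
  Reduce(function(a, b) AUCell::cbind(a, b), rankings)
}

# Example of the chunking alone (no AUCell needed): 10 columns, chunks of 4
example_chunks <- chunk_columns(10, 4)
print(lengths(example_chunks))
```

Usage would then be something like `cells_rankings <- build_rankings_in_chunks(ExprMat)`. Splitting by cells should not change the result because AUCell ranks the genes within each cell, so one cell's ranking does not depend on which other cells are in the matrix.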