broadinstitute / infercnv

Inferring CNV from Single-Cell RNA-Seq

Cholmod error 'problem too large' at file #388

Closed talban14 closed 2 years ago

talban14 commented 2 years ago

Hi, I keep receiving the error below when I try to run all of my samples together in one analysis. Is there a workaround for this issue? I'm working on a 3TB node, so I should have enough memory available.

INFO [2022-01-18 16:05:04]

STEP 02: Removing lowly expressed genes

INFO [2022-01-18 16:05:04] ::above_min_mean_expr_cutoff:Start
INFO [2022-01-18 16:05:58] Removing 13952 genes from matrix as below mean expr threshold: 0.1
INFO [2022-01-18 16:07:09] validating infercnv_obj
INFO [2022-01-18 16:07:09] There are 6852 genes and 328875 cells remaining in the expr matrix.
Error in asMethod(object) : Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 102

best, Tyler

GeorgescuC commented 2 years ago

Hi @talban14 ,

This might be a hard limitation on the matrix size once it is no longer in sparse format. Could you please run options(error = function() traceback(2)) first and then rerun infercnv, to get a more precise log of where the error happens?

There may be a way to delay the conversion from sparse to dense matrix, but it won't be possible to stay sparse through the smoothing step, so we may need a workaround that splits the observation data once enough of the initial (pre)processing is done.
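For context on why the size matters: to my understanding, CHOLMOD's dense conversion indexes entries with 32-bit integers, so the error fires whenever the would-be dense matrix needs more than 2^31 - 1 entries. The dimensions from the log above already cross that line (a quick check, using the gene/cell counts reported by infercnv):

```r
# The dense matrix CHOLMOD would have to build has genes x cells entries.
# Assumption: "problem too large" is raised once this count exceeds the
# 32-bit integer limit (.Machine$integer.max = 2^31 - 1).
n_genes <- 6852
n_cells <- 328875
n_entries <- n_genes * n_cells      # ~2.25e9 entries
n_entries > .Machine$integer.max    # TRUE -> conversion fails
```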

Regards, Christophe.

wong-nw commented 2 years ago

Hi Christophe,

I ran into this error too, and I added the command you requested in your response. The relevant error output is here:

INFO [2022-03-09 14:00:32] validating infercnv_obj
INFO [2022-03-09 14:00:32] There are 14638 genes and 170789 cells remaining in the expr matrix.
Error in asMethod(object) : 
  Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105
Calls: <Anonymous> ... apply -> as.matrix -> as.matrix.Matrix -> as -> asMethod
9: (function () 
   traceback(2))()
8: asMethod(object)
7: as(x, "matrix")
6: as.matrix.Matrix(X)
5: as.matrix(X)
4: apply(infercnv_obj@expr.data, 1, function(x) {
       sum(x > 0 & !is.na(x)) >= min_cells_per_gene
   })
3: which(apply(infercnv_obj@expr.data, 1, function(x) {
       sum(x > 0 & !is.na(x)) >= min_cells_per_gene
   }))
2: require_above_min_cells_ref(infercnv_obj, min_cells_per_gene = min_cells_per_gene)
1: infercnv::run(infercnv_obj, cutoff = 0.01, out_dir = "inferCNV_19Samples_full_202203", 
       cluster_by_groups = T, denoise = T, HMM = T, num_threads = 60, 
       analysis_mode = "subclusters")
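The frame that fails here is the per-gene filter, which densifies the whole expression matrix just to count expressing cells. As a sketch (my own, not infercnv's actual code), the same filter can be computed directly on the sparse matrix, since for a non-negative count matrix the cells with expression > 0 are exactly the stored nonzero entries:

```r
library(Matrix)

# Sketch: count, per gene (row), how many cells express it, without the
# as.matrix() call that densifies the whole matrix. rowSums() on the
# logical sparse matrix (sp > 0) never materializes a dense copy.
count_expressing_cells <- function(sp) {
  Matrix::rowSums(sp > 0)
}

# Toy example mirroring the logic of the failing apply():
sp <- Matrix::sparseMatrix(i = c(1, 1, 2), j = c(1, 3, 2),
                           x = c(5, 2, 1), dims = c(3, 4))
min_cells_per_gene <- 2
keep <- which(count_expressing_cells(sp) >= min_cells_per_gene)
```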

Out of curiosity, what is the largest dataset that you've been able to process?

Please let us know what you think,

-Nathan

GeorgescuC commented 2 years ago

Hi @wong-nw,

Sorry for the delay in getting back to you. In the past I have run infercnv on a dataset of 90k cells total without this error. From what I was able to find, the issue happens when converting a sparse matrix to a dense one (needed for the smoothing), which R's Matrix/CHOLMOD backend does not handle for matrices this large.

There seems to be a potential workaround, but I have not yet had the occasion to test it on a dataset this big, and I do not know whether other downstream issues would appear. I will try to generate a simulated dataset of similar size to test with. In the meantime, something worth testing is to apply the code from that post to convert the matrix to a dense matrix from the start and provide that to infercnv.
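One way such a conversion can work (this is my own sketch, not necessarily the code from that post) is to densify column chunks one at a time, so that no single as.matrix() call exceeds the 32-bit entry limit; base R matrices are long-vector backed, so the dense result itself can hold more than 2^31 - 1 entries:

```r
library(Matrix)

# Sketch: densify a large sparse matrix in column chunks. Each chunk
# conversion stays well under CHOLMOD's 32-bit entry limit; the chunk
# size of 10000 columns is an arbitrary choice.
as_matrix_chunked <- function(sp, chunk_cols = 10000L) {
  out <- matrix(0, nrow = nrow(sp), ncol = ncol(sp),
                dimnames = dimnames(sp))
  for (start in seq(1L, ncol(sp), by = chunk_cols)) {
    end <- min(start + chunk_cols - 1L, ncol(sp))
    out[, start:end] <- as.matrix(sp[, start:end, drop = FALSE])
  }
  out
}
```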

Regards, Christophe.