broadinstitute / infercnv

Inferring CNV from Single-Cell RNA-Seq

Cannot allocate vector of size 128 GB; in asMethod(object): sparse->dense coercion: allocating vector of size #541

Closed · FerrenaAlexander closed this 1 year ago

FerrenaAlexander commented 1 year ago

Hi - thank you for all the work on this package!

I seem to be running into a memory-related issue during plotting:

INFO [2023-05-14 11:21:44] Observation data size: Cells= 21201 Genes= 8673
INFO [2023-05-14 11:22:18] plot_cnv_observation:Writing observation groupings/color.
INFO [2023-05-14 11:22:18] plot_cnv_observation:Done writing observation groupings/color.
INFO [2023-05-14 11:22:18] plot_cnv_observation:Writing observation heatmap thresholds.
INFO [2023-05-14 11:22:18] plot_cnv_observation:Done writing observation heatmap thresholds.
INFO [2023-05-14 11:22:44] Colors for breaks:  #00008B,#24249B,#4848AB,#6D6DBC,#9191CC,#B6B6DD,#DADAEE,#FFFFFF,#EEDADA,#DDB6B6,#CC9191,#BC6D6D,#AB4848,#9B2424,#8B0000
INFO [2023-05-14 11:22:44] Quantiles of plotted data range: 1,3,3,3,6
INFO [2023-05-14 11:23:12] plot_cnv_references:Start
INFO [2023-05-14 11:23:12] Reference data size: Cells= 13616 Genes= 8673
INFO [2023-05-14 11:35:29] plot_cnv_references:Number reference groups= 1
INFO [2023-05-14 11:35:30] plot_cnv_references:Plotting heatmap.
INFO [2023-05-14 11:35:46] Colors for breaks:  #00008B,#24249B,#4848AB,#6D6DBC,#9191CC,#B6B6DD,#DADAEE,#FFFFFF,#EEDADA,#DDB6B6,#CC9191,#BC6D6D,#AB4848,#9B2424,#8B0000
INFO [2023-05-14 11:35:46] Quantiles of plotted data range: 1,3,3,3,6
Error: cannot allocate vector of size 128.0 Gb
In addition: Warning messages:
1: In asMethod(object) :
  sparse->dense coercion: allocating vector of size 2.2 GiB
2: In asMethod(object) :
  sparse->dense coercion: allocating vector of size 2.2 GiB
3: In asMethod(object) :
  sparse->dense coercion: allocating vector of size 1.3 GiB
4: In asMethod(object) :
  sparse->dense coercion: allocating vector of size 1.3 GiB
5: In asMethod(object) :
  sparse->dense coercion: allocating vector of size 1.3 GiB
6: In asMethod(object) :
  sparse->dense coercion: allocating vector of size 1.3 GiB
7: In asMethod(object) :
  sparse->dense coercion: allocating vector of size 2.2 GiB
Execution halted

I believe it fails during Step 19, as that is the last STEP entry that appears in the log.
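For scale: a dense double-precision matrix in R costs cells x genes x 8 bytes, which roughly lines up with the coercion warnings above and makes the 128 GB request look far out of proportion. A quick sanity check using the sizes from the log:

21201 *  8673 * 8 / 1024^3  # observations: ~1.4 GiB, close to the 1.3 GiB warnings
34817 *  8673 * 8 / 1024^3  # all cells:    ~2.25 GiB, matching the 2.2 GiB warnings
34817 * 23160 * 8 / 1024^3  # full input:   ~6.0 GiB, still nowhere near 128 GB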

The input is 34,817 cells by 23,160 genes (10X Genomics), tumor data from 10 samples. I am running infercnv to find CNVs in each sample and in the fibroblasts; 13,616 cells serve as the reference. The "target" cells are split across CAFs and the malignant cells of each source sample like so:

            Var1 Freq
1     Fibroblast  874
2  Malignant_A_1 1206
3  Malignant_A_2   42
4  Malignant_A_3 1309
5  Malignant_B_1  963
6  Malignant_B_2 2617
7  Malignant_B_3 5794
8  Malignant_C_1  156
9  Malignant_C_2 3959
10 Malignant_C_3  545
11 Malignant_C_4 3736
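For completeness, the infercnv object passed to run() below was presumably built with the package's CreateInfercnvObject(). A sketch only, since the original construction code isn't shown; the variable and file names here are hypothetical:

infercnv_obj = infercnv::CreateInfercnvObject(
                   raw_counts_matrix=counts,            # genes x cells counts matrix (hypothetical variable)
                   annotations_file="annotations.txt",  # hypothetical: cell barcode -> group, as in the table above
                   delim="\t",
                   gene_order_file="gene_order.txt",    # hypothetical: gene, chr, start, stop
                   ref_group_names=c("reference"))      # hypothetical label for the 13,616-cell reference group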

I ran this on a Linux HPC under SLURM, with 350 GB of memory and 7 threads (sbatch --cpus-per-task=7), like below:

infercnv_obj = infercnv::run(infercnv_obj,
                             cutoff=0.1,          # use 1 for smart-seq, 0.1 for 10x-genomics
                             out_dir=CNVoutdir,   # dir is auto-created for storing outputs
                             cluster_by_groups=TRUE,  # cluster cells within each annotation group
                             denoise=TRUE,
                             HMM=TRUE,
                             num_threads=num_threads,
                             save_rds=FALSE,
                             save_final_rds=TRUE)
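(num_threads above is defined elsewhere in the script; one way to derive it from the SLURM allocation, since sbatch sets SLURM_CPUS_PER_TASK, would be:

num_threads = as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset="1"))  # 7 here, given cpus-per-task=7

)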

Is there anything I can do to rectify this?

singlece commented 1 year ago

I met the same problem. I don't know how to fix it.

FerrenaAlexander commented 1 year ago

Hi, just wanted to mention that I was able to get past this error by running on an HPC node with 1000 GB of RAM. That seems excessive, though; I wonder if this can be optimized. I'm leaving this open for now since others have hit the issue too. Thanks for all the work on this package!

GeorgescuC commented 1 year ago

Hi @FerrenaAlexander @singlece ,

Based on the size of the dataset shown in that log, 128 GB of RAM should be far more than needed, especially if the failure is only during plotting. There is a different error that occasionally occurs during plotting, with a different message (claiming an outrageous amount of memory, in the TB range, cannot be allocated), which appears to be related to the useRaster option. One thing you could try is setting useRaster=FALSE in run() or plot_cnv(). That option controls R's rasterization optimization, which makes plotting significantly faster but occasionally triggers weird behavior.
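For reference, a minimal re-plot sketch with rasterization disabled, assuming infercnv_obj holds the object from the last completed step (argument names per plot_cnv()'s documented interface; the output directory and file name are hypothetical):

infercnv::plot_cnv(infercnv_obj,
                   out_dir="./output",                   # hypothetical output directory
                   output_filename="infercnv.noraster",  # hypothetical output file name
                   useRaster=FALSE)                      # disable R's rasterization optimization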

Regards, Christophe.

singlece commented 1 year ago

I corrected the code as follows but still get the same problem mentioned above. Code:

infercnv_obj2 = infercnv::run(infercnv_obj,
                              cutoff=0.1,
                              out_dir="./output",
                              cluster_by_groups=TRUE,
                              hclust_method="ward.D2",
                              denoise=TRUE,
                              HMM=FALSE,
                              useRaster=FALSE,
                              num_threads=10)

GeorgescuC commented 1 year ago

Hi @singlece ,

I have never seen this specific type of error before, so I am not sure what causes it. To debug a different issue, I just plotted a dataset with 160k+ cells on a machine with 180 GB of RAM, so there is no reason it should ask for 128 GB of RAM for ~20k cells.

Could you let me know what sessionInfo() returns in your R session?

Could you also privately share the backup object from the last completed step so I can try to reproduce the issue?
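For anyone looking for those backups: infercnv writes intermediate objects into out_dir as step-numbered files ending in .infercnv_obj, readable with readRDS(). A sketch, with a hypothetical out_dir:

backups = list.files("./output", pattern="^[0-9]+_.*\\.infercnv_obj$", full.names=TRUE)  # step-numbered backups
infercnv_obj = readRDS(tail(sort(backups), 1))  # zero-padded step prefixes sort in run order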

Regards, Christophe.

FerrenaAlexander commented 1 year ago

Hi all, I'm closing this, as I no longer seem to hit the issue when running with the following parameters:

I continue to have runtime issues with some runs, but that's a different problem, so I'll open a separate issue.

Thanks very much for the suggestions!