broadinstitute / infercnv

Inferring CNV from Single-Cell RNA-Seq

Very long runtime or job hanging during plotting #571

Open FerrenaAlexander opened 1 year ago

FerrenaAlexander commented 1 year ago

Hello, thanks again for all the work on this package!

I seem to have run into an issue with runtime / potentially run hanging.

I am invoking the pipeline like so:

infercnv_obj = infercnv::run(infercnv_obj,
                             cutoff = 0.1,  # use 1 for smart-seq, 0.1 for 10x-genomics
                             out_dir = CNVoutdir,  # dir is auto-created for storing outputs
                             cluster_by_groups = TRUE,  # cluster cells separately within each annotation group
                             denoise = TRUE,
                             HMM = TRUE, num_threads = num_threads,
                             save_rds = TRUE, save_final_rds = TRUE,
                             useRaster = FALSE  # hopefully this will take less mem / time
)

I set useRaster = F after running into high memory usage and reading through the suggestions in #541.

The run appears to reach STEP 19, which is the last step that shows up when I grep the log file:

cat inf.Rout | grep STEP

    STEP 1: incoming data
    STEP 02: Removing lowly expressed genes
    STEP 03: normalization by sequencing depth
    STEP 04: log transformation of data
    STEP 08: removing average of reference data (before smoothing)
    STEP 09: apply max centered expression threshold: 3
    STEP 10: Smoothing data per cell by chromosome
    STEP 11: re-centering data across chromosome after smoothing
    STEP 12: removing average of reference data (after smoothing)
    STEP 14: invert log2(FC) to FC
    STEP 15: computing tumor subclusters via leiden
    STEP 17: HMM-based CNV prediction
    STEP 18: Run Bayesian Network Model on HMM predicted CNVs
    STEP 19: Filter HMM predicted CNVs based on the Bayesian Network Model results and BayesMaxPNormal

The job is still running, but the log file has not been updated since 6/25 (today is 7/6, so about 11 days). The last few lines look like this:


(base) [aferrena@aferrena splitsamps_onlymalignant]$ tail -n 50 inf.Rout

chr6-region_135047 : 2  (P= 0.248520414613958 ) ->  3 (P= 0.251679510522922 )
chr7-region_135049 : 4  (P= 0.248242301898835 ) ->  3 (P= 0.255931691110954 )
chr9-region_135055 : 2  (P= 0.248520414613958 ) ->  3 (P= 0.251679510522922 )
chr10-region_135058 : 4  (P= 0.248791576956809 ) ->  3 (P= 0.252968922206931 )
chr10-region_135060 : 4  (P= 0.248791576956809 ) ->  3 (P= 0.252968922206931 )
chr10-region_135062 : 4  (P= 0.131308038939401 ) ->  5 (P= 0.246797300418792 )
chr11-region_135063 : 4  (P= 0.126012900330496 ) ->  3 (P= 0.249813415465785 )
chr12-region_135068 : 2  (P= 0.248496539765357 ) ->  3 (P= 0.249614265076006 )
chr14-region_135071 : 2  (P= 0.248496539765357 ) ->  3 (P= 0.249614265076006 )
chr17-region_135076 : 4  (P= 0.248791576956809 ) ->  3 (P= 0.252968922206931 )
chr17-region_135078 : 4  (P= 0.248791576956809 ) ->  3 (P= 0.252968922206931 )
chr18-region_135080 : 4  (P= 0.25131742027528 ) ->  3 (P= 0.252808339506318 )
chr19-region_135083 : 4  (P= 0.125541479962163 ) ->  3 (P= 0.249513152353826 )
chr13-region_138496 : 2  (P= 0.145182293889819 ) ->  3 (P= 0.281795836561728 )
chr12-region_138946 : 2  (P= 0.19730522924732 ) ->  3 (P= 0.230791060349748 )
INFO [2023-06-24 01:11:58] Creating Plots for CNV and cell Probabilities.
INFO [2023-06-25 04:59:46] ::plot_cnv:Start
INFO [2023-06-25 04:59:46] ::plot_cnv:Current data dimensions (r,c)=8638,33943 Total=72248944.3359732 Min=0 Max=0.992917903120345.
INFO [2023-06-25 04:59:49] ::plot_cnv:Depending on the size of the matrix this may take a moment.
INFO [2023-06-25 04:59:55] plot_cnv_observation:Start
INFO [2023-06-25 04:59:55] Observation data size: Cells= 20327 Genes= 8638
INFO [2023-06-25 05:00:04] plot_cnv_observation:Writing observation groupings/color.
INFO [2023-06-25 05:00:05] plot_cnv_observation:Done writing observation groupings/color.
INFO [2023-06-25 05:00:05] plot_cnv_observation:Writing observation heatmap thresholds.
INFO [2023-06-25 05:00:05] plot_cnv_observation:Done writing observation heatmap thresholds.
INFO [2023-06-25 05:00:38] Colors for breaks:  #00008B,#24249B,#4848AB,#6D6DBC,#9191CC,#B6B6DD,#DADAEE,#FFFFFF,#EEDADA,#DDB6B6,#CC9191,#BC6D6D,#AB4848,#9B2424,#8B0000
INFO [2023-06-25 05:00:38] Quantiles of plotted data range: 0,0,0,0.873048078302225,0.992917903120345
INFO [2023-06-25 05:01:04] plot_cnv_references:Start
INFO [2023-06-25 05:01:04] Reference data size: Cells= 13616 Genes= 8638
INFO [2023-06-25 05:09:29] plot_cnv_references:Number reference groups= 1
INFO [2023-06-25 05:09:30] plot_cnv_references:Plotting heatmap.
INFO [2023-06-25 05:09:50] Colors for breaks:  #00008B,#24249B,#4848AB,#6D6DBC,#9191CC,#B6B6DD,#DADAEE,#FFFFFF,#EEDADA,#DDB6B6,#CC9191,#BC6D6D,#AB4848,#9B2424,#8B0000
INFO [2023-06-25 05:09:50] Quantiles of plotted data range: 0,0,0,0,0.983792401741968
INFO [2023-06-25 05:11:08] ::plot_cnv:Start
INFO [2023-06-25 05:11:08] ::plot_cnv:Current data dimensions (r,c)=8638,33943 Total=894280433 Min=1 Max=6.
INFO [2023-06-25 05:11:11] ::plot_cnv:Depending on the size of the matrix this may take a moment.
INFO [2023-06-25 05:11:17] plot_cnv_observation:Start
INFO [2023-06-25 05:11:17] Observation data size: Cells= 20327 Genes= 8638
INFO [2023-06-25 05:11:27] plot_cnv_observation:Writing observation groupings/color.
INFO [2023-06-25 05:11:27] plot_cnv_observation:Done writing observation groupings/color.
INFO [2023-06-25 05:11:27] plot_cnv_observation:Writing observation heatmap thresholds.
INFO [2023-06-25 05:11:27] plot_cnv_observation:Done writing observation heatmap thresholds.
INFO [2023-06-25 05:12:51] Colors for breaks:  #00008B,#24249B,#4848AB,#6D6DBC,#9191CC,#B6B6DD,#DADAEE,#FFFFFF,#EEDADA,#DDB6B6,#CC9191,#BC6D6D,#AB4848,#9B2424,#8B0000
INFO [2023-06-25 05:12:51] Quantiles of plotted data range: 1,3,3,3,6
INFO [2023-06-25 05:13:18] plot_cnv_references:Start
INFO [2023-06-25 05:13:18] Reference data size: Cells= 13616 Genes= 8638
INFO [2023-06-25 05:21:41] plot_cnv_references:Number reference groups= 1
INFO [2023-06-25 05:21:42] plot_cnv_references:Plotting heatmap.
INFO [2023-06-25 05:22:36] Colors for breaks:  #00008B,#24249B,#4848AB,#6D6DBC,#9191CC,#B6B6DD,#DADAEE,#FFFFFF,#EEDADA,#DDB6B6,#CC9191,#BC6D6D,#AB4848,#9B2424,#8B0000
INFO [2023-06-25 05:22:36] Quantiles of plotted data range: 1,3,3,3,6
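
The timestamps above already show where the time went before the log stopped: just under 28 hours pass between "Creating Plots for CNV and cell Probabilities." and the first "::plot_cnv:Start", while each later plotting stage takes minutes. To list such gaps for the whole log, here is a minimal sketch. It assumes GNU `date` (for `date -d`) and that entries start with `INFO [YYYY-MM-DD HH:MM:SS]` as in the excerpt; `log_gaps` is a hypothetical helper name, not part of infercnv.

```shell
# Print the seconds elapsed before each timestamped log entry, largest first.
# Output format: <seconds> <date> <time> <message>
# Assumes GNU date; log_gaps is a hypothetical helper, not an infercnv tool.
log_gaps() {
    local prev="" epoch day clock msg
    # Keep only lines of the form "INFO [date time] message".
    sed -nE 's/^INFO \[([0-9]{4}-[0-9]{2}-[0-9]{2}) ([0-9]{2}:[0-9]{2}:[0-9]{2})\] (.*)$/\1 \2 \3/p' "$1" |
    while read -r day clock msg; do
        # Fixed time zone so daylight-saving changes do not distort the gaps.
        epoch=$(TZ=UTC date -d "$day $clock" +%s)
        if [ -n "$prev" ]; then
            printf '%d %s %s %s\n' "$((epoch - prev))" "$day" "$clock" "$msg"
        fi
        prev=$epoch
    done | sort -rn
}

# Usage: log_gaps inf.Rout | head -n 5
```

Each output line reports the wait that preceded the entry shown, so the slowest stage is the one that ran just before the top line.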

I am also running a second job with mode = subclusters, and it seems to get stuck at the same step.

Is there anything I can try to rectify this? For now I am waiting and keeping the job running.
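
One way to tell a slow job from a hung one while waiting is to compare how long the log has been idle with whether the R process is still accumulating CPU time. This is a rough sketch, assuming GNU coreutils `stat -c` and procps `ps`; `log_idle_seconds` is a hypothetical helper name and `<PID>` is a placeholder for the process ID of the R job.

```shell
# Seconds since a file was last modified.
# Assumes GNU coreutils `stat -c`; log_idle_seconds is a hypothetical helper.
log_idle_seconds() {
    echo $(( $(date +%s) - $(stat -c %Y "$1") ))
}

# Usage, with <PID> standing in for the process ID of the R job:
#   log_idle_seconds inf.Rout
#   ps -o pid,stat,etime,time,pcpu -p <PID>   # run twice, a few minutes apart
```

If the TIME column (cumulative CPU time) grows between the two `ps` calls, the process is still computing even though nothing is being logged. If it stays flat, the job is more likely waiting or stuck.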

Thank you! Alex

RaghadShu commented 12 months ago

Hi, I think I am facing the same issue. I am not getting any output after STEP 19 in subclusters mode. Is there an explanation for this?