broadinstitute / infercnv

Inferring CNV from Single-Cell RNA-Seq

Error at STEP 17: HMM-based CNV prediction #242

Closed. ccruizm closed this issue 2 years ago

ccruizm commented 4 years ago

Good day,

I get an error when the run reaches step 17. I am using 10x Genomics v3 data with the raw count matrix as input. Below are the command I ran and the error:

Command:

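infercnv_obj = infercnv::run(infercnv_obj,
                             cutoff=0.1,               # documented recommendation for 10x Genomics data
                             out_dir="output_inferCNV_with-ref",
                             cluster_by_groups=FALSE,  # do not cluster cells within annotation groups
                             denoise=FALSE,
                             HMM=TRUE,                 # enables STEP 17, the HMM-based CNV prediction
                             num_threads = 10)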

Error:

INFO [2020-06-21 09:08:47] ::process_data:Start
INFO [2020-06-21 09:08:47] Checking for saved results.
INFO [2020-06-21 09:08:47] Trying to reload from step 15
INFO [2020-06-21 09:09:07] Using backup from step 15
INFO [2020-06-21 09:09:07] 

    STEP 1: incoming data

INFO [2020-06-21 09:09:07] 

    STEP 02: Removing lowly expressed genes

INFO [2020-06-21 09:09:07] 

    STEP 03: normalization by sequencing depth

INFO [2020-06-21 09:09:07] 

    STEP 04: log transformation of data

INFO [2020-06-21 09:09:07] 

    STEP 08: removing average of reference data (before smoothing)

INFO [2020-06-21 09:09:07] 

    STEP 09: apply max centered expression threshold: 3

INFO [2020-06-21 09:09:07] 

    STEP 10: Smoothing data per cell by chromosome

INFO [2020-06-21 09:09:07] 

    STEP 11: re-centering data across chromosome after smoothing

INFO [2020-06-21 09:09:07] 

    STEP 12: removing average of reference data (after smoothing)

INFO [2020-06-21 09:09:07] 

    STEP 14: invert log2(FC) to FC

INFO [2020-06-21 09:09:07] 

    STEP 15: Clustering samples (not defining tumor subclusters)

INFO [2020-06-21 09:09:07] 

    STEP 17: HMM-based CNV prediction

INFO [2020-06-21 09:09:07] predict_CNV_via_HMM_on_whole_tumor_samples
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...): NA/NaN/Inf in 'y'
Traceback:

1. infercnv::run(infercnv_obj, cutoff = 0.1, out_dir = "output_inferCNV_with-ref2", 
 .     cluster_by_groups = F, denoise = F, HMM = T, num_threads = 10)
2. predict_CNV_via_HMM_on_whole_tumor_samples(infercnv_obj, t = HMM_transition_prob)
3. lapply(chrs, function(chr) {
 .     chr_gene_idx = which(gene_order$chr == chr)
 .     lapply(tumor_samples, function(tumor_sample_cells_idx) {
 .         gene_expr_vals = rowMeans(expr.data[chr_gene_idx, tumor_sample_cells_idx, 
 .             drop = FALSE])
 .         num_cells = length(tumor_sample_cells_idx)
 .         state_emission_params <- .get_state_emission_params(num_cells, 
 .             cnv_mean_sd, cnv_level_to_mean_sd_fit)
 .         hmm <- HiddenMarkov::dthmm(gene_expr_vals, HMM_info[["state_transitions"]], 
 .             HMM_info[["delta"]], "norm", state_emission_params)
 .         hmm_trace <- Viterbi.dthmm.adj(hmm)
 .         hmm.data[chr_gene_idx, tumor_sample_cells_idx] <<- hmm_trace
 .     })
 . })
4. FUN(X[[i]], ...)
5. lapply(tumor_samples, function(tumor_sample_cells_idx) {
 .     gene_expr_vals = rowMeans(expr.data[chr_gene_idx, tumor_sample_cells_idx, 
 .         drop = FALSE])
 .     num_cells = length(tumor_sample_cells_idx)
 .     state_emission_params <- .get_state_emission_params(num_cells, 
 .         cnv_mean_sd, cnv_level_to_mean_sd_fit)
 .     hmm <- HiddenMarkov::dthmm(gene_expr_vals, HMM_info[["state_transitions"]], 
 .         HMM_info[["delta"]], "norm", state_emission_params)
 .     hmm_trace <- Viterbi.dthmm.adj(hmm)
 .     hmm.data[chr_gene_idx, tumor_sample_cells_idx] <<- hmm_trace
 . })
6. FUN(X[[i]], ...)
7. .get_state_emission_params(num_cells, cnv_mean_sd, cnv_level_to_mean_sd_fit)
8. get_hspike_cnv_mean_sd_trend_by_num_cells_fit(infercnv_obj@.hspike)
9. lapply(tmp_names, function(cnv_level) {
 .     sd_vals = cnv_level_to_mean_sd[[cnv_level]]
 .     num_cells = seq_along(sd_vals)
 .     fit = lm(log(sd_vals) ~ log(num_cells))
 .     fit
 . })
10. FUN(X[[i]], ...)
11. lm(log(sd_vals) ~ log(num_cells))
12. lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...)

Session info:

R version 4.0.0 (2020-04-24)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Gentoo/Linux

Matrix products: default
BLAS/LAPACK: /home/cruiz/anaconda3/envs/r_env_4.0/lib/libopenblasp-r0.3.9.so

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] cowplot_1.0.0  future_1.17.0  dplyr_1.0.0    Seurat_3.1.5   infercnv_1.4.0

loaded via a namespace (and not attached):
  [1] TH.data_1.0-10              Rtsne_0.15                 
  [3] colorspace_1.4-1            ellipsis_0.3.1             
  [5] modeltools_0.2-23           ggridges_0.5.2             
  [7] IRdisplay_0.7.0             futile.logger_1.4.3        
  [9] XVector_0.28.0              GenomicRanges_1.40.0       
 [11] base64enc_0.1-3             leiden_0.3.3               
 [13] listenv_0.8.0               ggrepel_0.8.2              
 [15] mvtnorm_1.1-0               coin_1.3-1                 
 [17] codetools_0.2-16            splines_4.0.0              
 [19] doParallel_1.0.15           libcoin_1.0-5              
 [21] IRkernel_1.1                jsonlite_1.6.1             
 [23] ica_1.0-2                   argparse_2.0.1             
 [25] cluster_2.1.0               png_0.1-7                  
 [27] rjags_4-10                  uwot_0.1.8                 
 [29] sctransform_0.2.1           compiler_4.0.0             
 [31] httr_1.4.1                  lazyeval_0.2.2             
 [33] Matrix_1.2-18               limma_3.44.1               
 [35] formatR_1.7                 htmltools_0.4.0            
 [37] tools_4.0.0                 rsvd_1.0.3                 
 [39] igraph_1.2.5                coda_0.19-3                
 [41] gtable_0.3.0                glue_1.4.1                 
 [43] GenomeInfoDbData_1.2.3      reshape2_1.4.4             
 [45] RANN_2.6.1                  rappdirs_0.3.1             
 [47] Rcpp_1.0.4.6                Biobase_2.48.0             
 [49] vctrs_0.3.1                 gdata_2.18.0               
 [51] ape_5.4                     nlme_3.1-147               
 [53] iterators_1.0.12            lmtest_0.9-37              
 [55] stringr_1.4.0               fastcluster_1.1.25         
 [57] globals_0.12.5              lifecycle_0.2.0            
 [59] irlba_2.3.3                 gtools_3.8.2               
 [61] edgeR_3.30.3                zlibbioc_1.34.0            
 [63] MASS_7.3-51.6               zoo_1.8-8                  
 [65] scales_1.1.1                parallel_4.0.0             
 [67] SummarizedExperiment_1.18.1 sandwich_2.5-1             
 [69] lambda.r_1.2.4              RColorBrewer_1.1-2         
 [71] SingleCellExperiment_1.10.1 reticulate_1.16            
 [73] pbapply_1.4-2               gridExtra_2.3              
 [75] ggplot2_3.3.1               stringi_1.4.6              
 [77] reshape_0.8.8               S4Vectors_0.26.1           
 [79] foreach_1.5.0               caTools_1.18.0             
 [81] BiocGenerics_0.34.0         repr_1.1.0                 
 [83] GenomeInfoDb_1.24.0         rlang_0.4.6                
 [85] pkgconfig_2.0.3             matrixStats_0.56.0         
 [87] bitops_1.0-6                evaluate_0.14              
 [89] lattice_0.20-41             ROCR_1.0-11                
 [91] purrr_0.3.4                 htmlwidgets_1.5.1          
 [93] patchwork_1.0.0             tidyselect_1.1.0           
 [95] RcppAnnoy_0.0.16            plyr_1.8.6                 
 [97] magrittr_1.5                R6_2.4.1                   
 [99] IRanges_2.22.2              gplots_3.0.3               
[101] generics_0.0.2              multcomp_1.4-13            
[103] pbdZMQ_0.3-3                DelayedArray_0.14.0        
[105] pillar_1.4.4                findpython_1.0.5           
[107] fitdistrplus_1.1-1          survival_3.1-12            
[109] RCurl_1.98-1.2              tsne_0.1-3                 
[111] tibble_3.0.1                future.apply_1.5.0         
[113] crayon_1.3.4                futile.options_1.0.1       
[115] uuid_0.1-4                  KernSmooth_2.23-17         
[117] plotly_4.9.2.1              locfit_1.5-9.4             
[119] grid_4.0.0                  data.table_1.12.8          
[121] digest_0.6.25               tidyr_1.1.0                
[123] stats4_4.0.0                munsell_0.5.0              
[125] viridisLite_0.3.0 

Where do you think the problem might be?

Thanks in advance!

GeorgescuC commented 4 years ago

Hi @ccruizm ,

I am not sure why there would be NA/NaN/Inf at this step in the process. Does the preliminary plot look normal? Could you try rerunning things from the start by either using resume_mode=FALSE or emptying the output folder? If the error still occurs, I will probably need to debug things using the data.

Regards, Christophe.
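
For readers hitting the same message: the last frames of the traceback show `lm(log(sd_vals) ~ log(num_cells))` failing. The sketch below reproduces that error text in base R only. The `sd_vals` numbers are made up, and whether a zero standard deviation is what actually occurs inside infercnv is an assumption, not something this thread confirms.

```r
# Hypothetical standard deviations, one per cell count; the zero is deliberate.
sd_vals   <- c(0.40, 0.28, 0, 0.20)
num_cells <- seq_along(sd_vals)

# log(0) is -Inf. lm() drops NA/NaN rows by default (na.omit) but keeps
# infinite values, so the fit stops inside lm.fit().
fit_error <- tryCatch(
  {
    lm(log(sd_vals) ~ log(num_cells))
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(fit_error)

# Restricting the fit to finite, positive values lets it go through.
ok  <- is.finite(sd_vals) & sd_vals > 0
y   <- log(sd_vals[ok])
x   <- log(num_cells[ok])
fit <- lm(y ~ x)
print(coef(fit))
```

If this is the mechanism, checking the intermediate values for zeros or infinite entries before the HMM step would be a reasonable first diagnostic.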

mmfalco commented 3 years ago

I'm having the same problem with version 1.7.1 of the package. Strangely, it was solved by passing a base::matrix object instead of a sparse Matrix object. When creating the infercnv object I did:

infercnv_obj = CreateInfercnvObject(as.matrix(counts_matrix),
                                    annotations_file="cellannot.txt",
                                    delim="\t",
                                    gene_order_file="genePos.csv",
                                    ref_group_names=c("stromal","immune"))

And it worked.

I think this has to do with the problems I've been having with the cluster_by_groups=F argument of the infercnv::run() function.
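
A minimal sketch of the conversion described above, using a tiny made-up matrix rather than real counts. The object names (`counts_sparse`, `counts_dense`) and the values are hypothetical; only `as.matrix()` on the sparse input comes from this comment.

```r
library(Matrix)  # recommended package, shipped with standard R installations

# Hypothetical 3-gene x 4-cell count matrix stored in sparse form.
counts_sparse <- Matrix::sparseMatrix(
  i = c(1, 2, 3, 3),
  j = c(1, 2, 3, 4),
  x = c(5, 2, 7, 1),
  dims = c(3, 4),
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)

# Convert to a base R matrix; gene and cell names are preserved.
counts_dense <- as.matrix(counts_sparse)

# Sanity checks before handing the matrix to any downstream tool.
print(class(counts_sparse))
print(is.matrix(counts_dense))
print(all(is.finite(counts_dense)))
```

Note that a dense copy of a full 10x matrix can need far more memory than the sparse original, so this workaround may not be practical for very large datasets.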

MasonDou commented 11 months ago

resume_mode=FALSE

Hi @GeorgescuC, I hit the same problem at step 17, with the same error message. The odd part is that the function runs fine when I use out_dir=tempfile(), but the error appears as soon as I pass a folder name. Looking forward to your reply, thank you!
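
The tempfile() observation fits the earlier suggestion to empty the output folder: the original log shows the run reloading a backup from step 15, and a fresh directory has nothing to reload. Below is a sketch of clearing a named folder before a run. `reset_out_dir` is a hypothetical helper, not part of infercnv, and whether stale backups are really the cause here is an assumption.

```r
# Hypothetical helper: delete an output directory and recreate it empty,
# so that no saved intermediate results from an earlier run remain.
# Caution: this permanently removes everything inside out_dir.
reset_out_dir <- function(out_dir) {
  if (dir.exists(out_dir)) {
    unlink(out_dir, recursive = TRUE)
  }
  dir.create(out_dir, recursive = TRUE)
  invisible(out_dir)
}

# Demonstration in a temporary location with a fake leftover file.
demo_dir <- file.path(tempdir(), "output_inferCNV_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, "stale_backup.tmp"))

reset_out_dir(demo_dir)
print(list.files(demo_dir))
```

After resetting the folder, the same path can be passed as out_dir to infercnv::run(). Passing resume_mode=FALSE, as suggested earlier in the thread, is the alternative that leaves existing files in place.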

brianjohnhaas commented 11 months ago

Hi all - we've had a lapse in funding for infercnv, so for now we don't have the resources to provide tech support. Hopefully that changes.

We'll put up a banner about this sometime soon. In the meantime, hopefully users can help each other out.

--

Brian J. Haas The Broad Institute http://broadinstitute.org/~bhaas http://broad.mit.edu/~bhaas