Error Encountered during Step 17: Error in subclusters_per_chr[[chr]] : subscript out of bounds

kiklata commented 2 years ago

Hello, I am using the newest pull from the master branch of infercnv. I'm trying to run infercnv on a 10X dataset with 5000 normal cells and 200 tumor cells, and I'm getting the following error at step 17:

STEP 17: HMM-based CNV prediction

INFO [2022-08-26 08:58:01] predict_CNV_via_HMM_on_tumor_subclusters_per_chr
Error in subclusters_per_chr[[chr]] : subscript out of bounds

Here is my code and R session info:

infercnv_obj = infercnv::run(infercnv_obj,
                               cutoff=0.1, 
                               out_dir= paste0(sample.n[i]), no_prelim_plot = T,
                               num_threads = 16,
                               cluster_by_groups=F, 
                               denoise=TRUE,
                               HMM=TRUE,HMM_type = 'i6',HMM_transition_prob = 1e-06,
                               analysis_mode = 'subclusters',HMM_report_by = 'subcluster')

R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] infercnv_1.13.0    dplyr_1.0.9        sp_1.5-0           SeuratObject_4.1.0 Seurat_4.1.1      

loaded via a namespace (and not attached):
  [1] parallelDist_0.2.6          plyr_1.8.7                  igraph_1.3.4               
  [4] lazyeval_0.2.2              splines_4.2.1               listenv_0.8.0              
  [7] scattermore_0.8             TH.data_1.1-1               GenomeInfoDb_1.32.2        
 [10] ggplot2_3.3.6               digest_0.6.29               foreach_1.5.2              
 [13] htmltools_0.5.3             HiddenMarkov_1.8-13         fansi_1.0.3                
 [16] magrittr_2.0.3              tensor_1.5                  cluster_2.1.3              
 [19] doParallel_1.0.17           ROCR_1.0-11                 limma_3.52.2               
 [22] fastcluster_1.2.3           globals_0.15.1              RcppParallel_5.1.5         
 [25] matrixStats_0.62.0          sandwich_3.0-2              spatstat.sparse_2.1-1      
 [28] colorspace_2.0-3            ggrepel_0.9.1               crayon_1.5.1               
 [31] RCurl_1.98-1.8              jsonlite_1.8.0              libcoin_1.0-9              
 [34] progressr_0.10.1            spatstat.data_2.2-0         survival_3.2-13            
 [37] zoo_1.8-10                  iterators_1.0.14            ape_5.6-2                  
 [40] glue_1.6.2                  polyclip_1.10-0             gtable_0.3.0               
 [43] zlibbioc_1.42.0             XVector_0.36.0              leiden_0.4.2               
 [46] DelayedArray_0.22.0         future.apply_1.9.0          SingleCellExperiment_1.18.0
 [49] BiocGenerics_0.42.0         abind_1.4-5                 scales_1.2.0               
 [52] futile.options_1.0.1        mvtnorm_1.1-3               edgeR_3.38.1               
 [55] DBI_1.1.3                   spatstat.random_2.2-0       miniUI_0.1.1.1             
 [58] Rcpp_1.0.9                  viridisLite_0.4.0           xtable_1.8-4               
 [61] reticulate_1.25             spatstat.core_2.4-4         stats4_4.2.1               
 [64] htmlwidgets_1.5.4           httr_1.4.3                  gplots_3.1.3               
 [67] RColorBrewer_1.1-3          modeltools_0.2-23           ellipsis_0.3.2             
 [70] ica_1.0-3                   pkgconfig_2.0.3             uwot_0.1.11                
 [73] deldir_1.0-6                locfit_1.5-9.6              utf8_1.2.2                 
 [76] tidyselect_1.1.2            rlang_1.0.4                 reshape2_1.4.4             
 [79] later_1.3.0                 phyclust_0.1-30             munsell_0.5.0              
 [82] tools_4.2.1                 cli_3.3.0                   generics_0.1.3             
 [85] ggridges_0.5.3              stringr_1.4.0               fastmap_1.1.0              
 [88] argparse_2.1.5              goftest_1.2-3               fitdistrplus_1.1-8         
 [91] caTools_1.18.2              purrr_0.3.4                 RANN_2.6.1                 
 [94] coin_1.4-2                  pbapply_1.5-0               future_1.27.0              
 [97] nlme_3.1-157                mime_0.12                   formatR_1.12               
[100] compiler_4.2.1              rstudioapi_0.13             plotly_4.10.0              
[103] png_0.1-7                   spatstat.utils_2.3-1        tibble_3.1.8               
[106] stringi_1.7.8               futile.logger_1.4.3         rgeos_0.5-9                
[109] lattice_0.20-45             Matrix_1.4-1                vctrs_0.4.1                
[112] pillar_1.8.0                lifecycle_1.0.1             spatstat.geom_2.4-0        
[115] lmtest_0.9-40               RcppAnnoy_0.0.19            data.table_1.14.2          
[118] cowplot_1.1.1               bitops_1.0-7                irlba_2.3.5                
[121] httpuv_1.6.5                patchwork_1.1.1             GenomicRanges_1.48.0       
[124] R6_2.5.1                    promises_1.2.0.1            KernSmooth_2.23-20         
[127] gridExtra_2.3               rjags_4-13                  IRanges_2.30.0             
[130] parallelly_1.32.1           codetools_0.2-18            lambda.r_1.2.4             
[133] gtools_3.9.3                MASS_7.3-58                 assertthat_0.2.1           
[136] SummarizedExperiment_1.26.1 sctransform_0.3.3           multcomp_1.4-19            
[139] S4Vectors_0.34.0            GenomeInfoDbData_1.2.8      mgcv_1.8-40                
[142] parallel_4.2.1              grid_4.2.1                  rpart_4.1.16               
[145] tidyr_1.2.0                 coda_0.19-4                 MatrixGenerics_1.8.1       
[148] Rtsne_0.16                  Biobase_2.56.0              shiny_1.7.2

Sa753 commented 2 years ago

I have the same problem as well. Did you manage to solve it?

Just to add that this error happens only when running the version installed from the master branch (which runs PCA in step 15). It also occurs when running on the example data that is already in the package.

kiklata commented 2 years ago

> I have the same problem as well. Did you get to solve it?
>
> Just to add that this error happens only if running the version installed from the master branch (which runs PCA in step 15). it will also occur if you run it on the example data that is already in the package.

Hi @Sa753

Since I'm not interested in subclonal structure, I decided not to use the HMM model, and it works fine without it.

GeorgescuC commented 2 years ago

Hi @kiklata @Sa753 ,

I am unable to reproduce the issue with the example data. Could you try pulling the new commits and reinstalling to see if the issue persists?

Regards, Christophe.

Sa753 commented 2 years ago

Hi Chris,

How can I do this? I installed the version from the master branch. Is that the same?

Thanks

kiklata commented 2 years ago

Hi @GeorgescuC ,

Re-running the code mentioned above with version 1.13.0 resulted in the same error.

Demo data was provided in download

Many thanks

Sa753 commented 2 years ago

Hi Chris,

I reinstalled twice using this code

devtools::install_github("broadinstitute/infercnv", ref="master")

The installed version is 1.13 and the R version is 4.1.3. The error happens only with this version, not with previous versions. It also happens with the example data. Thanks.

Sa753 commented 2 years ago

Is there any update on this please?

mhgh146 commented 2 years ago

I've run into the exact same issue as well and would appreciate any update.

GeorgescuC commented 2 years ago

Hi @kiklata ,

For the data you provided, besides the error encountered, the cutoff used seems too stringent, as very few genes remain after filtering. Is this single-nucleus data? Judging by the average number of reads per gene across the data, a cutoff closer to 0.01 should be used.

On top of this, the data looks rather noisy (probably because of the lower read counts), so the Leiden subclustering generates too many small clusters (down to one cell per subcluster). To address this, you could decrease the leiden_resolution value to something around 0.01 (see igraph::cluster_leiden for more details on the setting).

Since you seem to have limited subpopulation diversity, another valid option would be to set k_obs_groups=3 and up_to_step=15, then transfer the split groups (which are normally only used during plotting) as annotations and rerun the analysis with analysis_mode="samples" and cluster_by_groups=T (at this point you can remove k_obs_groups).
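As a rough way to check whether a cutoff is too stringent before running the full pipeline, the number of genes that would survive the filter can be counted per chromosome. The following is only a sketch on a made-up toy matrix: the gene names, the chromosome assignment, and the helper genes_kept_per_chr are hypothetical, and the filter is assumed to keep genes whose mean count is at or above the cutoff (the exact comparison and the set of cells infercnv averages over may differ).

```r
# Toy example: build a genes x cells count matrix with known per-gene means.
# make_gene() is a hypothetical helper, not part of infercnv.
make_gene <- function(n_reads, n_cells = 100) {
  c(rep(1, n_reads), rep(0, n_cells - n_reads))
}

counts <- rbind(
  g1 = make_gene(50),  # mean 0.50
  g2 = make_gene(20),  # mean 0.20
  g3 = make_gene(5),   # mean 0.05
  g4 = make_gene(2),   # mean 0.02
  g5 = make_gene(3),   # mean 0.03
  g6 = make_gene(0)    # mean 0.00
)

# Hypothetical gene-to-chromosome assignment for the toy genes.
gene_chr <- c(g1 = "chr1", g2 = "chr1", g3 = "chr2",
              g4 = "chr2", g5 = "chr3", g6 = "chr3")

# Count the genes per chromosome whose mean count is >= cutoff.
# Chromosomes with zero surviving genes are kept in the table.
genes_kept_per_chr <- function(counts, gene_chr, cutoff) {
  keep <- rowMeans(counts) >= cutoff
  chr <- factor(gene_chr[rownames(counts)], levels = unique(gene_chr))
  table(chr[keep])
}

kept_strict  <- genes_kept_per_chr(counts, gene_chr, cutoff = 0.1)
kept_relaxed <- genes_kept_per_chr(counts, gene_chr, cutoff = 0.01)

print(kept_strict)   # chr2 and chr3 lose all of their genes
print(kept_relaxed)  # every chromosome keeps at least one gene

# Settings suggested above, collected as arguments that could be passed
# to infercnv::run() alongside the other options of the original call.
suggested_args <- list(cutoff = 0.01, leiden_resolution = 0.01)
```

On real data, a chromosome that ends up with zero (or very few) genes at the chosen cutoff is a sign that the cutoff is too high for the sequencing depth.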

Hi @Sa753 @mhgh146 ,

For the issue itself: for some chromosomes, the combination of the initial gene filtering at the cutoff and the z-score filtering for genes that are variable in the references during subclustering left no data to run on, so the per-chromosome subclustering was missing for those chromosomes. This case is now handled by placing all cells in a single cluster, but it likely points to an issue with the options used, the earlier filtering, or data quality.
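To illustrate the failure mode in plain R, here is a minimal sketch that does not use infercnv at all. It assumes the per-chromosome results are stored in a list that is shorter than the list of chromosomes and is indexed by position; the actual data structure and the actual fix in infercnv may look different, and get_chr_subclusters is a hypothetical helper.

```r
# Three chromosomes are expected, but subclustering only produced
# results for the first two (no genes were left on the third).
chrs <- c("chr1", "chr2", "chr3")
all_cells <- c("cellA", "cellB", "cellC")

subclusters_per_chr <- list(
  list(s1 = c("cellA", "cellB"), s2 = "cellC"),  # chr1
  list(s1 = c("cellA", "cellB", "cellC"))        # chr2
)

# Indexing past the end of a list with [[ ]] raises the error from the report.
err <- tryCatch(
  {
    subclusters_per_chr[[3]]
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(err)

# Hypothetical guard: fall back to one cluster containing all cells
# when a chromosome has no subclustering result.
get_chr_subclusters <- function(subclusters_per_chr, i, all_cells) {
  if (i > length(subclusters_per_chr) || is.null(subclusters_per_chr[[i]])) {
    return(list(all_cells = all_cells))
  }
  subclusters_per_chr[[i]]
}

for (i in seq_along(chrs)) {
  n_sub <- length(get_chr_subclusters(subclusters_per_chr, i, all_cells))
  cat(chrs[i], ":", n_sub, "subcluster(s)\n")
}
```

The guard only hides the symptom. A chromosome without any subclustering result still means that no usable genes were left on it after filtering.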

Regards, Christophe.

kiklata commented 2 years ago

Great, thanks @GeorgescuC. I'll try customizing the parameters for my specific data and post an update.

broadinstitute / infercnv

Error Encountered during Step 17: Error in subclusters_per_chr[[chr]] : subscript out of bounds #449