dozmorovlab / SpectralTAD

TAD Calling using spectral clustering in R
https://dozmorovlab.github.io/SpectralTAD/
Other
8 stars 3 forks source link

Error for certain combinations of min_size and window_size #8

Closed isaLa42 closed 1 year ago

isaLa42 commented 1 year ago

Hi,

first of all thanks for this nice tool and especially the clear parameters and explanations.

I am trying to call small substructures in my data and therefor played around with min_size and window_size for a bit. I noticed that some combinations work well on some chromosomes but not for others with my data. It occurs very random and also different combinations work for different chr.

Is there a logic behind setting min_size, window_size and chr_size that I do not get at the moment?

I was able to replicate the error with the rao_chr20_25_rep test data. This is my SpectralTAD call:

SpectralTAD(rao_chr20_25_rep, grange = TRUE,
                           chr = "chr20", resolution = 25000, 
                           levels = 3, min_size = 2, window_size = 20,
                           qual_filter = FALSE, z_clust = FALSE)

This is the error message: (I get the exact same for some but not all chr of my data)

Converting to n x n matrix

Matrix dimensions: 2517x2517

Warning message:
“Unknown or uninitialised column: `Group`.”
Warning message:
“Unknown or uninitialised column: `Group`.”
Error in `vec_init()`:
! `x` must be a vector, not `NULL`.
Traceback:

1. SpectralTAD(rao_chr20_25_rep, grange = TRUE, chr = "chr20", resolution = 25000, 
 .     levels = 3, min_size = 2, window_size = 20, qual_filter = FALSE, 
 .     z_clust = FALSE)
2. .windowedSpec(cont_mat, chr = chr, resolution = resolution, z_clust = z_clust, 
 .     eigenvalues = eigenvalues, min_size = min_size, window_size = window_size, 
 .     qual_filter = qual_filter, gap_threshold = gap_threshold) %>% 
 .     mutate(Level = 1)
3. mutate(., Level = 1)
4. .windowedSpec(cont_mat, chr = chr, resolution = resolution, z_clust = z_clust, 
 .     eigenvalues = eigenvalues, min_size = min_size, window_size = window_size, 
 .     qual_filter = qual_filter, gap_threshold = gap_threshold)
5. which(end_group$Group == last(end_group$Group))
6. last(end_group$Group)
7. nth(x, -1L, order_by = order_by, default = default, na_rm = na_rm)
8. check_nth_default(default, x = x)
9. vec_init(x)
10. stop_scalar_type(.Primitive("quote")(NULL), "x", .Primitive("quote")(vec_init()))
11. stop_vctrs(msg, "vctrs_error_scalar_type", actual = x, call = call)
12. abort(message, class = c(class, "vctrs_error"), ..., call = call)
13. signal_abort(cnd, .file)`

Any hints on what is going wrong? Thanks a lot in advance!

My session info is quite extensive - sorry for that:

R version 4.2.2 (2022-10-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /home/bq_ilander/miniconda3/envs/RWireX_v0.2.02/lib/libopenblasp-r0.3.21.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] SpectralTAD_1.14.0                plotgardener_1.4.1               
 [3] qs_0.25.4                         BSgenome.Hsapiens.UCSC.hg38_1.4.5
 [5] BSgenome_1.66.3                   rtracklayer_1.58.0               
 [7] Biostrings_2.66.0                 XVector_0.38.0                   
 [9] GenomicRanges_1.50.0              GenomeInfoDb_1.34.9              
[11] IRanges_2.32.0                    S4Vectors_0.36.0                 
[13] BiocGenerics_0.44.0               ggpubr_0.4.0                     
[15] ggplot2_3.4.2                     magrittr_2.0.3                   

loaded via a namespace (and not attached):
  [1] uuid_1.1-0                              
  [2] backports_1.4.1                         
  [3] BiocFileCache_2.6.0                     
  [4] plyr_1.8.8                              
  [5] repr_1.1.6                              
  [6] splines_4.2.2                           
  [7] RApiSerialize_0.1.2                     
  [8] BiocParallel_1.32.5                     
  [9] listenv_0.9.0                           
 [10] CGHcall_2.60.0                          
 [11] digest_0.6.31                           
 [12] yulab.utils_0.0.6                       
 [13] htmltools_0.5.5                         
 [14] fansi_1.0.4                             
 [15] memoise_2.0.1                           
 [16] cluster_2.1.4                           
 [17] InteractionSet_1.26.0                   
 [18] limma_3.54.0                            
 [19] globals_0.16.2                          
 [20] RcppParallel_5.1.6                      
 [21] matrixStats_1.0.0                       
 [22] R.utils_2.12.2                          
 [23] prettyunits_1.1.1                       
 [24] colorspace_2.1-0                        
 [25] rappdirs_0.3.3                          
 [26] blob_1.2.4                              
 [27] dplyr_1.1.2                             
 [28] crayon_1.5.2                            
 [29] RCurl_1.98-1.12                         
 [30] jsonlite_1.8.4                          
 [31] stringfish_0.15.7                       
 [32] impute_1.72.0                           
 [33] glue_1.6.2                              
 [34] gtable_0.3.3                            
 [35] zlibbioc_1.44.0                         
 [36] strawr_0.0.91                           
 [37] DelayedArray_0.24.0                     
 [38] TxDb.Hsapiens.UCSC.hg38.knownGene_3.16.0
 [39] car_3.1-2                               
 [40] plyranges_1.18.0                        
 [41] Rhdf5lib_1.20.0                         
 [42] future.apply_1.11.0                     
 [43] HiCcompare_1.20.0                       
 [44] abind_1.4-5                             
 [45] scales_1.2.1                            
 [46] pheatmap_1.0.12                         
 [47] DBI_1.1.3                               
 [48] CGHbase_1.58.0                          
 [49] rstatix_0.7.2                           
 [50] Rcpp_1.0.9                              
 [51] progress_1.2.2                          
 [52] gridGraphics_0.5-1                      
 [53] bit_4.0.5                               
 [54] httr_1.4.6                              
 [55] RColorBrewer_1.1-3                      
 [56] pkgconfig_2.0.3                         
 [57] XML_3.99-0.14                           
 [58] R.methodsS3_1.8.2                       
 [59] dbplyr_2.3.2                            
 [60] utf8_1.2.3                              
 [61] DNAcopy_1.72.0                          
 [62] ggplotify_0.1.0                         
 [63] tidyselect_1.2.0                        
 [64] rlang_1.1.1                             
 [65] AnnotationDbi_1.60.0                    
 [66] munsell_0.5.0                           
 [67] tools_4.2.2                             
 [68] cachem_1.0.8                            
 [69] cli_3.6.1                               
 [70] generics_0.1.3                          
 [71] RSQLite_2.3.1                           
 [72] ArchR_1.0.3                             
 [73] broom_1.0.4                             
 [74] stringr_1.5.0                           
 [75] evaluate_0.21                           
 [76] fastmap_1.1.1                           
 [77] yaml_2.3.7                              
 [78] org.Hs.eg.db_3.16.0                     
 [79] bit64_4.0.5                             
 [80] purrr_1.0.1                             
 [81] KEGGREST_1.38.0                         
 [82] future_1.32.0                           
 [83] nlme_3.1-162                            
 [84] R.oo_1.25.0                             
 [85] xml2_1.3.3                              
 [86] biomaRt_2.54.0                          
 [87] compiler_4.2.2                          
 [88] filelock_1.0.2                          
 [89] curl_4.3.3                              
 [90] png_0.1-8                               
 [91] ggsignif_0.6.4                          
 [92] marray_1.76.0                           
 [93] tibble_3.2.1                            
 [94] stringi_1.7.12                          
 [95] QDNAseq_1.34.0                          
 [96] GenomicFeatures_1.50.2                  
 [97] lattice_0.21-8                          
 [98] IRdisplay_1.1                           
 [99] Matrix_1.5-4.1                          
[100] vctrs_0.6.2                             
[101] pillar_1.9.0                            
[102] lifecycle_1.0.3                         
[103] rhdf5filters_1.10.0                     
[104] data.table_1.14.8                       
[105] bitops_1.0-7                            
[106] PRIMME_3.2-4                            
[107] R6_2.5.1                                
[108] BiocIO_1.8.0                            
[109] KernSmooth_2.23-21                      
[110] gridExtra_2.3                           
[111] parallelly_1.36.0                       
[112] codetools_0.2-19                        
[113] gtools_3.9.4                            
[114] rhdf5_2.42.0                            
[115] SummarizedExperiment_1.28.0             
[116] rjson_0.2.21                            
[117] withr_2.5.0                             
[118] GenomicAlignments_1.34.0                
[119] Rsamtools_2.14.0                        
[120] GenomeInfoDbData_1.2.9                  
[121] mgcv_1.8-42                             
[122] parallel_4.2.2                          
[123] hms_1.1.3                               
[124] grid_4.2.2                              
[125] IRkernel_1.3.1                          
[126] tidyr_1.3.0                             
[127] MatrixGenerics_1.10.0                   
[128] carData_3.0-5                           
[129] pbdZMQ_0.3-9                            
[130] Biobase_2.58.0                          
[131] base64enc_0.1-3                         
[132] restfulr_0.0.15
mdozmorov commented 1 year ago

Interesting, the same command works as intended on my Mac. Scanning through sessionInfo, can only think about potential LAPACK library mismatches. It would be hard to troubleshoot. Does the error occur if you run it on a different (Mac or Windows) machine?

As for the parameters, SpectralTAD is fairly robust. We use levels = 1 (first level TADs are most robust), qual_filter = TRUE and z_clust = TRUE (it may help to eliminate poorly formed TADs in case of low-resolution data), gap_threshold = 0.8 (also helps to eliminate sparse rows/columns to be called as boundaries). As for the window size:

resolution <- 10000
max_tad_size <- 2000000; window_size <- max_tad_size / resolution

We typically call TADs at several resolutions, like 10K, 20K, 50K. If you visualize all of them in IGV, you'll see they mostly overlap. We either choose TADs detected at the lowest resolution or combine TAD calls from all resolutions, depending on visual assessment.

isaLa42 commented 1 year ago

Thanks for the fast reply.

I am running SpectralTAD in a conda environment on a Unix computer cluster. I will try reinstallation in a new conda environment. Which LAPACK version do you recommend to resolve the mismatches?

Thanks for the parameter explanations. Where can I find/set the max_tad_size parameter? I only find min_size and window_size in the code here on github.

mdozmorov commented 1 year ago

I'd try conda install -c anaconda openblas and conda install -c conda-forge lapack.

Maximum TAD size is common knowledge - TADs larger than 2Mb typically don't exist. Hence, max_tad_size <- 2000000.

crazy-pancake commented 1 year ago

Hey there!

Sorry, in my previous comment, I addressed the issue that you solved a few months ago. But your documentation tells to install it from different project location - cresswellkg, that's why you do not get the error, but we do.

devtools::install_github('cresswellkg/SpectralTAD', build_vignettes = TRUE)
library(SpectralTAD)

It still contains the version before the fixes. So it should be devtools::install_github('dozmorovlab/SpectralTAD')

mdozmorov commented 1 year ago

Thanks, @crazy-pancake, that change was long overdue. I corrected it in the README and pushed updates to Bioconductor. @isaLa42, did it help to solve your error?

isaLa42 commented 1 year ago

Yes, the newer version solved it. Thanks a lot @crazy-pancake