ShixiangWang / sigminer

🌲 An easy-to-use and scalable toolkit for genomic alteration signature (a.k.a. mutational signature) analysis and visualization in R https://shixiangwang.github.io/sigminer/reference/index.html
https://shixiangwang.github.io/sigminer/
Other
141 stars 18 forks source link

Making sig_tally output a matrix when run with a single sample in ID / DBS / ALL mode (like SBS already does) #453

Closed selkamand closed 6 months ago

selkamand commented 6 months ago

Hi again,

This issue is a an extension of https://github.com/ShixiangWang/sigminer/discussions/432. Requesting the same fix be applied to DBS/INDEL/ALL models of ig_tally

Basically when running sig_tally in Indel or DBS mode with a single sample, numeric vectors are output instead of a matrix with sample names as rownames. SBS mode works as expected having been previously fixed. See reprex below.

# Libraries
library(sigminer)
#> sigminer version 2.3.0
#> - Star me at https://github.com/ShixiangWang/sigminer
#> - Run hello() to see usage and citation.
library(maftools)
#> 
#> Attaching package: 'maftools'
#> The following object is masked from 'package:sigminer':
#> 
#>     MAF

# Load MAF
maf_acc <- tcgaLoad(study = "ACC")
#> Loading ACC. Please cite: https://doi.org/10.1016/j.cels.2018.03.002 for reference

# 1-sample MAF
maf_acc_1sample <- maftools::subsetMaf(maf_acc, tsb = 'TCGA-OR-A5KB-01A-11D-A30A-10')
#> --Possible FLAGS among top ten genes:
#>   TTN
#>   USH2A
#> -Processing clinical data

# 2-sample MAF
maf_acc_2sample <- maftools::subsetMaf(maf_acc, tsb = c('TCGA-OR-A5KB-01A-11D-A30A-10', 'TCGA-OR-A5J2-01A-11D-A29I-10'))
#> --Possible FLAGS among top ten genes:
#>   TTN
#>   USH2A
#> -Processing clinical data

# Tally on both the 1-sample and 2-sample MAFs
maf_acc_1sample_all_tally <- sig_tally(object = maf_acc_1sample, mode = c("ALL"))
#> ℹ [2024-03-07 20:51:49.603816]: Started.
#> ℹ [2024-03-07 20:51:56.612782]: We would assume you marked all variants' position in + strand.
#> 
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:utils':
#> 
#>     findMatches
#> The following objects are masked from 'package:base':
#> 
#>     expand.grid, I, unname
#> 
#> Attaching package: 'Biostrings'
#> The following object is masked from 'package:base':
#> 
#>     strsplit
#> 
#> Attaching package: 'rtracklayer'
#> The following object is masked from 'package:BiocIO':
#> 
#>     FileForFormat
#> ✔ [2024-03-07 20:51:56.757174]: Reference genome loaded.
#> ✔ [2024-03-07 20:51:56.765047]: Variants from MAF object queried.
#> ✔ [2024-03-07 20:51:56.769363]: Chromosome names checked.
#> ✔ [2024-03-07 20:51:56.777337]: Sex chromosomes properly handled.
#> ✔ [2024-03-07 20:51:56.779657]: Only variants located in standard chromosomes (1:22, X, Y, M/MT) are kept.
#> ✔ [2024-03-07 20:51:56.786026]: Variant start and end position checked.
#> ✔ [2024-03-07 20:51:56.791359]: Variant data for matrix generation preprocessed.
#> ℹ [2024-03-07 20:51:56.793258]: All types of matrices generation - start.
#> ℹ [2024-03-07 20:51:56.796281]: SBS matrix generation - start.
#> ℹ [2024-03-07 20:51:56.799234]: Extracting 5' and 3' adjacent bases.
#> ℹ [2024-03-07 20:51:57.428589]: Extracting +/- 20bp around mutated bases for background C>T estimation.
#> ℹ [2024-03-07 20:51:57.94948]: Estimating APOBEC enrichment scores.
#> ℹ [2024-03-07 20:51:57.951843]: Performing one-way Fisher's test for APOBEC enrichment.
#> ✔ [2024-03-07 20:51:57.95955]: APOBEC related mutations are enriched in 0% of samples (APOBEC enrichment score > 2; 0 of 1 samples)
#> ℹ [2024-03-07 20:51:57.964209]: Creating SBS sample-by-component matrices.
#> ✔ [2024-03-07 20:51:57.970562]: SBS-6 matrix created.
#> ✔ [2024-03-07 20:51:57.9768]: SBS-96 matrix created.
#> ✔ [2024-03-07 20:51:57.994874]: SBS-1536 matrix created.
#> ℹ [2024-03-07 20:51:57.997067]: Return SBS-96 as major matrix.
#> ℹ [2024-03-07 20:51:58.002714]: DBS matrix generation - start.
#> ℹ [2024-03-07 20:51:58.005185]: Searching DBS records...
#> ✔ [2024-03-07 20:51:58.046337]: Done.
#> ✔ [2024-03-07 20:51:58.686095]: Reference sequences queried from genome.
#> ✔ [2024-03-07 20:51:58.697174]: DBS-78 matrix created.
#> ✔ [2024-03-07 20:51:58.710347]: DBS-1248 matrix created.
#> ℹ [2024-03-07 20:51:58.712434]: Return SBS-78 as major matrix.
#> ℹ [2024-03-07 20:51:58.717723]: INDEL matrix generation - start.
#> ✔ [2024-03-07 20:51:59.479262]: Reference sequences queried from genome.
#> ✔ [2024-03-07 20:51:59.481283]: INDEL length extracted.
#> ✔ [2024-03-07 20:51:59.486269]: Adjacent copies counted.
#> ✔ [2024-03-07 20:51:59.488714]: Microhomology size calculated.
#> ✔ [2024-03-07 20:51:59.500194]: INDEL records classified into different components (types).
#> ✔ [2024-03-07 20:51:59.504986]: ID-28 matrix created.
#> ✔ [2024-03-07 20:51:59.510807]: ID-83 matrix created.
#> ℹ [2024-03-07 20:51:59.512781]: Return ID-83 as major matrix.
#> ℹ [2024-03-07 20:51:59.514775]: All types of matrices generation (APOBEC scores included) - end.
#> ✔ [2024-03-07 20:51:59.516668]: Done.
#> ℹ [2024-03-07 20:51:59.518685]: 9.915 secs elapsed.
maf_acc_2sample_all_tally <- sig_tally(object = maf_acc_2sample, mode = c("ALL"))
#> ℹ [2024-03-07 20:51:59.521485]: Started.
#> ℹ [2024-03-07 20:51:59.527054]: We would assume you marked all variants' position in + strand.
#> ✔ [2024-03-07 20:51:59.528964]: Reference genome loaded.
#> ✔ [2024-03-07 20:51:59.534562]: Variants from MAF object queried.
#> ✔ [2024-03-07 20:51:59.5382]: Chromosome names checked.
#> ✔ [2024-03-07 20:51:59.545192]: Sex chromosomes properly handled.
#> ✔ [2024-03-07 20:51:59.547089]: Only variants located in standard chromosomes (1:22, X, Y, M/MT) are kept.
#> ✔ [2024-03-07 20:51:59.552945]: Variant start and end position checked.
#> ✔ [2024-03-07 20:51:59.557316]: Variant data for matrix generation preprocessed.
#> ℹ [2024-03-07 20:51:59.558859]: All types of matrices generation - start.
#> ℹ [2024-03-07 20:51:59.560365]: SBS matrix generation - start.
#> ℹ [2024-03-07 20:51:59.562431]: Extracting 5' and 3' adjacent bases.
#> ℹ [2024-03-07 20:51:59.916994]: Extracting +/- 20bp around mutated bases for background C>T estimation.
#> ℹ [2024-03-07 20:52:00.373613]: Estimating APOBEC enrichment scores.
#> ℹ [2024-03-07 20:52:00.375485]: Performing one-way Fisher's test for APOBEC enrichment.
#> ✔ [2024-03-07 20:52:00.380256]: APOBEC related mutations are enriched in 0% of samples (APOBEC enrichment score > 2; 0 of 2 samples)
#> ℹ [2024-03-07 20:52:00.383674]: Creating SBS sample-by-component matrices.
#> ✔ [2024-03-07 20:52:00.388254]: SBS-6 matrix created.
#> ✔ [2024-03-07 20:52:00.393267]: SBS-96 matrix created.
#> ✔ [2024-03-07 20:52:00.405369]: SBS-1536 matrix created.
#> ℹ [2024-03-07 20:52:00.406639]: Return SBS-96 as major matrix.
#> ℹ [2024-03-07 20:52:00.410594]: DBS matrix generation - start.
#> ℹ [2024-03-07 20:52:00.412634]: Searching DBS records...
#> ✔ [2024-03-07 20:52:00.443106]: Done.
#> ✔ [2024-03-07 20:52:01.621479]: Reference sequences queried from genome.
#> ✔ [2024-03-07 20:52:01.63544]: DBS-78 matrix created.
#> ✔ [2024-03-07 20:52:01.648456]: DBS-1248 matrix created.
#> ℹ [2024-03-07 20:52:01.650305]: Return SBS-78 as major matrix.
#> ℹ [2024-03-07 20:52:01.655035]: INDEL matrix generation - start.
#> ✔ [2024-03-07 20:52:02.252272]: Reference sequences queried from genome.
#> ✔ [2024-03-07 20:52:02.254433]: INDEL length extracted.
#> ✔ [2024-03-07 20:52:02.25843]: Adjacent copies counted.
#> ✔ [2024-03-07 20:52:02.26031]: Microhomology size calculated.
#> ✔ [2024-03-07 20:52:02.274792]: INDEL records classified into different components (types).
#> ✔ [2024-03-07 20:52:02.279745]: ID-28 matrix created.
#> ✔ [2024-03-07 20:52:02.284801]: ID-83 matrix created.
#> ℹ [2024-03-07 20:52:02.286685]: Return ID-83 as major matrix.
#> ℹ [2024-03-07 20:52:02.288518]: All types of matrices generation (APOBEC scores included) - end.
#> ✔ [2024-03-07 20:52:02.290029]: Done.
#> ℹ [2024-03-07 20:52:02.291569]: 2.77 secs elapsed.

# Print Matrices (SBS): Works as expected (matrix)
maf_acc_1sample_all_tally$SBS_96 |> class()
#> [1] "matrix" "array"
maf_acc_2sample_all_tally$SBS_96 |> class()
#> [1] "matrix" "array"

# Print Matrices (INDEL): 1 sample version produces numeric vector NOT an matrix with sample names as rownames
maf_acc_1sample_all_tally$ID_83|> class()
#> [1] "integer"
maf_acc_2sample_all_tally$ID_83 |> class()
#> [1] "matrix" "array"

# Print Matrices (DBS): 1 sample version produces numeric vector NOT an matrix with sample names as rownames
maf_acc_1sample_all_tally$DBS_78|> class()
#> [1] "integer"
maf_acc_2sample_all_tally$DBS_78 |> class()
#> [1] "matrix" "array"

# SessionInfo
sessionInfo()
#> R version 4.3.2 (2023-10-31)
#> Platform: x86_64-apple-darwin20 (64-bit)
#> Running under: macOS Ventura 13.5.2
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
#> 
#> locale:
#> [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
#> 
#> time zone: Australia/Sydney
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] BSgenome.Hsapiens.UCSC.hg19_1.4.3 BSgenome_1.70.1                  
#>  [3] rtracklayer_1.62.0                BiocIO_1.12.0                    
#>  [5] Biostrings_2.70.2                 XVector_0.42.0                   
#>  [7] GenomicRanges_1.54.1              GenomeInfoDb_1.38.5              
#>  [9] IRanges_2.36.0                    S4Vectors_0.40.2                 
#> [11] maftools_2.18.0                   sigminer_2.3.0                   
#> [13] Biobase_2.62.0                    BiocGenerics_0.48.1              
#> 
#> loaded via a namespace (and not attached):
#>  [1] tidyselect_1.2.0            gridBase_0.4-7             
#>  [3] dplyr_1.1.4                 R.utils_2.12.3             
#>  [5] bitops_1.0-7                fastmap_1.1.1              
#>  [7] RCurl_1.98-1.14             GenomicAlignments_1.38.2   
#>  [9] XML_3.99-0.16.1             reprex_2.1.0               
#> [11] digest_0.6.34               lifecycle_1.0.4            
#> [13] cluster_2.1.6               survival_3.5-7             
#> [15] magrittr_2.0.3              compiler_4.3.2             
#> [17] rlang_1.1.3                 rngtools_1.5.2             
#> [19] tools_4.3.2                 utf8_1.2.4                 
#> [21] yaml_2.3.8                  data.table_1.15.0          
#> [23] NMF_0.26                    knitr_1.45                 
#> [25] S4Arrays_1.2.0              DelayedArray_0.28.0        
#> [27] plyr_1.8.9                  RColorBrewer_1.1-3         
#> [29] abind_1.4-5                 BiocParallel_1.36.0        
#> [31] registry_0.5-1              R.cache_0.16.0             
#> [33] withr_3.0.0                 purrr_1.0.2                
#> [35] R.oo_1.26.0                 grid_4.3.2                 
#> [37] fansi_1.0.6                 colorspace_2.1-0           
#> [39] future_1.33.1               ggplot2_3.5.0              
#> [41] globals_0.16.2              scales_1.3.0               
#> [43] iterators_1.0.14            SummarizedExperiment_1.32.0
#> [45] cli_3.6.2                   crayon_1.5.2               
#> [47] rmarkdown_2.25              generics_0.1.3             
#> [49] rstudioapi_0.15.0           rjson_0.2.21               
#> [51] reshape2_1.4.4              DNAcopy_1.76.0             
#> [53] stringr_1.5.1               zlibbioc_1.48.0            
#> [55] splines_4.3.2               parallel_4.3.2             
#> [57] restfulr_0.0.15             BiocManager_1.30.22        
#> [59] matrixStats_1.2.0           vctrs_0.6.5                
#> [61] Matrix_1.6-5                listenv_0.9.1              
#> [63] foreach_1.5.2               glue_1.7.0                 
#> [65] parallelly_1.36.0           codetools_0.2-19           
#> [67] stringi_1.8.3               gtable_0.3.4               
#> [69] munsell_0.5.0               tibble_3.2.1               
#> [71] styler_1.10.2               furrr_0.3.1                
#> [73] pillar_1.9.0                htmltools_0.5.7            
#> [75] GenomeInfoDbData_1.2.11     R6_2.5.1                   
#> [77] doParallel_1.0.17           evaluate_0.23              
#> [79] lattice_0.22-5              Rsamtools_2.18.0           
#> [81] R.methodsS3_1.8.2           Rcpp_1.0.12                
#> [83] SparseArray_1.2.3           xfun_0.41                  
#> [85] MatrixGenerics_1.14.0       fs_1.6.3                   
#> [87] pkgconfig_2.0.3

Created on 2024-03-07 with reprex v2.1.0

Thanks for your time!!

ShixiangWang commented 6 months ago

@selkamand Thank you for your reproducible bug report. I will take a deep look and fix it in the near future days.

ShixiangWang commented 6 months ago

@selkamand This should be fixed, please install the latest version from GitHub and take a try.

remotes::install_github("ShixiangWang/sigminer")

Thanks for your report again.

selkamand commented 6 months ago

Sorry for the delayed response. Works perfectly now. Thanks!