ShixiangWang / sigminer

🌲 An easy-to-use and scalable toolkit for genomic alteration signature (a.k.a. mutational signature) analysis and visualization in R https://shixiangwang.github.io/sigminer/reference/index.html
https://shixiangwang.github.io/sigminer/
Other
141 stars 18 forks source link

'Matrices' returned by sig_tally are actually just vectors when only 1 sample is supplied #450

Closed selkamand closed 7 months ago

selkamand commented 7 months ago

Thanks for your work on this great package!

Just noticed that when a single sample MAF is supplied to sig_tally, the 'matrices' returned are not actually matrices at all, but rather are just vectors. This is inconsistent with behaviour when a multi-sample MAF is supplied (which conveniently returns a matrix with 1 row per sample and sample IDs as rownames).

Would it be possible to unify the return type irrespective of the number of samples supplied to sig_tally? (i.e. return 1-row matrices with rowname = sample identifier when single-sample MAFs are analysed)

The consistent format would make downstream analyses simpler.

Reprex is attached below

# Libraries
library(sigminer)
#> sigminer version 2.2.0
#> - Star me at https://github.com/ShixiangWang/sigminer
#> - Run hello() to see usage and citation.
library(maftools)
#> 
#> Attaching package: 'maftools'
#> The following object is masked from 'package:sigminer':
#> 
#>     MAF

# Load MAF
maf_acc <- tcgaLoad(study = "ACC")
#> Loading ACC. Please cite: https://doi.org/10.1016/j.cels.2018.03.002 for reference

# 1-sample MAF
maf_acc_1sample <- maftools::subsetMaf(maf_acc, tsb = 'TCGA-OR-A5J1-01A-11D-A29I-10')
#> -Processing clinical data

# 2-sample MAF
maf_acc_2sample <- maftools::subsetMaf(maf_acc, tsb = c('TCGA-OR-A5J1-01A-11D-A29I-10', 'TCGA-OR-A5J2-01A-11D-A29I-10'))
#> -Processing clinical data

# Tally on both the 1-sample and 2-sample MAFs
maf_acc_1sample_tally <- sig_tally(object = maf_acc_1sample, mode = c("SBS", "DBS", "ID", "ALL"))
#> ℹ [2024-01-29 18:00:43.63508]: Started.
#> Warning: replacing previous import 'utils::download.file' by
#> 'restfulr::download.file' when loading 'rtracklayer'
#> ℹ [2024-01-29 18:00:56.710575]: We would assume you marked all variants' position in + strand.
#> 
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:utils':
#> 
#>     findMatches
#> The following objects are masked from 'package:base':
#> 
#>     expand.grid, I, unname
#> 
#> Attaching package: 'Biostrings'
#> The following object is masked from 'package:base':
#> 
#>     strsplit
#> ✔ [2024-01-29 18:00:56.928274]: Reference genome loaded.
#> ✔ [2024-01-29 18:00:56.933858]: Variants from MAF object queried.
#> ✔ [2024-01-29 18:00:56.936058]: Chromosome names checked.
#> ✔ [2024-01-29 18:00:56.93863]: Sex chromosomes properly handled.
#> ✔ [2024-01-29 18:00:56.940699]: Only variants located in standard chromosomes (1:22, X, Y, M/MT) are kept.
#> ✔ [2024-01-29 18:00:56.943268]: Variant start and end position checked.
#> ✔ [2024-01-29 18:00:56.962186]: Variant data for matrix generation preprocessed.
#> ℹ [2024-01-29 18:00:56.96556]: SBS matrix generation - start.
#> ℹ [2024-01-29 18:00:56.968903]: Extracting 5' and 3' adjacent bases.
#> ℹ [2024-01-29 18:00:58.007164]: Extracting +/- 20bp around mutated bases for background C>T estimation.
#> ℹ [2024-01-29 18:00:58.763103]: Estimating APOBEC enrichment scores.
#> ℹ [2024-01-29 18:00:58.765879]: Performing one-way Fisher's test for APOBEC enrichment.
#> ✔ [2024-01-29 18:00:58.773267]: APOBEC related mutations are enriched in 0% of samples (APOBEC enrichment score > 2; 0 of 1 samples)
#> ℹ [2024-01-29 18:00:58.777368]: Creating SBS sample-by-component matrices.
#> ✔ [2024-01-29 18:00:58.784157]: SBS-6 matrix created.
#> ✔ [2024-01-29 18:00:58.790463]: SBS-96 matrix created.
#> ✔ [2024-01-29 18:00:59.833966]: SBS-1536 matrix created.
#> ℹ [2024-01-29 18:00:59.836506]: Return SBS-96 as major matrix.
#> ✔ [2024-01-29 18:00:59.842747]: Done.
#> ℹ [2024-01-29 18:00:59.844982]: 16.211 secs elapsed.
maf_acc_2sample_tally <- sig_tally(object = maf_acc_2sample, mode = c("SBS", "DBS", "ID", "ALL"))
#> ℹ [2024-01-29 18:00:59.848967]: Started.
#> ℹ [2024-01-29 18:00:59.860942]: We would assume you marked all variants' position in + strand.
#> ✔ [2024-01-29 18:00:59.863247]: Reference genome loaded.
#> ✔ [2024-01-29 18:00:59.867371]: Variants from MAF object queried.
#> ✔ [2024-01-29 18:00:59.869408]: Chromosome names checked.
#> ✔ [2024-01-29 18:00:59.871523]: Sex chromosomes properly handled.
#> ✔ [2024-01-29 18:00:59.873352]: Only variants located in standard chromosomes (1:22, X, Y, M/MT) are kept.
#> ✔ [2024-01-29 18:00:59.875137]: Variant start and end position checked.
#> ✔ [2024-01-29 18:00:59.880743]: Variant data for matrix generation preprocessed.
#> ℹ [2024-01-29 18:00:59.882222]: SBS matrix generation - start.
#> ℹ [2024-01-29 18:00:59.884088]: Extracting 5' and 3' adjacent bases.
#> ℹ [2024-01-29 18:01:00.701361]: Extracting +/- 20bp around mutated bases for background C>T estimation.
#> ℹ [2024-01-29 18:01:01.320896]: Estimating APOBEC enrichment scores.
#> ℹ [2024-01-29 18:01:01.327482]: Performing one-way Fisher's test for APOBEC enrichment.
#> ✔ [2024-01-29 18:01:01.340364]: APOBEC related mutations are enriched in 0% of samples (APOBEC enrichment score > 2; 0 of 2 samples)
#> ℹ [2024-01-29 18:01:01.350527]: Creating SBS sample-by-component matrices.
#> ✔ [2024-01-29 18:01:01.371765]: SBS-6 matrix created.
#> ✔ [2024-01-29 18:01:01.392084]: SBS-96 matrix created.
#> ✔ [2024-01-29 18:01:01.463438]: SBS-1536 matrix created.
#> ℹ [2024-01-29 18:01:01.474841]: Return SBS-96 as major matrix.
#> ✔ [2024-01-29 18:01:01.492928]: Done.
#> ℹ [2024-01-29 18:01:01.503824]: 1.655 secs elapsed.

# Print Matrices
maf_acc_1sample_tally$all_matrices$SBS_96
#> A[C>A]A A[C>A]C A[C>A]G A[C>A]T A[C>G]A A[C>G]C A[C>G]G A[C>G]T A[C>T]A A[C>T]C 
#>       0       0       0       0       1       1       0       1       0       1 
#> A[C>T]G A[C>T]T A[T>A]A A[T>A]C A[T>A]G A[T>A]T A[T>C]A A[T>C]C A[T>C]G A[T>C]T 
#>       6       1       0       0       0       0       0       0       0       0 
#> A[T>G]A A[T>G]C A[T>G]G A[T>G]T C[C>A]A C[C>A]C C[C>A]G C[C>A]T C[C>G]A C[C>G]C 
#>       0       0       0       0       0       0       0       0       0       0 
#> C[C>G]G C[C>G]T C[C>T]A C[C>T]C C[C>T]G C[C>T]T C[T>A]A C[T>A]C C[T>A]G C[T>A]T 
#>       0       0       1       0       1       1       0       0       1       0 
#> C[T>C]A C[T>C]C C[T>C]G C[T>C]T C[T>G]A C[T>G]C C[T>G]G C[T>G]T G[C>A]A G[C>A]C 
#>       0       0       0       0       0       1       1       0       0       0 
#> G[C>A]G G[C>A]T G[C>G]A G[C>G]C G[C>G]G G[C>G]T G[C>T]A G[C>T]C G[C>T]G G[C>T]T 
#>       0       0       0       0       0       0       3       4       1       1 
#> G[T>A]A G[T>A]C G[T>A]G G[T>A]T G[T>C]A G[T>C]C G[T>C]G G[T>C]T G[T>G]A G[T>G]C 
#>       0       0       0       0       0       0       1       0       0       0 
#> G[T>G]G G[T>G]T T[C>A]A T[C>A]C T[C>A]G T[C>A]T T[C>G]A T[C>G]C T[C>G]G T[C>G]T 
#>       0       0       1       0       0       0       0       1       0       1 
#> T[C>T]A T[C>T]C T[C>T]G T[C>T]T T[T>A]A T[T>A]C T[T>A]G T[T>A]T T[T>C]A T[T>C]C 
#>       0       1       4       0       0       0       0       0       0       0 
#> T[T>C]G T[T>C]T T[T>G]A T[T>G]C T[T>G]G T[T>G]T 
#>       2       0       1       1       0       0
maf_acc_2sample_tally$all_matrices$SBS_96
#>                              A[C>A]A A[C>A]C A[C>A]G A[C>A]T A[C>G]A A[C>G]C
#> TCGA-OR-A5J1-01A-11D-A29I-10       0       0       0       0       1       1
#> TCGA-OR-A5J2-01A-11D-A29I-10       1       1       0       0       0       0
#>                              A[C>G]G A[C>G]T A[C>T]A A[C>T]C A[C>T]G A[C>T]T
#> TCGA-OR-A5J1-01A-11D-A29I-10       0       1       0       1       6       1
#> TCGA-OR-A5J2-01A-11D-A29I-10       0       0       1       1       5       0
#>                              A[T>A]A A[T>A]C A[T>A]G A[T>A]T A[T>C]A A[T>C]C
#> TCGA-OR-A5J1-01A-11D-A29I-10       0       0       0       0       0       0
#> TCGA-OR-A5J2-01A-11D-A29I-10       1       0       0       0       1       0
#>                              A[T>C]G A[T>C]T A[T>G]A A[T>G]C A[T>G]G A[T>G]T
#> TCGA-OR-A5J1-01A-11D-A29I-10       0       0       0       0       0       0
#> TCGA-OR-A5J2-01A-11D-A29I-10       1       0       0       0       0       0
#>                              C[C>A]A C[C>A]C C[C>A]G C[C>A]T C[C>G]A C[C>G]C
#> TCGA-OR-A5J1-01A-11D-A29I-10       0       0       0       0       0       0
#> TCGA-OR-A5J2-01A-11D-A29I-10       0       2       0       1       0       0
#>                              C[C>G]G C[C>G]T C[C>T]A C[C>T]C C[C>T]G C[C>T]T
#> TCGA-OR-A5J1-01A-11D-A29I-10       0       0       1       0       1       1
#> TCGA-OR-A5J2-01A-11D-A29I-10       0       0       2       0       2       0
#>                              C[T>A]A C[T>A]C C[T>A]G C[T>A]T C[T>C]A C[T>C]C
#> TCGA-OR-A5J1-01A-11D-A29I-10       0       0       1       0       0       0
#> TCGA-OR-A5J2-01A-11D-A29I-10       0       1       1       0       0       0
#>                              C[T>C]G C[T>C]T C[T>G]A C[T>G]C C[T>G]G C[T>G]T
#> TCGA-OR-A5J1-01A-11D-A29I-10       0       0       0       1       1       0
#> TCGA-OR-A5J2-01A-11D-A29I-10       0       0       0       0       0       0
#>                              G[C>A]A G[C>A]C G[C>A]G G[C>A]T G[C>G]A G[C>G]C
#> TCGA-OR-A5J1-01A-11D-A29I-10       0       0       0       0       0       0
#> TCGA-OR-A5J2-01A-11D-A29I-10       0       0       0       0       0       1
#>                              G[C>G]G G[C>G]T G[C>T]A G[C>T]C G[C>T]G G[C>T]T
#> TCGA-OR-A5J1-01A-11D-A29I-10       0       0       3       4       1       1
#> TCGA-OR-A5J2-01A-11D-A29I-10       0       0       1       1       2       0
#>                              G[T>A]A G[T>A]C G[T>A]G G[T>A]T G[T>C]A G[T>C]C
#> TCGA-OR-A5J1-01A-11D-A29I-10       0       0       0       0       0       0
#> TCGA-OR-A5J2-01A-11D-A29I-10       0       1       0       0       0       0
#>                              G[T>C]G G[T>C]T G[T>G]A G[T>G]C G[T>G]G G[T>G]T
#> TCGA-OR-A5J1-01A-11D-A29I-10       1       0       0       0       0       0
#> TCGA-OR-A5J2-01A-11D-A29I-10       0       0       0       0       0       0
#>                              T[C>A]A T[C>A]C T[C>A]G T[C>A]T T[C>G]A T[C>G]C
#> TCGA-OR-A5J1-01A-11D-A29I-10       1       0       0       0       0       1
#> TCGA-OR-A5J2-01A-11D-A29I-10       3       1       0       1       0       0
#>                              T[C>G]G T[C>G]T T[C>T]A T[C>T]C T[C>T]G T[C>T]T
#> TCGA-OR-A5J1-01A-11D-A29I-10       0       1       0       1       4       0
#> TCGA-OR-A5J2-01A-11D-A29I-10       0       1       2       2       2       0
#>                              T[T>A]A T[T>A]C T[T>A]G T[T>A]T T[T>C]A T[T>C]C
#> TCGA-OR-A5J1-01A-11D-A29I-10       0       0       0       0       0       0
#> TCGA-OR-A5J2-01A-11D-A29I-10       0       0       0       0       0       0
#>                              T[T>C]G T[T>C]T T[T>G]A T[T>G]C T[T>G]G T[T>G]T
#> TCGA-OR-A5J1-01A-11D-A29I-10       2       0       1       1       0       0
#> TCGA-OR-A5J2-01A-11D-A29I-10       0       0       1       0       0       1

# Note that the two returned objects differ in class
maf_acc_1sample_tally$all_matrices$SBS_96 |> class()
#> [1] "integer"
maf_acc_2sample_tally$all_matrices$SBS_96 |> class()
#> [1] "matrix" "array"

# Only multi-sample analyses have rownames based on sample name
maf_acc_1sample_tally$all_matrices$SBS_96 |> rownames()
#> NULL
maf_acc_2sample_tally$all_matrices$SBS_96 |> rownames()
#> [1] "TCGA-OR-A5J1-01A-11D-A29I-10" "TCGA-OR-A5J2-01A-11D-A29I-10"

# SessionInfo
sessionInfo()
#> R version 4.3.2 (2023-10-31)
#> Platform: x86_64-apple-darwin20 (64-bit)
#> Running under: macOS Ventura 13.5.2
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: Australia/Sydney
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] BSgenome.Hsapiens.UCSC.hg19_1.4.3 BSgenome_1.64.0                  
#>  [3] rtracklayer_1.56.0                Biostrings_2.64.1                
#>  [5] XVector_0.36.0                    GenomicRanges_1.48.0             
#>  [7] GenomeInfoDb_1.32.4               IRanges_2.30.1                   
#>  [9] S4Vectors_0.34.0                  maftools_2.12.0                  
#> [11] sigminer_2.2.0                    Biobase_2.56.0                   
#> [13] BiocGenerics_0.42.0              
#> 
#> loaded via a namespace (and not attached):
#>  [1] tidyselect_1.2.0            gridBase_0.4-7             
#>  [3] dplyr_1.1.3                 R.utils_2.11.0             
#>  [5] bitops_1.0-7                fastmap_1.1.1              
#>  [7] RCurl_1.98-1.10             GenomicAlignments_1.32.1   
#>  [9] XML_3.99-0.13               reprex_2.0.1               
#> [11] digest_0.6.33               lifecycle_1.0.3            
#> [13] cluster_2.1.4               survival_3.5-0             
#> [15] magrittr_2.0.3              compiler_4.3.2             
#> [17] rlang_1.1.3                 rngtools_1.5.2             
#> [19] tools_4.3.2                 utf8_1.2.3                 
#> [21] yaml_2.3.7                  data.table_1.14.8          
#> [23] NMF_0.25                    knitr_1.44                 
#> [25] curl_5.1.0                  DelayedArray_0.22.0        
#> [27] plyr_1.8.7                  RColorBrewer_1.1-3         
#> [29] BiocParallel_1.30.4         registry_0.5-1             
#> [31] R.cache_0.15.0              withr_2.5.1                
#> [33] purrr_1.0.2                 R.oo_1.25.0                
#> [35] grid_4.3.2                  fansi_1.0.5                
#> [37] colorspace_2.1-0            future_1.30.0              
#> [39] ggplot2_3.4.4               globals_0.16.2             
#> [41] scales_1.2.1                iterators_1.0.14           
#> [43] SummarizedExperiment_1.26.1 cli_3.6.2                  
#> [45] crayon_1.5.2                rmarkdown_2.25             
#> [47] generics_0.1.3              rstudioapi_0.13            
#> [49] rjson_0.2.21                reshape2_1.4.4             
#> [51] DBI_1.1.3                   DNAcopy_1.70.0             
#> [53] stringr_1.5.0               zlibbioc_1.42.0            
#> [55] splines_4.3.2               parallel_4.3.2             
#> [57] restfulr_0.0.14             BiocManager_1.30.21        
#> [59] matrixStats_0.63.0          vctrs_0.6.3                
#> [61] Matrix_1.5-3                listenv_0.9.0              
#> [63] foreach_1.5.2               glue_1.7.0                 
#> [65] parallelly_1.32.1           codetools_0.2-19           
#> [67] stringi_1.7.12              gtable_0.3.4               
#> [69] BiocIO_1.6.0                munsell_0.5.0              
#> [71] tibble_3.2.1                styler_1.7.0               
#> [73] furrr_0.3.1                 pillar_1.9.0               
#> [75] htmltools_0.5.4             GenomeInfoDbData_1.2.8     
#> [77] R6_2.5.1                    doParallel_1.0.17          
#> [79] evaluate_0.22               lattice_0.21-9             
#> [81] Rsamtools_2.12.0            R.methodsS3_1.8.2          
#> [83] Rcpp_1.0.11                 xfun_0.40                  
#> [85] MatrixGenerics_1.8.1        fs_1.6.3                   
#> [87] pkgconfig_2.0.3

Created on 2024-01-29 by the reprex package (v2.0.1)

ShixiangWang commented 7 months ago

Hi @selkamand

Thank you for this important feedback. Could you install the latest version of sigminer, as the issue should be addressed in the v2.2.1 (based on previous report at https://github.com/ShixiangWang/sigminer/discussions/432).

This update is recorded in https://github.com/ShixiangWang/sigminer/blob/master/NEWS.md#sigminer-221.

Best, Shixiang

selkamand commented 7 months ago

Thanks @ShixiangWang. Working great. Apologies for the duplicate issue!

ShixiangWang commented 7 months ago

You're welcome.