ShixiangWang / sigminer

🌲 An easy-to-use and scalable toolkit for genomic alteration signature (a.k.a. mutational signature) analysis and visualization in R https://shixiangwang.github.io/sigminer/reference/index.html
https://shixiangwang.github.io/sigminer/

About relative contribution and mutation counts in samples #462

Closed ShepherdZh closed 1 month ago

ShepherdZh commented 2 months ago

Hi Shixiang, thanks for developing this great tool! I have two questions.

1) I used maftools to merge several MAF files, loaded the result into sigminer, and followed the workflow below:

--------------------------

laml <- readRDS("merged_for_sigminer.rds")
mt_tally <- sig_tally(
  laml,
  ref_genome = "BSgenome.Hsapiens.UCSC.hg38",
  useSyn = TRUE,
  mode = "ALL"
)
sig_result_sbs96 <- sig_auto_extract(mt_tally$SBS_96)
show_sig_profile(sig_result_sbs96, mode = "SBS", style = "cosmic")

-------------------

This workflow ran fine. I'd like to know whether the 'contribution' in the output of show_sig_profile() is comparable between samples. (Does the NMF vector shown as contribution mean relative contribution?) I have a few samples and need to compare the contribution between them, but the result seems to lack any sample information. (I also tried another package, SigProfilerAssignment. Its result uses 'Percentage of Single Base Substitutions' as the y axis and likewise shows neither sample information nor the word 'relative'.) If the contribution is comparable between samples, would it be reasonable to run the above workflow on each sample and collect those contributions?

2) I am interested in how platinum treatment and APOBEC shift between samples. These seem to be aetiologies in COSMIC. How can I connect platinum treatment with, say, SBS96? Is it through similarity? And how can I count the corresponding SBS96 mutations in each sample?

Best regards! Yiming

ShixiangWang commented 2 months ago

After you extract the signatures, you can access "Exposure" or "Exposure.norm" to get the absolute or relative contributions, respectively. Both are comparable between samples and signatures. You can run the above workflow on each sample and collect those contributions if you want.

> load(system.file("extdata", "toy_mutational_signature.RData",
+                  package = "sigminer", mustWork = TRUE
+ ))
> names(sig2)
[1] "Signature"      "Signature.norm" "Exposure"       "Exposure.norm" 
[5] "K"              "Raw"           
> sig2$Exposure[, 1:5]
     TCGA-AB-2802 TCGA-AB-2803 TCGA-AB-2804 TCGA-AB-2805 TCGA-AB-2806
Sig1     0.000000    0.0000000       0.0000    0.2000784    0.2923331
Sig2     7.602352   12.0079356       4.8389   12.0079356   10.2737866
Sig3     1.256607    0.7412961       0.0000    0.0000000    0.0000000
> sig2$Exposure.norm[, 1:5]
     TCGA-AB-2802 TCGA-AB-2803 TCGA-AB-2804 TCGA-AB-2805 TCGA-AB-2806
Sig1    0.0000000   0.00000000            0    0.0163891   0.02766703
Sig2    0.8581541   0.94185563            1    0.9836109   0.97233297
Sig3    0.1418459   0.05814437            0    0.0000000   0.00000000
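
The relationship between the two matrices above can be checked by hand: each column of Exposure.norm appears to be the matching column of Exposure divided by its column sum. A minimal base R sketch, using the first three samples printed above (the object names `exposure` and `exposure_norm` are only illustrative, not part of sigminer):

```r
# Absolute exposures copied from the printed sig2$Exposure output above
# (first three samples only); rows are signatures, columns are samples.
exposure <- matrix(
  c(0.000000, 7.602352, 1.256607,
    0.0000000, 12.0079356, 0.7412961,
    0.0000, 4.8389, 0.0000),
  nrow = 3,
  dimnames = list(
    c("Sig1", "Sig2", "Sig3"),
    c("TCGA-AB-2802", "TCGA-AB-2803", "TCGA-AB-2804")
  )
)

# Divide every column by its total so that each sample sums to 1.
exposure_norm <- sweep(exposure, 2, colSums(exposure), "/")

round(exposure_norm, 4)
```

Because every column sums to 1, the relative values compare signature proportions between samples, while the absolute values also reflect each sample's mutation burden.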

For the second question, you can refer to the sig_fit function:

data("simulated_catalogs")
data = simulated_catalogs$set1
data[1:5, 1:5]

# Fitting with all COSMIC v2 reference signatures, you can specify the reference signature list as you want 
sig_fit(data, sig_index = "ALL")
# Check ?sig_fit for sig_db options
# e.g., use the COSMIC SBS v3
sig_fit(data, sig_index = "ALL", sig_db = "SBS")

The following call is just for illustration. Usually you would pass multiple signature indices to form a reference signature list; using just one is not recommended.

> sig_fit(data, sig_index = "SBS96", sig_db = "latest_SBS_GRCh37")
ℹ [2024-07-12 16:17:56.103906]: Started.
✔ [2024-07-12 16:17:56.128988]: Signature index detected.
ℹ [2024-07-12 16:17:56.144772]: Checking signature database in package.
ℹ [2024-07-12 16:17:56.164345]: Checking signature index.
ℹ [2024-07-12 16:17:56.180164]: Valid index for db 'latest_SBS_GRCh37':
SBS1 SBS2 SBS3 SBS4 SBS5 SBS6 SBS7a SBS7b SBS7c SBS7d SBS8 SBS9 SBS10a SBS10b SBS10c SBS10d SBS11 SBS12 SBS13 SBS14 SBS15 SBS16 SBS17a SBS17b SBS18 SBS19 SBS20 SBS21 SBS22a SBS22b SBS23 SBS24 SBS25 SBS26 SBS27 SBS28 SBS29 SBS30 SBS31 SBS32 SBS33 SBS34 SBS35 SBS36 SBS37 SBS38 SBS39 SBS40a SBS40b SBS40c SBS41 SBS42 SBS43 SBS44 SBS45 SBS46 SBS47 SBS48 SBS49 SBS50 SBS51 SBS52 SBS53 SBS54 SBS55 SBS56 SBS57 SBS58 SBS59 SBS60 SBS84 SBS85 SBS86 SBS87 SBS88 SBS89 SBS90 SBS91 SBS92 SBS93 SBS94 SBS95 SBS96 SBS97 SBS98 SBS99
✔ [2024-07-12 16:17:56.197087]: Database and index checked.
✔ [2024-07-12 16:17:56.212668]: Signature normalized.
ℹ [2024-07-12 16:17:56.231608]: Checking row number for catalog matrix and signature matrix.
✔ [2024-07-12 16:17:56.249573]: Checked.
ℹ [2024-07-12 16:17:56.264447]: Checking rownames for catalog matrix and signature matrix.
ℹ [2024-07-12 16:17:56.279135]: Matrix V and W don't have same orders. Try reordering...
✔ [2024-07-12 16:17:56.296638]: Checked.
✔ [2024-07-12 16:17:56.312938]: Method 'QP' detected.
✔ [2024-07-12 16:17:56.341225]: Corresponding function generated.
ℹ [2024-07-12 16:17:56.368862]: Calling function.
ℹ [2024-07-12 16:17:56.390417]: Fitting sample: Sample_1
ℹ [2024-07-12 16:17:56.40823]: Fitting sample: Sample_2
ℹ [2024-07-12 16:17:56.426255]: Fitting sample: Sample_3
ℹ [2024-07-12 16:17:56.441939]: Fitting sample: Sample_4
ℹ [2024-07-12 16:17:56.461879]: Fitting sample: Sample_5
ℹ [2024-07-12 16:17:56.482981]: Fitting sample: Sample_6
ℹ [2024-07-12 16:17:56.499237]: Fitting sample: Sample_7
ℹ [2024-07-12 16:17:56.513535]: Fitting sample: Sample_8
ℹ [2024-07-12 16:17:56.528075]: Fitting sample: Sample_9
ℹ [2024-07-12 16:17:56.544612]: Fitting sample: Sample_10
ℹ [2024-07-12 16:17:56.560418]: Fitting sample: Sample_11
ℹ [2024-07-12 16:17:56.575142]: Fitting sample: Sample_12
ℹ [2024-07-12 16:17:56.590674]: Fitting sample: Sample_13
ℹ [2024-07-12 16:17:56.606524]: Fitting sample: Sample_14
ℹ [2024-07-12 16:17:56.622544]: Fitting sample: Sample_15
ℹ [2024-07-12 16:17:56.638827]: Fitting sample: Sample_16
ℹ [2024-07-12 16:17:56.654355]: Fitting sample: Sample_17
ℹ [2024-07-12 16:17:56.668734]: Fitting sample: Sample_18
ℹ [2024-07-12 16:17:56.682304]: Fitting sample: Sample_19
ℹ [2024-07-12 16:17:56.696051]: Fitting sample: Sample_20
ℹ [2024-07-12 16:17:56.709503]: Fitting sample: Sample_21
ℹ [2024-07-12 16:17:56.723163]: Fitting sample: Sample_22
ℹ [2024-07-12 16:17:56.736903]: Fitting sample: Sample_23
ℹ [2024-07-12 16:17:56.750794]: Fitting sample: Sample_24
ℹ [2024-07-12 16:17:56.764507]: Fitting sample: Sample_25
ℹ [2024-07-12 16:17:56.778484]: Fitting sample: Sample_26
ℹ [2024-07-12 16:17:56.800125]: Fitting sample: Sample_27
ℹ [2024-07-12 16:17:56.812998]: Fitting sample: Sample_28
ℹ [2024-07-12 16:17:56.825405]: Fitting sample: Sample_29
ℹ [2024-07-12 16:17:56.839013]: Fitting sample: Sample_30
✔ [2024-07-12 16:17:56.853513]: Done.
ℹ [2024-07-12 16:17:56.867595]: Generating output signature exposures.
✔ [2024-07-12 16:17:56.887018]: Done.
ℹ [2024-07-12 16:17:56.902758]: 0.799 secs elapsed.
      Sample_1 Sample_2 Sample_3 Sample_4 Sample_5 Sample_6 Sample_7
SBS96    42760    18824    10770    10586     2695     1613     1597
      Sample_8 Sample_9 Sample_10 Sample_11 Sample_12 Sample_13 Sample_14
SBS96     2392     8124      9713      5256     49968      2117      4211
      Sample_15 Sample_16 Sample_17 Sample_18 Sample_19 Sample_20
SBS96     10788      7726     27673      5572     32892      2597
      Sample_21 Sample_22 Sample_23 Sample_24 Sample_25 Sample_26
SBS96     13332     38357      4412     45896     10162      5945
      Sample_27 Sample_28 Sample_29 Sample_30
SBS96     10090      2973      3648      2919
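
For the platinum and APOBEC part of the question, COSMIC attributes SBS2 and SBS13 to APOBEC activity and SBS31 and SBS35 to platinum chemotherapy, and all four appear in the valid index list above. One possible approach is to fit several reference signatures together and then sum the exposure rows per aetiology. The sketch below assumes an exposure matrix shaped like the sig_fit output above (signatures in rows, samples in columns). The numbers and the names `expo`, `aetiology`, `counts`, `share`, and `catalog` are made up for illustration and do not come from a real fit.

```r
# Hypothetical exposure matrix (toy numbers). In practice it might come from
# a call along the lines of:
#   sig_fit(catalog,
#           sig_index = c("SBS1", "SBS2", "SBS5", "SBS13", "SBS31", "SBS35"),
#           sig_db = "latest_SBS_GRCh37")
# where `catalog` is a components-by-samples count matrix (an assumption here).
expo <- matrix(
  c(120, 30, 200, 45, 0, 10,
    80, 5, 150, 0, 60, 25),
  nrow = 6,
  dimnames = list(
    c("SBS1", "SBS2", "SBS5", "SBS13", "SBS31", "SBS35"),
    c("Sample_A", "Sample_B")
  )
)

# Signatures grouped by their proposed COSMIC aetiology.
aetiology <- list(
  APOBEC = c("SBS2", "SBS13"),
  Platinum = c("SBS31", "SBS35")
)

# Sum the estimated mutation counts per aetiology group and sample.
counts <- t(sapply(aetiology, function(sigs) {
  colSums(expo[sigs, , drop = FALSE])
}))

# Express each group as a share of the sample's fitted mutations.
share <- sweep(counts, 2, colSums(expo), "/")

counts
share
```

Here `counts` is comparable between samples as an absolute number of attributed mutations, and `share` as a proportion, mirroring the Exposure and Exposure.norm distinction earlier in the thread.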