PoisonAlien / maftools

Summarize, Analyze and Visualize MAF files from TCGA or in-house studies.
http://bioconductor.org/packages/release/bioc/html/maftools.html
MIT License
445 stars 219 forks source link

Is it possible to report p-value for mutational signature #482

Closed beginner984 closed 4 years ago

beginner984 commented 4 years ago

Hi

I have obtained mutational signature from my samples like below

Rplot09

Rplot10

You are please seeing, signature 1 fits to COSMIC_2 with 0.89 similarity but you also see only one sample is underlying getting COSMIC_2 signature

I think it's Ok to report Cos similarity, but less important than the odds ratio of enrichment or the p-value

My question is, is it possible to report p-value or odd ratio for this similarity?

Thanks a lot for any information

PoisonAlien commented 4 years ago

Hello, As far as I know, it is not possible to calculate OR or P-value for cosine similarity. Feel free to correct me if I am wrong or point me to a reference, I am more than happy to implement it.

beginner984 commented 4 years ago

Thank you so much

Actually I showed my output to the writer of this paper

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5957269/

And she pointed this to me as a co-author for the data we submitted to ISDE congress

However in this website they are likely using p-value to report the best solution

https://signal.mutationalsignatures.com/analyse/jobs/Example/sample/Sample

They are saying

The algorithm that was used to determine the signatures contributing to your sample generates multiple solutions via bootstrapping. The result given in the table and chart above (if present) corresponds to the median contribution count for each signature's distribution of potential counts across each bootstrap, or 0 for signatures whose count falls below the sparsity filter threshold (dashed green line) for a significant fraction (p-value) of the solutions.

PoisonAlien commented 4 years ago

I am not really aware of the tool you are referring and I cannot comment on its output. Regardless, I would also suggest you to try tools like deconstructSigs if your goal is to get signature proportion per sample.

ShixiangWang commented 4 years ago

The functionality provided by maftools focus on signatures identified from a group of samples. If you want to check if a signature exists in a sample, as @PoisonAlien suggested, you should find other tools, like the new paper used the bootstrapping.

beginner984 commented 4 years ago

Thank you both but maftools also provides signature proportion per sample as in this picture Rplot10

ShixiangWang commented 4 years ago

This is signature per cluster.

@beginner984 If you want to get signature exposure instead of reporting a p-value, you can try sigminer I am developing. It is similar to usage in maftools because I used maftools as a backend for SBS signature. get_sig_exposure() will get the exposures.

Try the following example.

devtools::install_github("ShixiangWang/sigminer@V3")

library(sigminer)
library(NMF)

laml.maf <- system.file("extdata", "tcga_laml.maf.gz", package = "maftools")
laml <- read_maf(maf = laml.maf)
if (require("BSgenome.Hsapiens.UCSC.hg19")) {
  mt_prepare <- sig_tally(
    laml,
    ref_genome = "BSgenome.Hsapiens.UCSC.hg19",
    prefix = "chr", add = TRUE, useSyn = TRUE
  )
} else {
  message("Please install package 'BSgenome.Hsapiens.UCSC.hg19' firstly!")
}

sig = sig_extract(mt_prepare$nmf_matrix, n_sig = 3, pConstant = 1e-13)
get_sig_exposure(sig)
> get_sig_exposure(sig)
           sample         Sig1         Sig2         Sig3
  1: TCGA-AB-2802 5.951574e+00 2.106827e-12 4.048419e+00
  2: TCGA-AB-2803 6.874475e+00 7.125541e+00 9.300760e-14
  3: TCGA-AB-2804 8.830048e-13 1.593764e+00 3.406233e+00
  4: TCGA-AB-2805 2.457190e-13 6.964313e+00 7.035688e+00
  5: TCGA-AB-2806 1.764339e-12 3.483714e-12 1.299997e+01
 ---                                                    
184: TCGA-AB-3007 1.014093e-12 4.575247e+00 2.424758e+00
185: TCGA-AB-3008 1.294223e-11 4.000009e+00 3.528735e-12
186: TCGA-AB-3009 8.818428e+00 7.079667e+00 1.910188e+01
187: TCGA-AB-3011 5.000001e+00 4.441747e-11 1.666433e-11
188: TCGA-AB-3012 1.385718e-13 2.810821e+00 3.189179e+00

More run ?get_sig_exposure.

Hope this helps.

beginner984 commented 4 years ago

Thank you

I think it's great

Can we make a stack plot of that to show exposure proportion?

ShixiangWang commented 4 years ago

@beginner984 Use show_sig_exposure(sig)

If you have any problem, you can open an issue at https://github.com/ShixiangWang/sigminer/issues, hope this helps you close the issue here.

beginner984 commented 4 years ago

Thank you @ShixiangWang

Where I can find a tutorial on your package?

I found this

https://shixiangwang.github.io/sigminer-doc/references.html

But that is empty

I mean something to use all of the capability of your package like the same maftools offers

I can find individual R codes in your git but I can not follow that well

Thank you

PoisonAlien commented 4 years ago

Hello @beginner984 , Here is how those barplots are drawn just to clarify certain things.

  1. Estimate de-novo signatures with extractSignatures. In your case there are to 3 signatures.
  2. Above step also gives you proportion of each of these 3 signatures in every sample. You can visualise these with maftools::plotSignatures(contributions = TRUE)
  3. With this matrix of proportions, next step is to divide your cohort into k parts using k-means (here k = 3).
  4. The barplot shows average contribution of all 3 signatures among 3 clusters. It seems that samples in cluster 1 have high load of signature-2, and so on..

This is a crude method of assigning samples to signatures. Other methods such as deconstructSigs (and probably @ShixiangWang 's sigminer) does it in much better way.

Hope this is clear.

ShixiangWang commented 4 years ago

@beginner984 See https://shixiangwang.github.io/sigminer/reference/index.html for now, this tool is still under development, and I am writing the doc. sigminer focuses on the signature analysis, currently, I directly use maftools code to generate SNV matrix for NMF. Thank you for your previous work, @PoisonAlien :), I hope this package is worth to be recommended in maftools in the future 😄

beginner984 commented 4 years ago

@PoisonAlien and @ShixiangWang, hello

I know in maftools implemented files and the other way like

sp_url <- paste("https://cancer.sanger.ac.uk/cancergenome/assets/",
+                 "signatures_probabilities.txt", sep = "")

implemented in MutationalPattern R package

I can download the probabilities of 30 mutational signatures from the COSMIC website but version 2 (30 signatures)

Anyone knows where I can obtain these probabilities for version 3 (60 signatures)

Thanks in advance

ShixiangWang commented 4 years ago

Use the code to get them from Maftools

      sigs_db <- readRDS(file = system.file("extdata", "SBS_signatures.RDs",
        package = "maftools", mustWork = TRUE
      ))
beginner984 commented 4 years ago

Thank you @ShixiangWang

How I convert sigs_db to a matrix like this in COSMIC

> head(cancer_signatures)
        Signature.1  Signature.2 Signature.3 Signature.4 Signature.5 Signature.6
A[C>A]A 0.011098326 6.827082e-04  0.02217231      0.0365 0.014941548      0.0017
A[C>A]C 0.009149341 6.191072e-04  0.01787168      0.0309 0.008960918      0.0028
A[C>A]G 0.001490070 9.927896e-05  0.00213834      0.0183 0.002207846      0.0005
A[C>A]T 0.006233885 3.238914e-04  0.01626515      0.0243 0.009206905      0.0019
C[C>A]A 0.006595870 6.774450e-04  0.01878173      0.0461 0.009674904      0.0101
C[C>A]C 0.007342368 2.136810e-04  0.01576046      0.0614 0.004952301      0.0241
PoisonAlien commented 4 years ago

Hello @beginner984 These trivial things must be resolved by your own. Please restrict your questions to actual issues associated with the package.

ShixiangWang commented 4 years ago

@PoisonAlien I totally agree with you. @beginner984 If you checked the result, you should know that you can obtain what you want very easy, just call sigs_db$db.