Closed beginner984 closed 4 years ago
Hello, As far as I know, it is not possible to calculate OR or P-value for cosine similarity. Feel free to correct me if I am wrong or point me to a reference, I am more than happy to implement it.
Thank you so much
Actually I showed my output to the writer of this paper
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5957269/
And she pointed this to me as a co-author for the data we submitted to ISDE congress
However in this website they are likely using p-value to report the best solution
https://signal.mutationalsignatures.com/analyse/jobs/Example/sample/Sample
They are saying
The algorithm that was used to determine the signatures contributing to your sample generates multiple solutions via bootstrapping. The result given in the table and chart above (if present) corresponds to the median contribution count for each signature's distribution of potential counts across each bootstrap, or 0 for signatures whose count falls below the sparsity filter threshold (dashed green line) for a significant fraction (p-value) of the solutions.
I am not really aware of the tool you are referring and I cannot comment on its output. Regardless, I would also suggest you to try tools like deconstructSigs if your goal is to get signature proportion per sample.
The functionality provided by maftools focus on signatures identified from a group of samples. If you want to check if a signature exists in a sample, as @PoisonAlien suggested, you should find other tools, like the new paper used the bootstrapping.
Thank you both but maftools also provides signature proportion per sample as in this picture
This is signature per cluster.
@beginner984 If you want to get signature exposure instead of reporting a p-value, you can try sigminer I am developing. It is similar to usage in maftools because I used maftools as a backend for SBS signature. get_sig_exposure()
will get the exposures.
Try the following example.
devtools::install_github("ShixiangWang/sigminer@V3")
library(sigminer)
library(NMF)
laml.maf <- system.file("extdata", "tcga_laml.maf.gz", package = "maftools")
laml <- read_maf(maf = laml.maf)
if (require("BSgenome.Hsapiens.UCSC.hg19")) {
mt_prepare <- sig_tally(
laml,
ref_genome = "BSgenome.Hsapiens.UCSC.hg19",
prefix = "chr", add = TRUE, useSyn = TRUE
)
} else {
message("Please install package 'BSgenome.Hsapiens.UCSC.hg19' firstly!")
}
sig = sig_extract(mt_prepare$nmf_matrix, n_sig = 3, pConstant = 1e-13)
get_sig_exposure(sig)
> get_sig_exposure(sig)
sample Sig1 Sig2 Sig3
1: TCGA-AB-2802 5.951574e+00 2.106827e-12 4.048419e+00
2: TCGA-AB-2803 6.874475e+00 7.125541e+00 9.300760e-14
3: TCGA-AB-2804 8.830048e-13 1.593764e+00 3.406233e+00
4: TCGA-AB-2805 2.457190e-13 6.964313e+00 7.035688e+00
5: TCGA-AB-2806 1.764339e-12 3.483714e-12 1.299997e+01
---
184: TCGA-AB-3007 1.014093e-12 4.575247e+00 2.424758e+00
185: TCGA-AB-3008 1.294223e-11 4.000009e+00 3.528735e-12
186: TCGA-AB-3009 8.818428e+00 7.079667e+00 1.910188e+01
187: TCGA-AB-3011 5.000001e+00 4.441747e-11 1.666433e-11
188: TCGA-AB-3012 1.385718e-13 2.810821e+00 3.189179e+00
More run ?get_sig_exposure
.
Hope this helps.
Thank you
I think it's great
Can we make a stack plot of that to show exposure proportion?
@beginner984 Use show_sig_exposure(sig)
If you have any problem, you can open an issue at https://github.com/ShixiangWang/sigminer/issues, hope this helps you close the issue here.
Thank you @ShixiangWang
Where I can find a tutorial on your package?
I found this
https://shixiangwang.github.io/sigminer-doc/references.html
But that is empty
I mean something to use all of the capability of your package like the same maftools offers
I can find individual R codes in your git but I can not follow that well
Thank you
Hello @beginner984 , Here is how those barplots are drawn just to clarify certain things.
extractSignatures
. In your case there are to 3 signatures.maftools::plotSignatures(contributions = TRUE)
This is a crude method of assigning samples to signatures. Other methods such as deconstructSigs
(and probably @ShixiangWang 's sigminer) does it in much better way.
Hope this is clear.
@beginner984 See https://shixiangwang.github.io/sigminer/reference/index.html for now, this tool is still under development, and I am writing the doc. sigminer focuses on the signature analysis, currently, I directly use maftools code to generate SNV matrix for NMF. Thank you for your previous work, @PoisonAlien :), I hope this package is worth to be recommended in maftools in the future 😄
@PoisonAlien and @ShixiangWang, hello
I know in maftools implemented files and the other way like
sp_url <- paste("https://cancer.sanger.ac.uk/cancergenome/assets/",
+ "signatures_probabilities.txt", sep = "")
implemented in MutationalPattern
R package
I can download the probabilities of 30 mutational signatures from the COSMIC website but version 2 (30 signatures)
Anyone knows where I can obtain these probabilities for version 3 (60 signatures)
Thanks in advance
Use the code to get them from Maftools
sigs_db <- readRDS(file = system.file("extdata", "SBS_signatures.RDs",
package = "maftools", mustWork = TRUE
))
Thank you @ShixiangWang
How I convert sigs_db to a matrix like this in COSMIC
> head(cancer_signatures)
Signature.1 Signature.2 Signature.3 Signature.4 Signature.5 Signature.6
A[C>A]A 0.011098326 6.827082e-04 0.02217231 0.0365 0.014941548 0.0017
A[C>A]C 0.009149341 6.191072e-04 0.01787168 0.0309 0.008960918 0.0028
A[C>A]G 0.001490070 9.927896e-05 0.00213834 0.0183 0.002207846 0.0005
A[C>A]T 0.006233885 3.238914e-04 0.01626515 0.0243 0.009206905 0.0019
C[C>A]A 0.006595870 6.774450e-04 0.01878173 0.0461 0.009674904 0.0101
C[C>A]C 0.007342368 2.136810e-04 0.01576046 0.0614 0.004952301 0.0241
Hello @beginner984 These trivial things must be resolved by your own. Please restrict your questions to actual issues associated with the package.
@PoisonAlien I totally agree with you. @beginner984 If you checked the result, you should know that you can obtain what you want very easy, just call sigs_db$db
.
Hi
I have obtained mutational signature from my samples like below
You are please seeing, signature 1 fits to COSMIC_2 with 0.89 similarity but you also see only one sample is underlying getting COSMIC_2 signature
I think it's Ok to report Cos similarity, but less important than the odds ratio of enrichment or the p-value
My question is, is it possible to report p-value or odd ratio for this similarity?
Thanks a lot for any information