Closed songeric1107 closed 1 year ago
A signature number around 25 is a good choice, as the result converges there. You can also try some of the other approaches in sigminer to explore the proper signature number.
Thank you for your quick response. Why is 25 a good point? Is it based on the silhouette or consensus score, or on some other index? How should I use those indices to determine the best representative number? Why not 11? I see a drop after 11 in the cophenetic score. Will more signatures overfit the results? Thank you.
Yeah, 11 is OK. Based on the NMF survey plot, you can determine a proper signature number from your observations of a specific measure such as cophenetic or silhouette. However, you cannot determine the signature number automatically.
Try using sig_auto_extract with Bayesian NMF, or use bootstrapped NMF to get a more robust estimation of the matrix decomposition (https://shixiangwang.github.io/sigminer/articles/sigminer.html).
In general, we want the signatures we obtain to be different from each other while keeping the reconstruction error of the matrix decomposition low.
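To make the two criteria above concrete, here is a minimal base R sketch, not sigminer's own implementation. It assumes a decomposition V ≈ W H with mutation contexts in rows of V and W; the matrices and the helper name rel_error are made up for illustration.

```r
# Toy decomposition: all numbers are made up for illustration.
set.seed(1)
W <- matrix(c(0.7, 0.2, 0.1, 0.0,
              0.0, 0.1, 0.3, 0.6), nrow = 4)      # 4 contexts x 2 signatures
H <- matrix(c(10, 2, 5,
               1, 8, 5), nrow = 2, byrow = TRUE)  # 2 signatures x 3 samples
V <- W %*% H                                       # catalogue that W and H explain exactly
V_noisy <- V + matrix(runif(length(V), 0, 0.5), nrow = nrow(V))

# Relative reconstruction error (Frobenius norm); hypothetical helper.
rel_error <- function(V, W, H) {
  norm(V - W %*% H, type = "F") / norm(V, type = "F")
}
err_exact <- rel_error(V, W, H)
err_noisy <- rel_error(V_noisy, W, H)

# Pairwise cosine similarity between signatures (columns of W).
W_unit <- sweep(W, 2, sqrt(colSums(W^2)), "/")
sig_similarity <- crossprod(W_unit)
```

A low off-diagonal value in sig_similarity means the two signatures are distinct, and a low rel_error means the decomposition still explains the catalogue well.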
I did try the Bayesian NMF method. The suggested signature number is 4 (proj2$suggested), but I am not sure how that number is selected. If you check the consensus plot, 4 signatures do not make sense to me. signature.syn.consense.pdf
Sorry, I also have another question. If I have datasets from two groups, should I combine them for the mutational signature analysis? I could then check the difference in contributions between groups for any signature that is identified. Or should I do the signature selection separately? Thanks.
Have you tried a larger initial signature number in sig_auto_extract()? The default value is 25, which may not fit your data. You can try sample number - 1.
"Sorry, I also have another question. If I have datasets from two groups, should I combine them for the mutational signature analysis? I could then check the difference in contributions between groups for any signature that is identified. Or should I do the signature selection separately? Thanks."
For comparison purposes, combining the data is recommended.
Thank you. I tried a large initial signature number (sample number - 1), but only 1 signature is returned.
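A minimal base R sketch of combining two groups before extraction, assuming samples are in rows and mutation contexts in columns (as in the signature.syn output shown in this thread). The matrices group_a and group_b and their counts are hypothetical.

```r
# Hypothetical catalogues: rows are samples, columns are mutation contexts.
contexts <- c("A[C>A]A", "A[C>A]C", "A[C>A]G")
group_a <- matrix(c(3, 0, 1,
                    2, 1, 0), nrow = 2, byrow = TRUE,
                  dimnames = list(c("a1", "a2"), contexts))
group_b <- matrix(c(0, 4, 2,
                    1, 5, 0,
                    0, 3, 3), nrow = 3, byrow = TRUE,
                  dimnames = list(c("b1", "b2", "b3"), contexts))

# The columns must be in the same order before stacking the samples.
stopifnot(identical(colnames(group_a), colnames(group_b)))
combined <- rbind(group_a, group_b)

# Keep the group label of every sample for the later comparison.
groups <- data.frame(
  sample = rownames(combined),
  group  = rep(c("A", "B"), times = c(nrow(group_a), nrow(group_b)))
)
```

The combined matrix would then go through a single extraction run, and groups is used afterwards to compare exposures between the two groups.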
mt_sig2 <- sig_auto_extract(signature.syn, K0 = 56, nrun = 30, strategy = "stable", cores = 2)
Progress: ──────────────────────────────────── 100%
Select Run 5, which K = 1 as the best solution.
That's truly strange. Could you show a subset of your data, like signature.syn[1:5, 1:5]?
Also, could you try:
e1 <- bp_extract_signatures(signature.syn, range = 5:30)
bp_show_survey2(e1)
signature.syn[10:25, 1:5]
    A[C>A]A A[C>A]C A[C>A]G A[C>A]T C[C>A]A
s1        0       0       0       0       0
s2        0       0       0       0       0
s3        0       0       0       0       0
s4        0       0       0       0       0
s5        0       0       0       0       1
s6        0       0       0       0       0
s7        0       0       0       0       0
s8        0       0       0       0       0
s9        0       0       0       0       0
s10       0       0       0       0       0
s11       0       0       0       0       0
s12       0       0       0       0       0
s13       0       0       0       0       0
s14       0       0       0       0       0
s15       0       0       0       0       0
s16       0       0       0       0       0
Sorry for another question. If I get two drop points, which one should I pick? sig.all (dragged).pdf
Meanwhile, if I would like to compare the signature difference between two groups, is Fisher's test appropriate for comparing the exposure counts between groups? Thanks.
"Sorry for another question. If I get two drop points, which one should I pick? sig.all (dragged).pdf"
Based on the plot, you can try 7, and then analyze whether the 7 mutational signatures obtained map well to COSMIC reference signatures.
"Meanwhile, if I would like to compare the signature difference between two groups, is Fisher's test appropriate for comparing the exposure counts between groups? Thanks."
If you categorize the signature exposure into a binary variable, Fisher's test is fine.
If you compare the exposures directly, just use wilcox.test.
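Both options can be sketched in base R as follows. The exposure values and the cutoff of 10 used to call a signature "present" are made up for illustration, not a recommendation.

```r
# Made-up exposures of one signature in two groups (not real data).
exposure <- c(12, 30, 25, 41, 18, 3, 0, 7, 5, 9)
group <- factor(rep(c("A", "B"), each = 5))

# Direct comparison of the exposure values between the groups.
wilcox_res <- wilcox.test(exposure ~ group)

# Binary comparison: signature called "present" above a hypothetical cutoff.
present <- factor(exposure > 10, levels = c(FALSE, TRUE),
                  labels = c("absent", "present"))
fisher_res <- fisher.test(table(group, present))

wilcox_res$p.value
fisher_res$p.value
```

When several signatures are tested this way, the p-values would normally be adjusted for multiple testing, for example with p.adjust.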
"Based on the plot, you can try 7. And analyze if the obtained 7 mutational signatures could be well mapped to COSMIC reference signatures."
-- May I ask why you chose 7, not 2?
-- What exactly do you mean by "well mapped"? A similarity score compared to the SBS reference?
For point 1, 7 keeps a high silhouette (stability) together with a low reconstruction error.
For point 2, yes. In general, a cosine similarity > 0.8 can be considered well mapped.
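As a rough illustration of the cosine similarity check, here is a base R sketch. The 6-channel profiles and the reference names RefA and RefB are made up; real SBS profiles have 96 channels and would come from the COSMIC reference set.

```r
cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# Made-up 6-channel profiles; real SBS profiles have 96 channels.
extracted <- c(0.30, 0.25, 0.20, 0.10, 0.10, 0.05)
reference <- cbind(
  RefA = c(0.32, 0.22, 0.21, 0.09, 0.11, 0.05),
  RefB = c(0.02, 0.03, 0.05, 0.20, 0.30, 0.40)
)

# Similarity of the extracted signature to every reference signature.
sims <- apply(reference, 2, cosine, b = extracted)
best <- names(which.max(sims))

# Rule of thumb from this thread: > 0.8 counts as well mapped.
well_mapped <- max(sims) > 0.8
```

Here the extracted profile maps to RefA and not to RefB. In sigminer, get_sig_similarity() performs this kind of comparison against the COSMIC references.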
I used this function to extract the mutation signatures based on MAF files.
e2 <- bp_extract_signatures(signature.syn, n_nmf_run = 50)
syn_estimate.55 copy.pdf
I feel like there is no good consensus signature number for this dataset. What's your opinion?