There is no gold standard for choosing the best number of signatures. maftools uses a common measure to select the best rank. You may try the sigminer package; its current version is compatible with maftools. https://shixiangwang.github.io/sigminer/articles/sigminer.html
@mishugeb To understand how to select the number of signatures, please read section 2.6 of the NMF vignette, or the NMF paper: Renaud Gaujoux, Cathal Seoighe (2010). A flexible R package for nonnegative matrix factorization. BMC Bioinformatics 2010, 11:367. [http://www.biomedcentral.com/1471-2105/11/367]
Thank you for your reply. I have installed sigminer and am going to use it.
Hello, You should consult the elbow plot generated during signature extraction to judge the optimal number of signatures. The way maftools chooses the number of signatures is a bit unsophisticated, and it's possible that the estimated number of signatures is not the optimal solution.
Just like k-means clustering, NMF is an unsupervised method, so the number of signatures needs to be chosen carefully. The elbow plot generated while extracting signatures helps to decide this number. To interpret the plot, look for the point at which the cophenetic metric on the y-axis reaches its maximum and then drops, with no significant change afterwards as the number of signatures increases (x-axis). In the example plot below, it could be around 6-7.
Now in your case I think the plot might have looked something like this.
Here, I would guess the optimal number of signatures is 6, since the cophenetic metric drops sharply at 6 and there is minimal change afterwards. However, maftools chose 3 as the number of signatures since it's the first point at which the value dropped.
I hope this helps you understand the results. My suggestion is to decide the number of signatures based on the elbow plot and rerun the extractSignatures function with the n argument. Let me know if you still have any questions.
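To make the two readings of the elbow plot concrete, here is a minimal base R sketch. The cophenetic values are made up for illustration, and the helper names first_drop_rank and sharpest_drop_rank are hypothetical; this is not the code maftools runs internally, only an approximation of the two heuristics described above.

```r
# Made-up cophenetic correlations for ranks 2..10 (illustrative only, not real data).
ranks <- 2:10
coph  <- c(0.99, 0.97, 0.975, 0.98, 0.85, 0.84, 0.835, 0.83, 0.83)

# Hypothetical helper: first rank whose value is lower than the previous rank's value.
# Returns NA if the metric never decreases.
first_drop_rank <- function(ranks, coph) {
  idx <- which(diff(coph) < 0)[1]
  ranks[idx + 1]
}

# Hypothetical helper: rank at which the largest single decrease occurs.
sharpest_drop_rank <- function(ranks, coph) {
  idx <- which.min(diff(coph))
  ranks[idx + 1]
}

first_drop_rank(ranks, coph)     # first point at which the value dropped
sharpest_drop_rank(ranks, coph)  # point of the sharpest drop
```

With these made-up values the first helper picks 3 and the second picks 6, mirroring the disagreement described above. A rank chosen this way (or by eye from the plot) can then be passed through the n argument, for example extractSignatures(mat = laml.tnm, n = 6), where laml.tnm stands for your own trinucleotide matrix.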
Hi, From the elbow plot below, would you use n=5 to extract signatures?
Cophenetic_Example.pdf Thanks!
It seems 3 is already good. Could you maybe run it up to 10 and check whether the values drop after 6?
You should check the cophenetic correlation value. Run estimateSignatures for up to 10 signatures.
library(maftools)
# Try ranks up to nTry = 10, then plot the cophenetic correlation for each rank
sigest = estimateSignatures(mat = laml.tnm, nTry = 10)
plotCophenetic(res = sigest)
The figure above should help you decide on the number of signatures. The number of signatures depends on the mutation load and the sample size. For a small cohort, you would not expect many signatures.
Hi, Yes - I realised later that I can use nTry to try more signatures. In this case, the values do not drop after 6, so I guess 3-5 signatures would be best. Would you recommend any other tool to validate these results? Many thanks!
Try sigminer (https://shixiangwang.github.io/sigminer-doc/; in particular, you can run sig_fit) or sigprofiler (https://osf.io/t6j7u/wiki/home/), which is a gold standard tool.
Hi, I have installed maftools from Bioconductor. When I use the extractSignatures function on different datasets where there should be more than 10 signatures, with the parameters (mat = input, n = NULL, nTry = 30, plotBestFitRes = TRUE), I always end up with only 3-4 as the best-fit rank. Am I making a mistake with the parameters, or do I need to select the number of signatures manually? If so, what are the best criteria for selecting the optimal number of signatures?
Thanks.