PoisonAlien / maftools

Summarize, Analyze and Visualize MAF files from TCGA or in-house studies.
http://bioconductor.org/packages/release/bioc/html/maftools.html
MIT License

Not getting more than 3-4 signatures no matter how large a data set I use #370

mishugeb closed this issue 5 years ago

mishugeb commented 5 years ago

Hi, I have installed maftools from Bioconductor. I am using the extractSignatures function with the following parameters on different datasets where there should be more than 10 signatures: (mat = input, n = NULL, nTry = 30, plotBestFitRes = TRUE)

I always end up with only 3-4 as the best-fit rank. Am I making a mistake with the parameters? Or do I need to select the number of signatures manually? If so, what are the best criteria for selecting the optimal number of signatures?

Thanks.

ShixiangWang commented 5 years ago

There is no gold standard for choosing the best number. Maftools uses a common measure to select the best rank. You may try the sigminer package; its current version is compatible with maftools. https://shixiangwang.github.io/sigminer/articles/sigminer.html

ShixiangWang commented 5 years ago

@mishugeb To understand how to select the number of signatures, please read section 2.6 of the NMF vignette, or the NMF paper: Renaud Gaujoux, Cathal Seoighe (2010). A flexible R package for nonnegative matrix factorization. BMC Bioinformatics 11:367. http://www.biomedcentral.com/1471-2105/11/367

mishugeb commented 5 years ago

Thank you for your reply. I have installed sigminer and am going to use it.

PoisonAlien commented 5 years ago

Hello, you should consult the elbow plot generated during signature extraction to judge the optimal number of signatures. The way maftools chooses the number of signatures is a bit unsophisticated, and it's possible that the estimated number of signatures is not the optimal solution.

Just like k-means clustering, NMF is an unsupervised method, so the number of signatures needs to be chosen carefully. The elbow plot generated while extracting signatures helps decide this number. To interpret the plot, look for the point at which the cophenetic metric (y-axis) reaches its maximum and then drops, with no significant change afterwards as the number of signatures (x-axis) increases. In the example plot below, it could be around 6-7.

[Figure: elbow plot of the cophenetic metric against the number of signatures (cophenetic_metric_Rplot)]
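To make the y-axis of the elbow plot less opaque, here is a minimal base-R sketch of how a cophenetic correlation can be computed. It assumes, as in the NMF package referenced earlier in this thread, that the metric is derived from a consensus matrix built over repeated NMF runs at a single rank. The cluster labels are made up, and `consensus_matrix()` and `cophenetic_cor()` are hypothetical helpers, not maftools functions. A value of 1 means every run grouped the samples identically; lower values mean the runs disagree.

```r
# Hypothetical sample-to-signature assignments from 4 NMF runs at one rank
# (made-up labels; e.g. each sample assigned to its largest-exposure signature)
runs <- list(
  c(1, 1, 2, 2, 3, 3),
  c(1, 1, 2, 2, 3, 3),
  c(2, 2, 1, 1, 3, 3),  # same partition as above, labels permuted
  c(1, 1, 2, 3, 3, 3)   # sample 4 switches group
)

# Hypothetical helper: fraction of runs in which samples i and j share a cluster
consensus_matrix <- function(runs) {
  n <- length(runs[[1]])
  C <- matrix(0, n, n)
  for (labels in runs) {
    C <- C + outer(labels, labels, "==")  # TRUE/FALSE added as 1/0
  }
  C / length(runs)
}

# Hypothetical helper: correlation between the distances 1 - consensus and the
# cophenetic distances of an average-linkage tree built on those distances
cophenetic_cor <- function(C) {
  d  <- as.dist(1 - C)
  hc <- hclust(d, method = "average")
  cor(d, cophenetic(hc))
}

C <- consensus_matrix(runs)
cophenetic_cor(C)  # below 1, because run 4 disagrees with the others
```

Using only the first three runs, which all describe the same partition, gives a value of 1 (up to floating-point rounding).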

Now in your case I think the plot might have looked something like this.

[Figure: sketch of the likely elbow plot in this case (cophenetic_2)]

Here, I would guess the optimal number of signatures is 6, since the cophenetic metric drops sharply at 6 and there is minimal change afterwards. However, maftools chose 3 as the number of signatures because it is the first point at which the value dropped.
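The difference between the two readings can be sketched in a few lines of base R. The cophenetic values below are invented to mimic the situation described above, and both functions are hypothetical illustrations of the rules as described in this thread, not maftools' actual source: `first_drop_rank()` returns the first rank after which the metric decreases, while `largest_drop_rank()` returns the rank just before the steepest decrease.

```r
# Made-up cophenetic values for ranks 2-10: a small dip after rank 3,
# then a large fall after rank 6
ranks <- 2:10
coph  <- c(0.990, 0.992, 0.985, 0.988, 0.991, 0.940, 0.935, 0.932, 0.930)

# Hypothetical rule 1: first rank k whose value exceeds that of rank k + 1
first_drop_rank <- function(ranks, coph) {
  d <- diff(coph)          # change from rank k to rank k + 1
  ranks[which(d < 0)[1]]
}

# Hypothetical rule 2: rank k just before the steepest fall
largest_drop_rank <- function(ranks, coph) {
  d <- diff(coph)
  ranks[which.min(d)]
}

first_drop_rank(ranks, coph)    # returns 3
largest_drop_rank(ranks, coph)  # returns 6
```

Neither rule replaces looking at the plot: on a noisy curve, the first small dip can easily mask a later, more meaningful drop.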

I hope this helps you understand the results. My suggestion is to decide the number of signatures based on the elbow plot and rerun the extractSignatures function with the n argument. Let me know if you still have any questions.
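Fixing `n` fixes the rank k of the factorization V ≈ W H, where V holds mutation counts (here arranged as 96 trinucleotide contexts by samples), W holds the k signatures and H their per-sample exposures. As a rough illustration only, the sketch below runs Lee and Seung's multiplicative updates for the Frobenius loss at a fixed rank on made-up Poisson counts. This is not how maftools itself factorizes the matrix (the algorithm it uses may differ), and `nmf_fixed_rank()` is a hypothetical helper.

```r
# Hypothetical helper: NMF at a fixed rank k via Lee & Seung multiplicative
# updates for the Frobenius loss. V must be non-negative.
nmf_fixed_rank <- function(V, k, n_iter = 500, seed = 1) {
  set.seed(seed)
  eps <- 1e-9                                   # avoids division by zero
  W <- matrix(runif(nrow(V) * k), nrow(V), k)   # signatures (contexts x k)
  H <- matrix(runif(k * ncol(V)), k, ncol(V))   # exposures (k x samples)
  for (i in seq_len(n_iter)) {
    H <- H * (t(W) %*% V) / (t(W) %*% W %*% H + eps)
    W <- W * (V %*% t(H)) / (W %*% H %*% t(H) + eps)
  }
  list(W = W, H = H, error = norm(V - W %*% H, "F"))
}

# Made-up data: 96 mutation contexts x 20 samples of Poisson counts
set.seed(42)
V <- matrix(rpois(96 * 20, lambda = 5), nrow = 96)

fit <- nmf_fixed_rank(V, k = 6)
dim(fit$W)  # 96 x 6: one column per signature
dim(fit$H)  # 6 x 20: one exposure per signature and sample
```

Changing `k` changes how many signature columns W has; the elbow plot is one way to judge which `k` the data actually support.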

clersdom commented 4 years ago

Hi, from the elbow plot below, would you use n = 5 to extract signatures?

[Attachment: Cophenetic_Example.pdf]

Thanks!

PoisonAlien commented 4 years ago

It seems 3 is already good. Could you maybe run it up to 10 and check whether the values drop after 6?

PoisonAlien commented 4 years ago

You should check the cophenetic correlation value. Run estimateSignatures for up to 10 signatures:

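```r
library(maftools)
# laml.tnm: output of trinucleotideMatrix() (the example matrix from the maftools vignette)
sigest = estimateSignatures(mat = laml.tnm, nTry = 10)
plotCophenetic(res = sigest)
```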

The resulting figure should help you decide the number of signatures. The number of signatures depends on the mutation load and the sample size; for a small cohort you would not expect many signatures.

clersdom commented 4 years ago

Hi, yes, I realised later that I can use nTry to try more signatures. The values do not drop after 6, so I guess 3-5 signatures will be best in this case. Would you recommend any other tool to validate these results? Many thanks!

ShixiangWang commented 4 years ago

Try sigminer (https://shixiangwang.github.io/sigminer-doc/; in particular, you can run sig_fit) or SigProfiler (https://osf.io/t6j7u/wiki/home/), which is a gold-standard tool.