MikeJSeo / SAM

SAM (Significance Analysis of Microarrays) shiny app
https://cran.r-project.org/web/packages/samr/index.html

Can SAM do post hoc test in multiple group analysis? #39

Open xgt1986627 opened 5 years ago

xgt1986627 commented 5 years ago

Can SAM do post hoc tests in multiple-group analysis, such as Tukey or SNK?

SamGG commented 5 years ago

Hi, There are some discussions about this question on the Bioconductor forum. To make a long story short, Smyth & Lun are not in favor and instead propose doing an independent FDR correction for each contrast. This means that if you have 3 classes A, B, C, you specify that you are interested in all differences A-C, B-C, A-B, and then you compute a multiple-testing correction for each of those 3 contrasts. The alternatives are to merge them all before the correction (what is called global in limma) or to use a hierarchical approach.

I was asked this question myself, and it was good to think about how I would approach the solution. In fact, the question is about coping with 2-dimensional multiple testing: FDR handles many tests at once (typically across genes, i.e. rows), while the Tukey approach handles multiple tests across groups (i.e. columns) within a single gene. So we can compute the individual p-values of all contrasts then either compute q-value per contrast (limma's default) or q-value after having pooled them all (limma's global). Alternatively, we can get the p-value of an ANOVA on each row, apply an FDR filter, then apply a Tukey-like test on each remaining row (this sounds like limma's hierarchical, but is too conservative IMHO). What else? I don't know.

I came to the conclusion that the simple solution is to use the contrasts separately. But in fact, I didn't look at the code of the great Rob Tibshirani to learn his point of view. Whatever he did, I think he did it well. HTH
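To make the difference between the per-contrast and the pooled correction concrete, here is a minimal Python sketch. It is not limma code: the p-values are made up, and `bh_adjust`, `adjust_separate` and `adjust_global` are hypothetical helper names chosen for illustration. It only assumes that a raw p-value is already available for every gene and every contrast.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def adjust_separate(p_by_contrast):
    """Correct across genes within each contrast independently."""
    return {name: bh_adjust(p) for name, p in p_by_contrast.items()}


def adjust_global(p_by_contrast):
    """Pool the p-values of all contrasts, correct once, then split again."""
    names = list(p_by_contrast)
    pooled = [p for name in names for p in p_by_contrast[name]]
    adj = bh_adjust(pooled)
    out, start = {}, 0
    for name in names:
        n = len(p_by_contrast[name])
        out[name] = adj[start:start + n]
        start += n
    return out


# Made-up raw p-values: one list per contrast, one entry per gene (4 genes).
pvals = {
    "A-B": [0.001, 0.20, 0.03, 0.60],
    "A-C": [0.004, 0.01, 0.50, 0.70],
    "B-C": [0.300, 0.02, 0.04, 0.80],
}

separate = adjust_separate(pvals)
pooled = adjust_global(pvals)
for name in pvals:
    print(name, "separate:", separate[name])
    print(name, "global:  ", pooled[name])
```

With these made-up numbers, the pooled correction adjusts each p-value against all 12 tests rather than the 4 in its own contrast, so the same raw p-value can receive a different adjusted value under the two schemes.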

xgt1986627 commented 5 years ago

Dear SamGG: Thank you very much for such a comprehensive explanation! However, I can't understand some of the details, so I would like to ask further.

First, you said: "So we can compute the individual p-values of all contrasts then either compute q-value per contrast (limma's default)." By your explanation, FDR handles many tests across rows and Tukey across columns, but I think limma's default only does FDR, using the BH adjustment method, and doesn't do Tukey across the multiple contrasts. The code I used is: topTableOutput <- topTable(fit2, coef = 1, adjust = "BH", n = Inf) Am I right? Or has Tukey already been applied in limma?

Second, you said: ”to merge them all before the correction (what is called global in limma)”. I thought I had got the global table with the code: write.table(fit2, "limma.txt") If this is what you meant by pooling them all, do I need to apply both FDR and Tukey to the p-values in this table? Has the column “fit2$F.p.value” already had any adjustment such as FDR or Tukey applied?

Furthermore, in this global table I got a “fit2$F.p.value” for every gene across all groups, like a one-way ANOVA. In an ordinary ANOVA analysis, I only get this one column, and there are no individual p-values until a post hoc test has been done. So I don’t know whether “fit2$F.p.value” has had any multiple-testing correction applied, or whether what you described can be applied to multiple-group analyses other than limma (like ANOVA or SAM), because only one value exists for the overall comparison (like ABC) and there are no values for A-B, B-C, A-C. In this situation, we can’t get the answer by pooling them all, can we? So we can only use the contrasts separately and do 2-dimensional multiple testing.

Third, I want to know whether the p-values in “limma global” for a 3-group comparison (more than 2 groups) come from the 3 groups together, like ANOVA, or directly from 3 separate contrasts, like t-tests. If it is the former, it doesn’t need Tukey afterwards, does it? If it is the latter, it needs 2-dimensional multiple testing. I would also like to know whether the 2-dimensional multiple-testing rule is universal to all multiple-group analyses. If so, we can apply the 2-dimensional correction after splitting into separate contrasts and get the results for all differences, as you said.

Finally, do you know of standalone code for FDR and Tukey that fits common statistical analyses? If so, does that mean the problem has been resolved?

Thank you again for all your kind help!

Sincerely,

Minzhi Zhao

SamGG commented 5 years ago

Dear Minzhi, I was talking about the decideTests function. Look at the discussion at https://support.bioconductor.org/p/120072/#120105, which was very instructive for me. The code at https://github.com/cran/limma/blob/master/R/decidetests.R was also useful for understanding what happens behind the scenes and what the options are. If you go with limma, you should ask your initial question at https://support.bioconductor.org after having checked the extensive limma documentation. Best, Samuel
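For intuition only, a two-stage ("hierarchical"-style) scheme can be sketched in Python as below. This is a simplified illustration and not the procedure implemented in decidetests.R: it assumes an overall F-test p-value per gene and raw per-contrast p-values are already available, it uses Holm across contrasts as a stand-in for a Tukey-like step, and `bh_adjust`, `holm_adjust` and `two_stage` are hypothetical names.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def holm_adjust(pvals):
    """Holm adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    # Step down from the smallest p-value, keeping adjusted values monotone.
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted


def two_stage(f_pvals, contrast_pvals, alpha=0.05):
    """Stage 1: BH across genes on the overall F-test p-values.
    Stage 2: Holm across contrasts, only within genes that pass stage 1.
    Returns, per gene, one True/False call per contrast."""
    f_adj = bh_adjust(f_pvals)
    calls = []
    for gene, row in enumerate(contrast_pvals):
        if f_adj[gene] < alpha:
            calls.append([p < alpha for p in holm_adjust(row)])
        else:
            calls.append([False] * len(row))
    return calls


# Made-up inputs: 3 genes, contrasts in the order A-B, A-C, B-C.
f_pvals = [0.0005, 0.04, 0.6]
contrast_pvals = [
    [0.001, 0.004, 0.30],
    [0.20, 0.01, 0.02],
    [0.90, 0.80, 0.70],
]
print(two_stage(f_pvals, contrast_pvals))
```

Note that the second stage here works from p-values alone, whereas a true Tukey test needs the original expression values of each group.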

xgt1986627 commented 5 years ago

Dear Samuel, Thanks for your detailed information! After carefully reading the pages you suggested, I found myself more confused, but I think your description is clearer. This is really a huge job, and I understand you want to see a published paper. Now I want to focus on one point: "false discovery rate control at the gene level or at the contrast level". As we know, adjusting p-values at the gene level can use methods like BH or permutation-based FDR (q-value?). All of these need only the p-values themselves. But at the contrast level, also called a post hoc test, like Tukey, can such an adjustment be done independently of the base analysis, like a two-group t-test? I found that this test requires the original expression values for each contrast. In your discussion I found that all of you use the t-test directly to describe such post hoc tests. There are other post hoc tests like LSD and Bonferroni. I want to find an adjustment method that fits every test method or algorithm, not only ANOVA and the t-test (in my understanding limma is also based on the t-test?). For example, the method named "edge" uses the optimal discovery procedure (ODP) framework as a likelihood ratio test for t-test and ANOVA designs. So it would be better for the adjustment to be independent of the original test method. Do you know of such a method? And do you know the essential difference between multiple testing "at the gene level" and "at the contrast level"? Thank you again for all your kind help!

Sincerely,

Minzhi Zhao

xgt1986627 commented 5 years ago

Dear Samuel, I found a tool named "Perseus" for proteomic data analysis. Its process for ANOVA and post hoc tests is a bit like what you called "hierarchical". I don't know whether it is of interest to you, but I have pasted the paper link below. I think this should be a common question because we often do many multiple-group contrasts.

https://link.springer.com/protocol/10.1007%2F978-1-4939-7493-1_7#Sec16