mevers closed this issue 8 years ago
Can you send sample data that reproduces this issue to gcyu@hku.hk?
I have sent a sample file and data to your email address. Thanks, Maurits
This happens because your input geneList is not sorted. Running
geneList <- sort(geneList, decreasing = TRUE)
will fix the issue.
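A minimal sketch of that fix, assuming geneList is a named numeric vector of ranking scores. The gene names and values below are made up for illustration and are not taken from the data in this issue.

```r
# Toy ranked gene list (hypothetical gene IDs and scores).
geneList <- c(g1 = 0.86, g2 = 1.00, g3 = -0.40, g4 = 0.92)

# is.unsorted() tests for increasing order, so check the reversed
# vector to find out whether the list is already in decreasing order.
needs_sorting <- is.unsorted(rev(geneList))

if (needs_sorting) {
  # sort() keeps the names attached to their values.
  geneList <- sort(geneList, decreasing = TRUE)
}

print(geneList)
```

Sorting a named vector this way keeps each gene ID attached to its score, so the sorted vector can be passed to gseGO() as before.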
I have updated the source code so that if the user passes an unsorted geneList, it will stop with an error. See https://github.com/GuangchuangYu/DOSE/commit/5a7bc077d72e03d451a02b197f01bf3905431a02.
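For readers who want a similar guard in their own scripts, here is a rough sketch of such a check. The function name check_gene_list and its error messages are hypothetical; this is not the code from the DOSE commit linked above.

```r
# Hypothetical helper (not part of DOSE or clusterProfiler):
# stop early if the ranked gene list is not usable for GSEA.
check_gene_list <- function(geneList) {
  if (!is.numeric(geneList) || is.null(names(geneList))) {
    stop("geneList should be a named numeric vector")
  }
  # Decreasing order is equivalent to the reversed vector being sorted.
  if (is.unsorted(rev(geneList))) {
    stop("geneList should be sorted in decreasing order")
  }
  invisible(TRUE)
}

sorted_list   <- c(g1 = 1.0, g2 = 0.5, g3 = -0.2)
unsorted_list <- c(g1 = 0.5, g2 = 1.0, g3 = -0.2)

check_gene_list(sorted_list)

# Capture the error message instead of aborting the script.
unsorted_message <- tryCatch(
  check_gene_list(unsorted_list),
  error = function(e) conditionMessage(e)
)
```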
Dear Guangchuang.
Thanks for the quick update. Yes, this seems to fix the issue with the constant p-values. There still remains the issue with the discontinuities in the phenotype correlation plots. See e.g. term GO:0000070 (attached). Any advice?
Thanks, Maurits
Your geneList is unusual: many of its values are identical. This may be the reason.
> table(geneList) %>% as.data.frame %>% subset(., Freq > 500)
   geneList Freq
54     0.86  526
56     0.88  534
58      0.9  584
60     0.92  570
62     0.94  528
66     0.98  522
68        1  506
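The same kind of tie check can be sketched in base R without the pipe. The toy vector below is made up, and the threshold of 1 stands in for the 500 used above.

```r
# Toy ranked gene list with repeated scores (hypothetical values).
geneList <- c(g1 = 1.00, g2 = 0.98, g3 = 0.98, g4 = 0.98,
              g5 = 0.86, g6 = 0.86, g7 = -0.20)

# Count how often each score occurs and keep the repeated ones.
freq <- table(geneList)
tied_values <- freq[freq > 1]

# Flag every gene whose score is shared with at least one other gene.
tied <- duplicated(geneList) | duplicated(geneList, fromLast = TRUE)
tied_fraction <- mean(tied)

print(tied_values)
print(tied_fraction)
```

A large tied fraction means that many genes have no well-defined relative rank, which is worth knowing before interpreting a rank-based analysis.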
Dear Guangchuang.
I'm sorry but that is a very poor excuse. The discontinuity of the phenotype correlation plots suggests to me that this is a numerical issue in your code. It looks like a branch cut in the function you use to plot the phenotype correlation curves that occurs if the values of the ranking metric are not distributed around zero.
Broad's GSEA-P does not seem to have this issue, so it is definitely not an issue with the data.
Best, Maurits
The strange thing is that I couldn't reproduce your issue. It shouldn't happen if your input geneList was sorted.
Can you save the object res.GSEA.GO to an rda file and send it to me?
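One way to do this is with base R's save() and load(). In this sketch the object is a placeholder list rather than a real gseGO() result, and the file is written to a temporary directory.

```r
# Placeholder standing in for the real gseGO() result.
res.GSEA.GO <- list(note = "placeholder for the gseGO() result")

# Write the object to an rda file (the path here is an assumption).
rda_path <- file.path(tempdir(), "res.GSEA.GO.rda")
save(res.GSEA.GO, file = rda_path)

# The recipient restores it under its original name with load();
# a separate environment avoids overwriting existing objects.
restored <- new.env()
load(rda_path, envir = restored)
```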
Dear Guangchuang.
I did some more testing, and indeed the discontinuities in the phenotype plots have disappeared, following sorting of the ranked gene list. So all is well. Thanks for looking into this and your help.
Best regards, Maurits
Dear Guangchuang.
I have come across two issues, maybe you can clarify.
I perform a GSEA analysis within clusterProfiler using
res.GSEA.GO <- gseGO(geneList = geneList, organism = "human", exponent = 1, ont = "BP", nPerm = 1000, minGSSize = 15, pvalueCutoff = 0.01, verbose = TRUE)
1.) The GO terms in the resulting table all seem to have the same p-value, adjusted p-value, and q-value. For example, notice the entries in the last column (qvalues = 0.00470813780684201):
The results are similar for other ontologies.
2.) All GSEA plots seem to have a discontinuity in the "phenotype" curves. See e.g. http://imgur.com/AcH49Bh .
Any help in resolving these issues would be greatly appreciated.
Best, Maurits