Open lillianhorin opened 1 year ago
I ran the example code you provided, and I indeed got identical results from the enrichGO()
analysis:
> library(clusterProfiler)
> data(geneList, package="DOSE")
> gene <- names(geneList)[abs(geneList) > 2]
>
> ego <- enrichGO(gene = gene,
+ universe = names(geneList),
+ OrgDb = org.Hs.eg.db,
+ ont = "CC",
+ pAdjustMethod = "BH",
+ pvalueCutoff = 0.01,
+ qvalueCutoff = 0.05,
+ readable = TRUE)
> head(ego)
ID Description GeneRatio
GO:0005819 GO:0005819 spindle 25/201
GO:0000775 GO:0000775 chromosome, centromeric region 19/201
GO:0072686 GO:0072686 mitotic spindle 17/201
GO:0000779 GO:0000779 condensed chromosome, centromeric region 16/201
GO:0098687 GO:0098687 chromosomal region 23/201
GO:0000793 GO:0000793 condensed chromosome 19/201
BgRatio pvalue p.adjust qvalue
GO:0005819 329/11780 3.448666e-10 5.624574e-08 5.086717e-08
GO:0000775 189/11780 4.988612e-10 5.624574e-08 5.086717e-08
GO:0072686 149/11780 5.941452e-10 5.624574e-08 5.086717e-08
GO:0000779 137/11780 1.372680e-09 9.746031e-08 8.814053e-08
GO:0098687 308/11780 2.593769e-09 1.407498e-07 1.272904e-07
GO:0000793 210/11780 2.973586e-09 1.407498e-07 1.272904e-07
geneID
GO:0005819 CDCA8/CDC20/KIF23/CENPE/ASPM/DLGAP5/SKA1/NUSAP1/TPX2/TACC3/NEK2/CDK1/MAD2L1/KIF18A/KIF11/TRAT1/TTK/AURKB/PRC1/KIFC1/KIF18B/KIF20A/AURKA/CCNB1/KIF4A
GO:0000775 CDCA8/CENPE/NDC80/TOP2A/HJURP/SKA1/NEK2/CENPM/CENPN/ERCC6L/MAD2L1/KIF18A/CDT1/BIRC5/EZH2/TTK/NCAPG/AURKB/CCNB1
GO:0072686 KIF23/CENPE/ASPM/SKA1/NUSAP1/TPX2/TACC3/CDK1/MAD2L1/KIF18A/KIF11/TRAT1/AURKB/PRC1/KIFC1/KIF18B/AURKA
GO:0000779 CENPE/NDC80/HJURP/SKA1/NEK2/CENPM/CENPN/ERCC6L/MAD2L1/KIF18A/CDT1/BIRC5/TTK/NCAPG/AURKB/CCNB1
GO:0098687 CDCA8/CENPE/NDC80/TOP2A/HJURP/SKA1/NEK2/CENPM/RAD51AP1/CENPN/CDK1/ERCC6L/MAD2L1/KIF18A/CDT1/BIRC5/EZH2/TTK/NCAPG/AURKB/CHEK1/CCNB1/MCM5
GO:0000793 CENPE/NDC80/TOP2A/NCAPH/HJURP/SKA1/NEK2/CENPM/CENPN/ERCC6L/MAD2L1/KIF18A/CDT1/BIRC5/TTK/NCAPG/AURKB/CHEK1/CCNB1
Count
GO:0005819 25
GO:0000775 19
GO:0072686 17
GO:0000779 16
GO:0098687 23
GO:0000793 19
>
When I generated the dotplot with showCategory = 20
, I also got the same plot (plot not shown).
Yet, when I limited the number of categories to 6 (which is the same number of rows head()
returns), I got a plot that clearly matches the values presented in the table.
In other words, by setting showCategory = 20
you create a graph that seems to be off (because of the color scale), but which actually isn't???
Please note that 'more red' means lower p.adjust
, not higher!
> dotplot( ego,
+ x = "p.adjust",
+ color = "p.adjust",
+ showCategory = 6,
+ orderBy="p.adjust",
+ font.size = 8)
>
@guidohooiveld Thank you for your response! his is very helpful. So does that mean it is not recommended to use a showCategory
above 6
? Because even if it actually is correct but only seems to be off, the initial perception is everything when presenting the data.
Did you solve the issue? I have the same problem.
What exactly is your problem? Consider opening a new issue. Be sure to provide sufficient details.
I have the same problem as reported by [lillianhorin]. The p-value/adjp-value matches between the gsea result and dotplot when I set showCategory=6. I get conflicting values when I set any number above 6.
Could you please provide some reproducible code to illustrate the issue?
I am asking because to me it seems everything is correct (i.e. values visualized in the dotplot
match with those in ego
...)
> ## load required libraries
> library(clusterProfiler)
> library(org.Hs.eg.db)
>
> ## load included example data, and select the 'relevant' genes.
> data(geneList, package="DOSE")
> gene <- names(geneList)[abs(geneList) > 2]
>
> ## run enrichGO
> ego <- enrichGO(gene = gene,
+ universe = names(geneList),
+ OrgDb = org.Hs.eg.db,
+ ont = "CC",
+ pAdjustMethod = "BH",
+ pvalueCutoff = 0.01,
+ qvalueCutoff = 0.05,
+ readable = TRUE)
>
> ## specifically extract p-, adjusted- and count
> ## for top.n gene sets
>
> ## set top.n to 9
> top.n = 9
>
> ego[1:top.n,c("pvalue","p.adjust", "Count")]
pvalue p.adjust Count
GO:0000775 1.422125e-11 4.024615e-09 21
GO:0000779 3.719635e-11 5.138722e-09 18
GO:0072686 9.226677e-11 5.138722e-09 18
GO:0005819 9.727050e-11 5.138722e-09 26
GO:0000793 9.895898e-11 5.138722e-09 21
GO:0000776 1.191495e-10 5.138722e-09 17
GO:0098687 1.271062e-10 5.138722e-09 25
GO:0005876 1.035478e-07 3.663004e-06 10
GO:0005875 4.312652e-07 1.286932e-05 12
>
>
> dotplot( ego,
+ x = "p.adjust",
+ color = "p.adjust",
+ showCategory = top.n,
+ orderBy="p.adjust",
+ font.size = 8)
>
dotplot
below matches values above....!
> ## repeat, with e.g. top.n = 14
> top.n = 14
> ego[1:top.n,c("pvalue","p.adjust", "Count")]
pvalue p.adjust Count
GO:0000775 1.422125e-11 4.024615e-09 21
GO:0000779 3.719635e-11 5.138722e-09 18
GO:0072686 9.226677e-11 5.138722e-09 18
GO:0005819 9.727050e-11 5.138722e-09 26
GO:0000793 9.895898e-11 5.138722e-09 21
GO:0000776 1.191495e-10 5.138722e-09 17
GO:0098687 1.271062e-10 5.138722e-09 25
GO:0005876 1.035478e-07 3.663004e-06 10
GO:0005875 4.312652e-07 1.286932e-05 12
GO:0030312 5.002210e-07 1.286932e-05 24
GO:0031012 5.002210e-07 1.286932e-05 24
GO:1990023 5.565242e-07 1.312470e-05 5
GO:0051233 8.312893e-07 1.809653e-05 7
GO:0005874 2.393495e-06 4.838279e-05 19
>
>
> dotplot( ego,
+ x = "p.adjust",
+ color = "p.adjust",
+ showCategory = top.n,
+ orderBy="p.adjust",
+ font.size = 8)
>
>
Again, dotplot
below matches values above....!
OK, after re-reading your comment I noticed you are doing a GO-based GSEA analysis...
> res2 <- gseGO(geneList = geneList,
+ ont = "CC",
+ OrgDb = org.Hs.eg.db,
+ keyType = "ENTREZID",
+ minGSSize = 10,
+ maxGSSize = 500,
+ eps = 0,
+ pvalueCutoff = 0.05,
+ pAdjustMethod = "BH"
+ )
using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).
preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
>
> ## 'Count' now refers to 'the number of core enriched genes'
> ## Add these to res2 data.frame.
> ## Number equals number of forward slashes + 1.
> library(stringr)
> res2@result$Count <- str_count(res2$core_enrichment,"\\/") + 1
>
>
>
> top.n = 9
> res2[1:top.n,c("Description","pvalue","p.adjust", "Count")]
Description pvalue p.adjust Count
GO:0098687 chromosomal region 7.090676e-23 4.750753e-20 88
GO:0000775 chromosome, centromeric region 1.014018e-20 3.396961e-18 43
GO:0000793 condensed chromosome 4.510418e-18 1.007327e-15 46
GO:0000779 condensed chromosome, centromeric region 1.064529e-17 1.783086e-15 36
GO:0000228 nuclear chromosome 3.200622e-17 4.288833e-15 59
GO:0030312 external encapsulating structure 2.071867e-16 1.983073e-14 159
GO:0031012 extracellular matrix 2.071867e-16 1.983073e-14 159
GO:0000776 kinetochore 3.294246e-16 2.468340e-14 31
GO:0062023 collagen-containing extracellular matrix 3.315680e-16 2.468340e-14 133
>
> dotplot( res2,
+ x = "p.adjust",
+ color = "p.adjust",
+ showCategory = top.n,
+ orderBy="p.adjust",
+ font.size = 8)
>
>
> top.n = 19
> res2[1:top.n,c("Description","pvalue","p.adjust", "Count")]
Description pvalue p.adjust Count
GO:0098687 chromosomal region 7.090676e-23 4.750753e-20 88
GO:0000775 chromosome, centromeric region 1.014018e-20 3.396961e-18 43
GO:0000793 condensed chromosome 4.510418e-18 1.007327e-15 46
GO:0000779 condensed chromosome, centromeric region 1.064529e-17 1.783086e-15 36
GO:0000228 nuclear chromosome 3.200622e-17 4.288833e-15 59
GO:0030312 external encapsulating structure 2.071867e-16 1.983073e-14 159
GO:0031012 extracellular matrix 2.071867e-16 1.983073e-14 159
GO:0000776 kinetochore 3.294246e-16 2.468340e-14 31
GO:0062023 collagen-containing extracellular matrix 3.315680e-16 2.468340e-14 133
GO:0005819 spindle 5.771008e-16 3.866575e-14 68
GO:0072686 mitotic spindle 1.436490e-11 8.749531e-10 28
GO:0030496 midbody 1.033020e-08 5.767697e-07 19
GO:0051233 spindle midzone 6.674388e-08 3.439877e-06 11
GO:0005604 basement membrane 8.799662e-08 4.211267e-06 37
GO:0071162 CMG complex 2.024650e-07 8.478222e-06 11
GO:0031261 DNA replication preinitiation complex 1.933702e-07 8.478222e-06 11
GO:0000940 outer kinetochore 9.757268e-07 3.845512e-05 10
GO:0005874 microtubule 1.423733e-06 5.299452e-05 54
GO:0005876 spindle microtubule 2.721809e-06 9.597956e-05 13
>
> dotplot( res2,
+ x = "p.adjust",
+ color = "p.adjust",
+ showCategory = top.n,
+ orderBy="p.adjust",
+ font.size = 8)
>
>
Hello! Hope all is well.
I am using both the
enrichGO
andgseGO
functions. The analyses are going well, but I am uncovering a phenomenon I quite honestly don't understand. I am getting lowp.adjust
values in the analysis, but thep.adjust
withdotplot
is much higher, even after sorting forp.adjust
.Here is the reproducible example.
Immediately upon examining
head(ego)
, we see manyp.adjust
values as low as 5e-8. But when we plot this using adotplot
, these all seem to disappear. The lowestp.adjust
value on the plot is much higher.Graphical output:
The discrepancy is even more noticeable if I sort by default
GeneRatio
, where values that are supposed to be in the e-8 range have much higher (more red)p.adjust.
I have this same issue with
gseGO.
It seems the issue is once it is being passed into thedotplot
function.Would love your insight on this - thank you so much!