YuLab-SMU / clusterProfiler

:bar_chart: A universal enrichment tool for interpreting omics data
https://yulab-smu.top/biomedical-knowledge-mining-book/
1.02k stars 256 forks source link

P.adjust values in dotplot do not match GO analysis values #612

Open lillianhorin opened 1 year ago

lillianhorin commented 1 year ago

Hello! Hope all is well.

I am using both the enrichGO and gseGO functions. The analyses are going well, but I am uncovering a phenomenon I quite honestly don't understand. I am getting low p.adjust values in the analysis, but the p.adjust with dotplot is much higher, even after sorting for p.adjust.

Here is the reproducible example.

library(clusterProfiler)
data(geneList, package="DOSE")
gene <- names(geneList)[abs(geneList) > 2]

ego <- enrichGO(gene          = gene,
                universe      = names(geneList),
                OrgDb         = org.Hs.eg.db,
                ont           = "CC",
                pAdjustMethod = "BH",
                pvalueCutoff  = 0.01,
                qvalueCutoff  = 0.05,
                readable      = TRUE)
head(ego)

image

Immediately upon examining head(ego), we see many p.adjust values as low as 5e-8. But when we plot this using a dotplot, these all seem to disappear. The lowest p.adjust value on the plot is much higher.

dotplot( ego, 
         x = "p.adjust",
         color = "p.adjust",
         showCategory = 20,
         orderBy="p.adjust",
         font.size = 8)

Graphical output: image

The discrepancy is even more noticeable if I sort by default GeneRatio, where values that are supposed to be in the e-8 range have much higher (more red) p.adjust.

image

I have this same issue with gseGO. It seems the issue is once it is being passed into the dotplot function.

Would love your insight on this - thank you so much!

guidohooiveld commented 1 year ago

I ran the example code you provided, and I indeed got identical results from the enrichGO() analysis:

> library(clusterProfiler)
> data(geneList, package="DOSE")
> gene <- names(geneList)[abs(geneList) > 2]
> 
> ego <- enrichGO(gene          = gene,
+                 universe      = names(geneList),
+                 OrgDb         = org.Hs.eg.db,
+                 ont           = "CC",
+                 pAdjustMethod = "BH",
+                 pvalueCutoff  = 0.01,
+                 qvalueCutoff  = 0.05,
+                 readable      = TRUE)
> head(ego)
                   ID                              Description GeneRatio
GO:0005819 GO:0005819                                  spindle    25/201
GO:0000775 GO:0000775           chromosome, centromeric region    19/201
GO:0072686 GO:0072686                          mitotic spindle    17/201
GO:0000779 GO:0000779 condensed chromosome, centromeric region    16/201
GO:0098687 GO:0098687                       chromosomal region    23/201
GO:0000793 GO:0000793                     condensed chromosome    19/201
             BgRatio       pvalue     p.adjust       qvalue
GO:0005819 329/11780 3.448666e-10 5.624574e-08 5.086717e-08
GO:0000775 189/11780 4.988612e-10 5.624574e-08 5.086717e-08
GO:0072686 149/11780 5.941452e-10 5.624574e-08 5.086717e-08
GO:0000779 137/11780 1.372680e-09 9.746031e-08 8.814053e-08
GO:0098687 308/11780 2.593769e-09 1.407498e-07 1.272904e-07
GO:0000793 210/11780 2.973586e-09 1.407498e-07 1.272904e-07
                                                                                                                                                        geneID
GO:0005819 CDCA8/CDC20/KIF23/CENPE/ASPM/DLGAP5/SKA1/NUSAP1/TPX2/TACC3/NEK2/CDK1/MAD2L1/KIF18A/KIF11/TRAT1/TTK/AURKB/PRC1/KIFC1/KIF18B/KIF20A/AURKA/CCNB1/KIF4A
GO:0000775                                      CDCA8/CENPE/NDC80/TOP2A/HJURP/SKA1/NEK2/CENPM/CENPN/ERCC6L/MAD2L1/KIF18A/CDT1/BIRC5/EZH2/TTK/NCAPG/AURKB/CCNB1
GO:0072686                                                KIF23/CENPE/ASPM/SKA1/NUSAP1/TPX2/TACC3/CDK1/MAD2L1/KIF18A/KIF11/TRAT1/AURKB/PRC1/KIFC1/KIF18B/AURKA
GO:0000779                                                       CENPE/NDC80/HJURP/SKA1/NEK2/CENPM/CENPN/ERCC6L/MAD2L1/KIF18A/CDT1/BIRC5/TTK/NCAPG/AURKB/CCNB1
GO:0098687             CDCA8/CENPE/NDC80/TOP2A/HJURP/SKA1/NEK2/CENPM/RAD51AP1/CENPN/CDK1/ERCC6L/MAD2L1/KIF18A/CDT1/BIRC5/EZH2/TTK/NCAPG/AURKB/CHEK1/CCNB1/MCM5
GO:0000793                                     CENPE/NDC80/TOP2A/NCAPH/HJURP/SKA1/NEK2/CENPM/CENPN/ERCC6L/MAD2L1/KIF18A/CDT1/BIRC5/TTK/NCAPG/AURKB/CHEK1/CCNB1
           Count
GO:0005819    25
GO:0000775    19
GO:0072686    17
GO:0000779    16
GO:0098687    23
GO:0000793    19
>

When I generated the dotplot with showCategory = 20, I also got the same plot (plot not shown).

Yet, when I limited the number of categories to 6 (which is the same number of rows head() returns), I got a plot that clearly matches the values presented in the table. In other words, by setting showCategory = 20 you create a graph that seems to be off (because of the color scale), but which actually isn't??? Please note that 'more red' means lower p.adjust, not higher!

> dotplot( ego, 
+          x = "p.adjust",
+          color = "p.adjust",
+          showCategory = 6,
+          orderBy="p.adjust",
+          font.size = 8)
>

image

lillianhorin commented 1 year ago

@guidohooiveld Thank you for your response! his is very helpful. So does that mean it is not recommended to use a showCategory above 6? Because even if it actually is correct but only seems to be off, the initial perception is everything when presenting the data.

mail2senthil commented 4 months ago

Did you solve the issue? I have the same problem.

guidohooiveld commented 4 months ago

What exactly is your problem? Consider opening a new issue. Be sure to provide sufficient details.

mail2senthil commented 4 months ago

I have the same problem as reported by [lillianhorin]. The p-value/adjp-value matches between the gsea result and dotplot when I set showCategory=6. I get conflicting values when I set any number above 6.

guidohooiveld commented 4 months ago

Could you please provide some reproducible code to illustrate the issue?

I am asking because to me it seems everything is correct (i.e. values visualized in the dotplot match with those in ego...)

> ## load required libraries
> library(clusterProfiler)
> library(org.Hs.eg.db)
> 
> ## load included example data, and select the 'relevant' genes.
> data(geneList, package="DOSE")
> gene <- names(geneList)[abs(geneList) > 2]
> 
> ## run enrichGO
> ego <- enrichGO(gene          = gene,
+                 universe      = names(geneList),
+                 OrgDb         = org.Hs.eg.db,
+                 ont           = "CC",
+                 pAdjustMethod = "BH",
+                 pvalueCutoff  = 0.01,
+                 qvalueCutoff  = 0.05,
+                 readable      = TRUE)
> 
> ## specifically extract p-, adjusted- and count
> ## for top.n gene sets
> 
> ## set top.n to 9
> top.n = 9
> 
> ego[1:top.n,c("pvalue","p.adjust", "Count")]
                 pvalue     p.adjust Count
GO:0000775 1.422125e-11 4.024615e-09    21
GO:0000779 3.719635e-11 5.138722e-09    18
GO:0072686 9.226677e-11 5.138722e-09    18
GO:0005819 9.727050e-11 5.138722e-09    26
GO:0000793 9.895898e-11 5.138722e-09    21
GO:0000776 1.191495e-10 5.138722e-09    17
GO:0098687 1.271062e-10 5.138722e-09    25
GO:0005876 1.035478e-07 3.663004e-06    10
GO:0005875 4.312652e-07 1.286932e-05    12
> 
> 
> dotplot( ego, 
+          x = "p.adjust",
+          color = "p.adjust",
+          showCategory = top.n,
+          orderBy="p.adjust",
+          font.size = 8)
> 

dotplot below matches values above....!

image

 > ## repeat, with e.g. top.n = 14
> top.n = 14
> ego[1:top.n,c("pvalue","p.adjust", "Count")]
                 pvalue     p.adjust Count
GO:0000775 1.422125e-11 4.024615e-09    21
GO:0000779 3.719635e-11 5.138722e-09    18
GO:0072686 9.226677e-11 5.138722e-09    18
GO:0005819 9.727050e-11 5.138722e-09    26
GO:0000793 9.895898e-11 5.138722e-09    21
GO:0000776 1.191495e-10 5.138722e-09    17
GO:0098687 1.271062e-10 5.138722e-09    25
GO:0005876 1.035478e-07 3.663004e-06    10
GO:0005875 4.312652e-07 1.286932e-05    12
GO:0030312 5.002210e-07 1.286932e-05    24
GO:0031012 5.002210e-07 1.286932e-05    24
GO:1990023 5.565242e-07 1.312470e-05     5
GO:0051233 8.312893e-07 1.809653e-05     7
GO:0005874 2.393495e-06 4.838279e-05    19
> 
> 
> dotplot( ego, 
+          x = "p.adjust",
+          color = "p.adjust",
+          showCategory = top.n,
+          orderBy="p.adjust",
+          font.size = 8)
> 
>

Again, dotplot below matches values above....!

image

guidohooiveld commented 4 months ago

OK, after re-reading your comment I noticed you are doing a GO-based GSEA analysis...

> res2 <- gseGO(geneList = geneList,
+                ont = "CC",
+                OrgDb = org.Hs.eg.db,
+                keyType = "ENTREZID",
+                minGSSize = 10,
+                maxGSSize = 500,
+                eps = 0,
+                pvalueCutoff = 0.05,
+                pAdjustMethod = "BH"
+                )
using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).

preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
> 
> ## 'Count' now refers to 'the number of core enriched genes'
> ## Add these to res2 data.frame.
> ## Number equals number of forward slashes + 1.
> library(stringr)
> res2@result$Count <- str_count(res2$core_enrichment,"\\/")  + 1
> 
> 
> 
> top.n = 9
> res2[1:top.n,c("Description","pvalue","p.adjust", "Count")]
                                        Description       pvalue     p.adjust Count
GO:0098687                       chromosomal region 7.090676e-23 4.750753e-20    88
GO:0000775           chromosome, centromeric region 1.014018e-20 3.396961e-18    43
GO:0000793                     condensed chromosome 4.510418e-18 1.007327e-15    46
GO:0000779 condensed chromosome, centromeric region 1.064529e-17 1.783086e-15    36
GO:0000228                       nuclear chromosome 3.200622e-17 4.288833e-15    59
GO:0030312         external encapsulating structure 2.071867e-16 1.983073e-14   159
GO:0031012                     extracellular matrix 2.071867e-16 1.983073e-14   159
GO:0000776                              kinetochore 3.294246e-16 2.468340e-14    31
GO:0062023 collagen-containing extracellular matrix 3.315680e-16 2.468340e-14   133
> 
> dotplot( res2, 
+          x = "p.adjust",
+          color = "p.adjust",
+          showCategory = top.n,
+          orderBy="p.adjust",
+          font.size = 8)
> 
> 

image

> top.n = 19
> res2[1:top.n,c("Description","pvalue","p.adjust", "Count")]
                                        Description       pvalue     p.adjust Count
GO:0098687                       chromosomal region 7.090676e-23 4.750753e-20    88
GO:0000775           chromosome, centromeric region 1.014018e-20 3.396961e-18    43
GO:0000793                     condensed chromosome 4.510418e-18 1.007327e-15    46
GO:0000779 condensed chromosome, centromeric region 1.064529e-17 1.783086e-15    36
GO:0000228                       nuclear chromosome 3.200622e-17 4.288833e-15    59
GO:0030312         external encapsulating structure 2.071867e-16 1.983073e-14   159
GO:0031012                     extracellular matrix 2.071867e-16 1.983073e-14   159
GO:0000776                              kinetochore 3.294246e-16 2.468340e-14    31
GO:0062023 collagen-containing extracellular matrix 3.315680e-16 2.468340e-14   133
GO:0005819                                  spindle 5.771008e-16 3.866575e-14    68
GO:0072686                          mitotic spindle 1.436490e-11 8.749531e-10    28
GO:0030496                                  midbody 1.033020e-08 5.767697e-07    19
GO:0051233                          spindle midzone 6.674388e-08 3.439877e-06    11
GO:0005604                        basement membrane 8.799662e-08 4.211267e-06    37
GO:0071162                              CMG complex 2.024650e-07 8.478222e-06    11
GO:0031261    DNA replication preinitiation complex 1.933702e-07 8.478222e-06    11
GO:0000940                        outer kinetochore 9.757268e-07 3.845512e-05    10
GO:0005874                              microtubule 1.423733e-06 5.299452e-05    54
GO:0005876                      spindle microtubule 2.721809e-06 9.597956e-05    13
> 
> dotplot( res2, 
+          x = "p.adjust",
+          color = "p.adjust",
+          showCategory = top.n,
+          orderBy="p.adjust",
+          font.size = 8)
> 
> 

image