Open Kyeongmi-Lee opened 6 months ago
Maybe you had found the function merge_result
from clusterProfiler
, but unfortunately that function doesn't work (yet?) with the results from compareCluster
.
One way of achieving you goal is using compareCluster
with the generic function enricher
on the combined gene sets from KEGG and GO.
Note that under the hood both enrichKEGG
and enrichGO
also use this generic function enricher
, but with these 2 functions the inputs are the conveniently automagically generated on-the-fly, so there is less preparatory 'work' needed from the user.
Please also note that the result of this combined approach is almost identical to the outcomes of the separate enrichKEGG
and enrichGO
analyses. Since a different number of genes and gene sets are being analyzed, the number of background genes resp. the number of multiple tests performed are different, resulting in slightly different (raw) pvalue
and BH-corrected p-values (p.adjust
). This is mainly the case for the results of the KEGG-based analysis, since in KEGG itselves only 8672 human genes have been annotated (and are used as background genes), and in the combined TERM2GENE
list there are 19189 (background) genes present (for GO-BP the difference in background genes is less; 18870).
Below some code.
> ## load libraries
> library(clusterProfiler)
> library(enrichplot)
> library(org.Hs.eg.db)
> library(GO.db)
>
> ## load sample data
> data(gcSample)
>
>
> #### A) separate analysis
>
> ## KEGG analysis
> res.kegg <- compareCluster(gcSample,
+ fun = "enrichKEGG",
+ pvalueCutoff=0.05,
+ pAdjustMethod = "BH",
+ minGSSize = 10,
+ maxGSSize = 500)
> res.kegg <- setReadable(res.kegg, OrgDb = org.Hs.eg.db, keyType="ENTREZID")
>
> res.kegg
#
# Result of Comparing 8 gene clusters
#
#.. @fun enrichKEGG
#.. @geneClusters List of 8
$ X1: chr [1:216] "4597" "7111" "5266" "2175" ...
$ X2: chr [1:805] "23450" "5160" "7126" "26118" ...
$ X3: chr [1:392] "894" "7057" "22906" "3339" ...
$ X4: chr [1:838] "5573" "7453" "5245" "23450" ...
$ X5: chr [1:929] "5982" "7318" "6352" "2101" ...
$ X6: chr [1:585] "5337" "9295" "4035" "811" ...
$ X7: chr [1:582] "2621" "2665" "5690" "3608" ...
$ X8: chr [1:237] "2665" "4735" "1327" "3192" ...
#...Result 'data.frame': 76 obs. of 12 variables:
$ Cluster : Factor w/ 8 levels "X1","X2","X3",..: 2 2 2 3 3 3 4 4 4 4 ...
$ category : chr "Human Diseases" "Human Diseases" "Cellular Processes" "Environmental Information Processing" ...
$ subcategory: chr "Infectious disease: viral" "Immune disease" "Cell growth and death" "Signaling molecules and interaction" ...
$ ID : chr "hsa05169" "hsa05340" "hsa04110" "hsa04512" ...
$ Description: chr "Epstein-Barr virus infection" "Primary immunodeficiency" "Cell cycle" "ECM-receptor interaction" ...
$ GeneRatio : chr "23/406" "8/406" "18/406" "9/193" ...
$ BgRatio : chr "202/8672" "38/8672" "157/8672" "89/8672" ...
$ pvalue : num 6.71e-05 3.05e-04 3.78e-04 1.52e-04 3.25e-04 ...
$ p.adjust : num 0.0211 0.0395 0.0395 0.0357 0.0357 ...
$ qvalue : num 0.0196 0.0368 0.0368 0.0331 0.0331 ...
$ geneID : chr "LYN/ICAM1/TNFAIP3/E2F1/CCNA2/E2F3/BAK1/RUNX3/BID/IKBKE/TAP2/FAS/CCNE2/RELB/CD3E/ENTPD3/SYK/TRAF3/IKBKB/CD247/NEDD4/CD40/STAT1" "ADA/TAP2/LCK/CD79A/CD3E/CD8A/CD40/DCLRE1C" "CDC20/E2F1/CCNA2/E2F3/BUB1B/CDC6/DBF4/PKMYT1/CDC7/ESPL1/CCNE2/CDKN2A/SFN/BUB1/CHEK2/ORC6/CDC14B/MCM4" "THBS1/HSPG2/COL9A3/ITGB7/CHAD/ITGA7/LAMA4/ITGB8/ITGB5" ...
$ Count : int 23 8 18 9 17 19 17 10 22 19 ...
#.. number of enriched terms found for each gene cluster:
#.. X1: 0
#.. X2: 3
#.. X3: 3
#.. X4: 22
#.. X5: 10
#.. X6: 1
#.. X7: 17
#.. X8: 20
#
#...Citation
T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou,
W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu.
clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
The Innovation. 2021, 2(3):100141
>
> ## Gene Ontology, Biological Process
>
> res.go <- compareCluster(gcSample,
+ fun = "enrichGO",
+ OrgDb = 'org.Hs.eg.db',
+ ont = "BP",
+ readable = TRUE,
+ pvalueCutoff=0.05,
+ pAdjustMethod = "BH",
+ minGSSize = 10,
+ maxGSSize = 500)
> res.go <- pairwise_termsim(res.go)
>
> res.go
#
# Result of Comparing 8 gene clusters
#
#.. @fun enrichGO
#.. @geneClusters List of 8
$ X1: chr [1:216] "4597" "7111" "5266" "2175" ...
$ X2: chr [1:805] "23450" "5160" "7126" "26118" ...
$ X3: chr [1:392] "894" "7057" "22906" "3339" ...
$ X4: chr [1:838] "5573" "7453" "5245" "23450" ...
$ X5: chr [1:929] "5982" "7318" "6352" "2101" ...
$ X6: chr [1:585] "5337" "9295" "4035" "811" ...
$ X7: chr [1:582] "2621" "2665" "5690" "3608" ...
$ X8: chr [1:237] "2665" "4735" "1327" "3192" ...
#...Result 'data.frame': 1002 obs. of 10 variables:
$ Cluster : Factor w/ 8 levels "X1","X2","X3",..: 1 1 1 1 1 1 1 1 1 1 ...
$ ID : chr "GO:0021978" "GO:0086005" "GO:0086091" "GO:1990266" ...
$ Description: chr "telencephalon regionalization" "ventricular cardiac muscle cell action potential" "regulation of heart rate by cardiac conduction" "neutrophil migration" ...
$ GeneRatio : chr "4/198" "5/198" "5/198" "8/198" ...
$ BgRatio : chr "13/18870" "35/18870" "38/18870" "129/18870" ...
$ pvalue : num 7.81e-06 3.04e-05 4.58e-05 6.58e-05 6.59e-05 ...
$ p.adjust : num 0.0207 0.0322 0.0322 0.0322 0.0322 ...
$ qvalue : num 0.0191 0.0297 0.0297 0.0297 0.0297 ...
$ geneID : chr "PAX6/LHX2/EMX1/EMX2" "SCN3B/KCNH2/KCND3/KCNE5/CTNNA3" "SCN3B/KCNH2/KCND3/KCNE5/CTNNA3" "CCL20/PLA2G1B/CXCL3/FUT7/CXCL8/GP2/S100A8/CXCL5" ...
$ Count : int 4 5 5 8 9 7 4 3 12 9 ...
#.. number of enriched terms found for each gene cluster:
#.. X1: 15
#.. X2: 290
#.. X3: 84
#.. X4: 97
#.. X5: 111
#.. X6: 97
#.. X7: 138
#.. X8: 170
#
#...Citation
T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou,
W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu.
clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
The Innovation. 2021, 2(3):100141
>
>
>
>
> #### B) combined analysis
>
> ## create TERM2GENE and TERM2NAME for KEGG
> kegg.data <- download_KEGG(species="hsa", keggType = "KEGG", keyType = "kegg")
> term2gene.kegg <- kegg.data$KEGGPATHID2EXTID
> term2name.kegg <- kegg.data$KEGGPATHID2NAME
>
> head(term2gene.kegg)
from to
1 hsa00010 10327
2 hsa00010 124
3 hsa00010 125
4 hsa00010 126
5 hsa00010 127
6 hsa00010 128
> head(term2name.kegg)
from to
1 hsa01100 Metabolic pathways
2 hsa01200 Carbon metabolism
3 hsa01210 2-Oxocarboxylic acid metabolism
4 hsa01212 Fatty acid metabolism
5 hsa01230 Biosynthesis of amino acids
6 hsa01232 Nucleotide metabolism
>
> ## create TERM2GENE and TERM2NAME for GO
> ## code is based on clusterProfiler code
> ## https://github.com/YuLab-SMU/clusterProfiler/blob/b24be8c372f947ed79f8f42c1a7f6624c14fb125/R/enrichGO.R#L156
>
>
> ## set GO category to be analyzed,
> ##i.e. 'BP', 'CC', 'MF', or 'ALL'
>
> ont <- "BP" ##limit sets to GO-BP
>
> goterms <- AnnotationDbi::Ontology(GO.db::GOTERM)
> if (ont != "ALL") {goterms <- goterms[goterms == ont]}
>
> term2gene.go <- AnnotationDbi::mapIds(org.Hs.eg.db, keys=names(goterms),
+ column="ENTREZID", keytype="GOALL", multiVals='list')
'select()' returned 1:many mapping between keys and columns
> term2gene.go <- stack(term2gene.go)
> term2gene.go <- term2gene.go[!is.na(term2gene.go[,"values"]),] [,c(2,1)]
> colnames(term2gene.go) <- c("from","to")
>
> term2name.go <- Term(GOTERM)[ names(goterms) ]
> term2name.go <- data.frame("from"=names(term2name.go), "to"=term2name.go)
>
> head(term2gene.go)
from to
2 GO:0000002 142
3 GO:0000002 291
4 GO:0000002 1763
5 GO:0000002 1890
6 GO:0000002 2021
7 GO:0000002 3980
> head(term2name.go)
from to
GO:0000001 GO:0000001 mitochondrion inheritance
GO:0000002 GO:0000002 mitochondrial genome maintenance
GO:0000003 GO:0000003 reproduction
GO:0000011 GO:0000011 vacuole inheritance
GO:0000012 GO:0000012 single strand break repair
GO:0000017 GO:0000017 alpha-glucoside transport
>
>
> ## combine the KEGG and GO TERM2GENE and TERM2NAME
> TERM2GENE <- rbind(term2gene.kegg, term2gene.go)
> TERM2NAME <- rbind(term2name.kegg, term2name.go)
>
>
>
> ## run compareCluster with generic function enricher, and the combined TERM2GENE and TERM2NAME
> res.combined <- compareCluster(geneClusters = gcSample,
+ fun = "enricher",
+ minGSSize = 10,
+ maxGSSize = 500,
+ pvalueCutoff = 0.05,
+ pAdjustMethod = "BH",
+ TERM2GENE = TERM2GENE,
+ TERM2NAME = TERM2NAME
+ )
> res.combined <- setReadable(res.combined, OrgDb = org.Hs.eg.db, keyType="ENTREZID")
>
> ## check
> res.combined
#
# Result of Comparing 8 gene clusters
#
#.. @fun enricher
#.. @geneClusters List of 8
$ X1: chr [1:216] "4597" "7111" "5266" "2175" ...
$ X2: chr [1:805] "23450" "5160" "7126" "26118" ...
$ X3: chr [1:392] "894" "7057" "22906" "3339" ...
$ X4: chr [1:838] "5573" "7453" "5245" "23450" ...
$ X5: chr [1:929] "5982" "7318" "6352" "2101" ...
$ X6: chr [1:585] "5337" "9295" "4035" "811" ...
$ X7: chr [1:582] "2621" "2665" "5690" "3608" ...
$ X8: chr [1:237] "2665" "4735" "1327" "3192" ...
#...Result 'data.frame': 1225 obs. of 10 variables:
$ Cluster : Factor w/ 8 levels "X1","X2","X3",..: 1 1 1 1 1 1 1 1 1 1 ...
$ ID : chr "GO:0021978" "GO:0086005" "GO:0086091" "GO:0061351" ...
$ Description: chr "telencephalon regionalization" "ventricular cardiac muscle cell action potential" "regulation of heart rate by cardiac conduction" "neural precursor cell proliferation" ...
$ GeneRatio : chr "4/198" "5/198" "5/198" "9/198" ...
$ BgRatio : chr "13/19189" "35/19189" "38/19189" "166/19189" ...
$ pvalue : num 7.31e-06 2.81e-05 4.23e-05 5.80e-05 5.86e-05 ...
$ p.adjust : num 0.0208 0.0312 0.0312 0.0312 0.0312 ...
$ qvalue : num 0.0191 0.0286 0.0286 0.0286 0.0286 ...
$ geneID : chr "PAX6/LHX2/EMX1/EMX2" "SCN3B/KCNH2/KCND3/KCNE5/CTNNA3" "SCN3B/KCNH2/KCND3/KCNE5/CTNNA3" "NF2/PAX6/LHX2/FZD9/LHX5/EPHB1/EMX1/LEF1/EMX2" ...
$ Count : int 4 5 5 9 8 7 4 3 12 9 ...
#.. number of enriched terms found for each gene cluster:
#.. X1: 15
#.. X2: 317
#.. X3: 98
#.. X4: 164
#.. X5: 145
#.. X6: 102
#.. X7: 175
#.. X8: 209
#
#...Citation
T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou,
W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu.
clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
The Innovation. 2021, 2(3):100141
>
>
> dotplot(res.combined, font.size=8)
>
> ## compare results separate and combined runs
>
> as.data.frame(res.kegg)[as.data.frame(res.kegg)$ID == "hsa05169",]
Cluster category subcategory ID
1 X2 Human Diseases Infectious disease: viral hsa05169
Description GeneRatio BgRatio pvalue p.adjust
1 Epstein-Barr virus infection 23/406 202/8672 6.707389e-05 0.0210612
qvalue
1 0.01962794
geneID
1 LYN/ICAM1/TNFAIP3/E2F1/CCNA2/E2F3/BAK1/RUNX3/BID/IKBKE/TAP2/FAS/CCNE2/RELB/CD3E/ENTPD3/SYK/TRAF3/IKBKB/CD247/NEDD4/CD40/STAT1
Count
1 23
> as.data.frame(res.go)[as.data.frame(res.go)$ID == "GO:0021978",]
Cluster ID Description GeneRatio BgRatio
1 X1 GO:0021978 telencephalon regionalization 4/198 13/18870
pvalue p.adjust qvalue geneID Count
1 7.807836e-06 0.02071419 0.01910043 PAX6/LHX2/EMX1/EMX2 4
>
> as.data.frame(res.combined)[as.data.frame(res.combined)$ID == "hsa05169",]
Cluster ID Description GeneRatio BgRatio
33 X2 hsa05169 Epstein-Barr virus infection 23/760 202/19189
735 X5 hsa05169 Epstein-Barr virus infection 20/891 202/19189
952 X7 hsa05169 Epstein-Barr virus infection 16/563 202/19189
pvalue p.adjust qvalue
33 5.530688e-06 0.001653983 0.001372322
735 1.193721e-03 0.046538197 0.039817444
952 3.191288e-04 0.013958290 0.012162907
geneID
33 LYN/ICAM1/TNFAIP3/E2F1/CCNA2/E2F3/BAK1/RUNX3/BID/IKBKE/TAP2/FAS/CCNE2/RELB/CD3E/ENTPD3/SYK/TRAF3/IKBKB/CD247/NEDD4/CD40/STAT1
735 LYN/ICAM1/RB1/SKP2/E2F3/PIK3CD/NFKBIE/CASP9/EIF2AK2/CDK2/PLCG2/FAS/IFNAR2/SAP30/STAT1/MAPK13/AKT3/CD3D/MAVS/CD40
952 PSMD8/PSMD2/NFKBIA/IRAK1/HDAC2/CDK4/SEM1/HES1/CXCL10/ISG15/NCOR2/CCND1/PSMD11/HLA-G/PSMD14/STAT1
Count
33 23
735 20
952 16
> as.data.frame(res.combined)[as.data.frame(res.combined)$ID == "GO:0021978",]
Cluster ID Description GeneRatio BgRatio
1 X1 GO:0021978 telencephalon regionalization 4/198 13/19189
pvalue p.adjust qvalue geneID Count
1 7.310454e-06 0.02082748 0.01911491 PAX6/LHX2/EMX1/EMX2 4
>
>
i have compareCluster enrichKEGG and enrichGO results with same input. and i want to show 2 different results in one dotplot. how to merge multiple compareCluster results?