YuLab-SMU / clusterProfiler

:bar_chart: A universal enrichment tool for interpreting omics data
https://yulab-smu.top/biomedical-knowledge-mining-book/

How to split the results of compareCluster? #677

Open Ldec12 opened 6 months ago

Ldec12 commented 6 months ago

The geneList contains three groups. After running GSEA_GO <- compareCluster(geneList, fun="GSEA", TERM2GENE=m_df, eps = 0, pvalueCutoff=0.2), how can I split GSEA_GO into three groups according to the original grouping? Thanks

guidohooiveld commented 6 months ago

It is not clear to me what exactly you are trying to achieve!

If you would like to create a dotplot grouped according to the 3 input lists, you should use the argument split=".sign" together with a call to the function facet_grid; thus:

dotplot(GSEA_GO, showCategory=10, split=".sign") + facet_grid(.~.sign)

I agree that this is not well documented!

> library(clusterProfiler)
> library(enrichplot)
> library(ggplot2)  # provides facet_grid(), used in the dotplot call below
> library(org.Hs.eg.db)
>  
> data(geneList, package="DOSE")
> inputList <- list(GeneList1 = geneList,
+                   GeneList2 = geneList,
+                   GeneList3 = rev(-1*geneList) )  # reverse order
> 
> ## compareCluster-GSEA
> xx <- compareCluster(geneClusters=inputList, fun = "gseGO",
+              OrgDb = org.Hs.eg.db, keyType = "ENTREZID",
+              ont = "BP", eps = 0, pvalueCutoff = 0.05,
+              pAdjustMethod = "none", minGSSize = 15, maxGSSize = 500)
> xx <- enrichplot::pairwise_termsim(xx) 
> xx <- setReadable(xx, 'org.Hs.eg.db', 'ENTREZID')
> 
> 
> p <- dotplot(xx, font.size=8, showCategory=8, title =("GSEA results"), split=".sign") + facet_grid(.~.sign)
> print(p)
> 
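
To make the faceting variable more concrete, here is a minimal sketch of what .sign appears to encode. It assumes (this is not confirmed in this thread) that a term is labelled "activated" when its NES is positive and "suppressed" otherwise. The data frame res is a hypothetical stand-in for as.data.frame(xx), and its NES values are made up for illustration.

```r
# Toy stand-in for as.data.frame(xx); the values are made up for illustration.
res <- data.frame(
  Cluster = factor(c("GeneList1", "GeneList1", "GeneList3", "GeneList3")),
  ID      = c("GO:0007059", "GO:0051276", "GO:0007059", "GO:0051276"),
  NES     = c(2.79, 2.56, -2.79, -2.53),
  stringsAsFactors = FALSE
)

# Assumption: .sign follows the sign of the normalized enrichment score.
res$.sign <- ifelse(res$NES > 0, "activated", "suppressed")

# Cross-tabulate the number of terms per input list and direction.
sign_counts <- table(res$Cluster, res$.sign)
print(sign_counts)
```

Under that assumption, facet_grid(.~.sign) places activated and suppressed terms in separate panels, while the input lists remain on the x-axis.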

If you would like to split the numeric output itself, you can use the function split on the column named Cluster. The result is a list in which the results for each of the 3 groups are stored in a separate element. This list can then easily be exported to, for example, Excel, using the function saveWorkbook from the package openxlsx.

> out.list <- split(as.data.frame(xx), as.data.frame(xx)$Cluster)
> str(out.list)
List of 3
 $ GeneList1:'data.frame':      1281 obs. of  12 variables:
  ..$ Cluster        : Factor w/ 3 levels "GeneList1","GeneList2",..: 1 1 1 1 1 1 1 1 1 1 ...
  ..$ ID             : chr [1:1281] "GO:0007059" "GO:0051276" "GO:0098813" "GO:0000819" ...
  ..$ Description    : chr [1:1281] "chromosome segregation" "chromosome organization" "nuclear chromosome segregation" "sister chromatid segregation" ...
  ..$ setSize        : int [1:1281] 316 470 236 184 325 151 224 360 487 440 ...
  ..$ enrichmentScore: num [1:1281] 0.588 0.519 0.631 0.654 0.541 ...
  ..$ NES            : num [1:1281] 2.79 2.56 2.88 2.88 2.56 ...
  ..$ pvalue         : num [1:1281] 2.37e-31 1.04e-30 2.46e-29 1.87e-26 2.52e-24 ...
  ..$ p.adjust       : num [1:1281] 2.37e-31 1.04e-30 2.46e-29 1.87e-26 2.52e-24 ...
  ..$ qvalue         : num [1:1281] 8.26e-28 1.81e-27 2.85e-26 1.62e-23 1.75e-21 ...
  ..$ rank           : num [1:1281] 449 1374 449 532 1246 ...
  ..$ leading_edge   : chr [1:1281] "tags=20%, list=4%, signal=20%" "tags=24%, list=11%, signal=22%" "tags=22%, list=4%, signal=22%" "tags=25%, list=4%, signal=24%" ...
  ..$ core_enrichment: chr [1:1281] "CDCA8/CDC20/KIF23/CENPE/MYBL2/CCNB2/NDC80/TOP2A/NCAPH/ASPM/DLGAP5/UBE2C/HJURP/SKA1/NUSAP1/TPX2/TACC3/NEK2/CENPM"| __truncated__ "CDC45/CDCA8/CDC20/KIF23/CENPE/MYBL2/NDC80/TOP2A/NCAPH/DLGAP5/UBE2C/HJURP/NUSAP1/TPX2/TACC3/NEK2/CENPN/CDK1/MAD2"| __truncated__ "CDCA8/CDC20/KIF23/CENPE/MYBL2/CCNB2/NDC80/TOP2A/NCAPH/ASPM/DLGAP5/UBE2C/NUSAP1/TPX2/TACC3/NEK2/CDK1/MAD2L1/KIF1"| __truncated__ "CDCA8/CDC20/KIF23/CENPE/MYBL2/NDC80/TOP2A/NCAPH/DLGAP5/UBE2C/NUSAP1/TPX2/TACC3/NEK2/CDK1/MAD2L1/KIF18A/CDT1/BIR"| __truncated__ ...
 $ GeneList2:'data.frame':      1289 obs. of  12 variables:
  ..$ Cluster        : Factor w/ 3 levels "GeneList1","GeneList2",..: 2 2 2 2 2 2 2 2 2 2 ...
  ..$ ID             : chr [1:1289] "GO:0051276" "GO:0007059" "GO:0098813" "GO:0000819" ...
  ..$ Description    : chr [1:1289] "chromosome organization" "chromosome segregation" "nuclear chromosome segregation" "sister chromatid segregation" ...
  ..$ setSize        : int [1:1289] 470 316 236 184 325 151 224 360 487 440 ...
  ..$ enrichmentScore: num [1:1289] 0.519 0.588 0.631 0.654 0.541 ...
  ..$ NES            : num [1:1289] 2.55 2.77 2.89 2.89 2.56 ...
  ..$ pvalue         : num [1:1289] 9.43e-32 9.11e-31 4.40e-29 1.61e-26 1.12e-24 ...
  ..$ p.adjust       : num [1:1289] 9.43e-32 9.11e-31 4.40e-29 1.61e-26 1.12e-24 ...
  ..$ qvalue         : num [1:1289] 3.27e-28 1.58e-27 5.09e-26 1.39e-23 7.79e-22 ...
  ..$ rank           : num [1:1289] 1374 449 449 532 1246 ...
  ..$ leading_edge   : chr [1:1289] "tags=24%, list=11%, signal=22%" "tags=20%, list=4%, signal=20%" "tags=22%, list=4%, signal=22%" "tags=25%, list=4%, signal=24%" ...
  ..$ core_enrichment: chr [1:1289] "CDC45/CDCA8/CDC20/KIF23/CENPE/MYBL2/NDC80/TOP2A/NCAPH/DLGAP5/UBE2C/HJURP/NUSAP1/TPX2/TACC3/NEK2/CENPN/CDK1/MAD2"| __truncated__ "CDCA8/CDC20/KIF23/CENPE/MYBL2/CCNB2/NDC80/TOP2A/NCAPH/ASPM/DLGAP5/UBE2C/HJURP/SKA1/NUSAP1/TPX2/TACC3/NEK2/CENPM"| __truncated__ "CDCA8/CDC20/KIF23/CENPE/MYBL2/CCNB2/NDC80/TOP2A/NCAPH/ASPM/DLGAP5/UBE2C/NUSAP1/TPX2/TACC3/NEK2/CDK1/MAD2L1/KIF1"| __truncated__ "CDCA8/CDC20/KIF23/CENPE/MYBL2/NDC80/TOP2A/NCAPH/DLGAP5/UBE2C/NUSAP1/TPX2/TACC3/NEK2/CDK1/MAD2L1/KIF18A/CDT1/BIR"| __truncated__ ...
 $ GeneList3:'data.frame':      1318 obs. of  12 variables:
  ..$ Cluster        : Factor w/ 3 levels "GeneList1","GeneList2",..: 3 3 3 3 3 3 3 3 3 3 ...
  ..$ ID             : chr [1:1318] "GO:0051276" "GO:0007059" "GO:0098813" "GO:0000819" ...
  ..$ Description    : chr [1:1318] "chromosome organization" "chromosome segregation" "nuclear chromosome segregation" "sister chromatid segregation" ...
  ..$ setSize        : int [1:1318] 470 316 236 184 325 151 360 224 487 104 ...
  ..$ enrichmentScore: num [1:1318] -0.519 -0.588 -0.631 -0.654 -0.541 ...
  ..$ NES            : num [1:1318] -2.53 -2.79 -2.89 -2.9 -2.57 ...
  ..$ pvalue         : num [1:1318] 1.12e-30 1.97e-30 1.55e-29 6.89e-26 1.64e-24 ...
  ..$ p.adjust       : num [1:1318] 1.12e-30 1.97e-30 1.55e-29 6.89e-26 1.64e-24 ...
  ..$ qvalue         : num [1:1318] 3.40e-27 3.40e-27 1.78e-26 5.92e-23 9.75e-22 ...
  ..$ rank           : num [1:1318] 1375 450 450 533 1247 ...
  ..$ leading_edge   : chr [1:1318] "tags=24%, list=11%, signal=22%" "tags=27%, list=4%, signal=27%" "tags=22%, list=4%, signal=22%" "tags=25%, list=4%, signal=24%" ...
  ..$ core_enrichment: chr [1:1318] "MIS18A/NBN/UCHL5/SMC2/SPDL1/PRKCQ/NASP/RFC3/TUBG1/RMDN1/PSRC1/ERCC4/CHEK2/RAD21/BRIP1/NUP155/MCM3/H1-5/CCT2/SMC"| __truncated__ "MIS18A/SRPK1/SMC2/SPDL1/CIAO2B/TUBG1/RMDN1/PSRC1/TUBB/CHEK2/RAD21/BRIP1/PLSCR1/RCC1/SMC4/SLC25A5/NCAPD3/FIRRM/K"| __truncated__ "ZWILCH/FBXO5/CENPF/BUB1/NCAPD2/CCNE2/CCNE1/ESPL1/CENPI/ECT2/SPAG5/SPC25/ZWINT/BUB1B/PTTG3P/RACGAP1/PLK1/CDC6/KI"| __truncated__ "NCAPG2/ZWILCH/FBXO5/CENPF/BUB1/NCAPD2/ESPL1/CENPI/SPAG5/SPC25/ZWINT/BUB1B/RACGAP1/PLK1/CDC6/KIF2C/KIF14/KIF4A/C"| __truncated__ ...
> 
> 
> library(openxlsx)
> 
> wb <- createWorkbook()
> Map(function(data, nameofsheet){     
+     addWorksheet(wb, nameofsheet)
+     writeData(wb, nameofsheet, data)},
+     out.list, names(out.list) )
$GeneList1
[1] 0

$GeneList2
[1] 0

$GeneList3
[1] 0

> saveWorkbook(wb, file = "all.compareCluster.results.GOBP.xlsx", overwrite = TRUE)
> 
> 

all.compareCluster.results.GOBP.xlsx
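
If openxlsx is not available, the same split can be written out with base R only. The following is a sketch, not part of the answer above: res is a hypothetical toy stand-in for as.data.frame(xx) with made-up NES values, and out.dir and the file names are illustrative choices.

```r
# Toy stand-in for as.data.frame(xx); the values are made up for illustration.
res <- data.frame(
  Cluster = factor(c("GeneList1", "GeneList2", "GeneList2", "GeneList3"),
                   levels = c("GeneList1", "GeneList2", "GeneList3")),
  ID      = c("GO:0007059", "GO:0051276", "GO:0007059", "GO:0051276"),
  NES     = c(2.79, 2.55, 2.77, -2.53),
  stringsAsFactors = FALSE
)

# One data frame per level of the Cluster factor.
out.list <- split(res, res$Cluster)

# Write each group to its own CSV file in a temporary directory.
out.dir <- file.path(tempdir(), "compareCluster_split")
dir.create(out.dir, showWarnings = FALSE)
csv.files <- vapply(names(out.list), function(nm) {
  f <- file.path(out.dir, paste0(nm, ".csv"))
  write.csv(out.list[[nm]], f, row.names = FALSE)
  f
}, character(1))

print(csv.files)
```

Each element of out.list keeps all columns of the original table, so every CSV can be read back with read.csv and handled on its own.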