How to split the results of compareCluster? #677

Open Ldec12 opened 6 months ago

Ldec12 commented 6 months ago

The geneList contains three groups. After running GSEA_GO <- compareCluster(geneList, fun="GSEA", TERM2GENE=m_df, eps = 0, pvalueCutoff=0.2), how can I split GSEA_GO into three groups according to the original grouping? Thanks

guidohooiveld commented 6 months ago

It is not clear to me what exactly you are trying to achieve!

If you would like to create a dotplot grouped according to the 3 input lists, you should use the argument split=".sign" together with a call to the function facet_grid; thus:

dotplot(GSEA_GO, showCategory=10, split=".sign") + facet_grid(.~.sign)

I agree that this is not well documented!

> library(clusterProfiler)
> library(enrichplot)
> library(org.Hs.eg.db)
> library(ggplot2)  # provides facet_grid(), used below
> data(geneList, package="DOSE")
> inputList <- list(GeneList1 = geneList,
+                   GeneList2 = geneList,
+                   GeneList3 = rev(-1*geneList) )  # reverse order
> ## compareCluster-GSEA
> xx <- compareCluster(geneClusters=inputList, fun = "gseGO",
+              OrgDb = org.Hs.eg.db, keyType = "ENTREZID",
+              ont = "BP", eps = 0, pvalueCutoff = 0.05,
+              pAdjustMethod = "none", minGSSize = 15, maxGSSize = 500)
> xx <- enrichplot::pairwise_termsim(xx) 
> xx <- setReadable(xx, 'org.Hs.eg.db', 'ENTREZID')
> p <- dotplot(xx, font.size=8, showCategory=8, title =("GSEA results"), split=".sign") + facet_grid(.~.sign)
> print(p)


If you would like to split the output (the numeric results) itself, you can apply the function split to the column named Cluster. The result is a list in which each element holds the results for one of the 3 groups. This list can then easily be exported to, for example, Excel, using the function saveWorkbook from the package openxlsx.

> out.list <- split(as.data.frame(xx), as.data.frame(xx)$Cluster)
> str(out.list)
List of 3
 $ GeneList1:'data.frame':      1281 obs. of  12 variables:
  ..$ Cluster        : Factor w/ 3 levels "GeneList1","GeneList2",..: 1 1 1 1 1 1 1 1 1 1 ...
  ..$ ID             : chr [1:1281] "GO:0007059" "GO:0051276" "GO:0098813" "GO:0000819" ...
  ..$ Description    : chr [1:1281] "chromosome segregation" "chromosome organization" "nuclear chromosome segregation" "sister chromatid segregation" ...
  ..$ setSize        : int [1:1281] 316 470 236 184 325 151 224 360 487 440 ...
  ..$ enrichmentScore: num [1:1281] 0.588 0.519 0.631 0.654 0.541 ...
  ..$ NES            : num [1:1281] 2.79 2.56 2.88 2.88 2.56 ...
  ..$ pvalue         : num [1:1281] 2.37e-31 1.04e-30 2.46e-29 1.87e-26 2.52e-24 ...
  ..$ p.adjust       : num [1:1281] 2.37e-31 1.04e-30 2.46e-29 1.87e-26 2.52e-24 ...
  ..$ qvalue         : num [1:1281] 8.26e-28 1.81e-27 2.85e-26 1.62e-23 1.75e-21 ...
  ..$ rank           : num [1:1281] 449 1374 449 532 1246 ...
  ..$ leading_edge   : chr [1:1281] "tags=20%, list=4%, signal=20%" "tags=24%, list=11%, signal=22%" "tags=22%, list=4%, signal=22%" "tags=25%, list=4%, signal=24%" ...
 $ GeneList2:'data.frame':      1289 obs. of  12 variables:
  ..$ Cluster        : Factor w/ 3 levels "GeneList1","GeneList2",..: 2 2 2 2 2 2 2 2 2 2 ...
  ..$ ID             : chr [1:1289] "GO:0051276" "GO:0007059" "GO:0098813" "GO:0000819" ...
  ..$ Description    : chr [1:1289] "chromosome organization" "chromosome segregation" "nuclear chromosome segregation" "sister chromatid segregation" ...
  ..$ setSize        : int [1:1289] 470 316 236 184 325 151 224 360 487 440 ...
  ..$ enrichmentScore: num [1:1289] 0.519 0.588 0.631 0.654 0.541 ...
  ..$ NES            : num [1:1289] 2.55 2.77 2.89 2.89 2.56 ...
  ..$ pvalue         : num [1:1289] 9.43e-32 9.11e-31 4.40e-29 1.61e-26 1.12e-24 ...
  ..$ p.adjust       : num [1:1289] 9.43e-32 9.11e-31 4.40e-29 1.61e-26 1.12e-24 ...
  ..$ qvalue         : num [1:1289] 3.27e-28 1.58e-27 5.09e-26 1.39e-23 7.79e-22 ...
  ..$ rank           : num [1:1289] 1374 449 449 532 1246 ...
  ..$ leading_edge   : chr [1:1289] "tags=24%, list=11%, signal=22%" "tags=20%, list=4%, signal=20%" "tags=22%, list=4%, signal=22%" "tags=25%, list=4%, signal=24%" ...
 $ GeneList3:'data.frame':      1318 obs. of  12 variables:
  ..$ Cluster        : Factor w/ 3 levels "GeneList1","GeneList2",..: 3 3 3 3 3 3 3 3 3 3 ...
  ..$ ID             : chr [1:1318] "GO:0051276" "GO:0007059" "GO:0098813" "GO:0000819" ...
  ..$ Description    : chr [1:1318] "chromosome organization" "chromosome segregation" "nuclear chromosome segregation" "sister chromatid segregation" ...
  ..$ setSize        : int [1:1318] 470 316 236 184 325 151 360 224 487 104 ...
  ..$ enrichmentScore: num [1:1318] -0.519 -0.588 -0.631 -0.654 -0.541 ...
  ..$ NES            : num [1:1318] -2.53 -2.79 -2.89 -2.9 -2.57 ...
  ..$ pvalue         : num [1:1318] 1.12e-30 1.97e-30 1.55e-29 6.89e-26 1.64e-24 ...
  ..$ p.adjust       : num [1:1318] 1.12e-30 1.97e-30 1.55e-29 6.89e-26 1.64e-24 ...
  ..$ qvalue         : num [1:1318] 3.40e-27 3.40e-27 1.78e-26 5.92e-23 9.75e-22 ...
  ..$ rank           : num [1:1318] 1375 450 450 533 1247 ...
  ..$ leading_edge   : chr [1:1318] "tags=24%, list=11%, signal=22%" "tags=27%, list=4%, signal=27%" "tags=22%, list=4%, signal=22%" "tags=25%, list=4%, signal=24%" ...
> library(openxlsx)
> wb <- createWorkbook()
> Map(function(data, nameofsheet){     
+     addWorksheet(wb, nameofsheet)
+     writeData(wb, nameofsheet, data)},
+     out.list, names(out.list) )
[1] 0

[1] 0

[1] 0

> saveWorkbook(wb, file = "all.compareCluster.results.GOBP.xlsx", overwrite = TRUE)
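
The splitting step itself is plain base R and does not depend on clusterProfiler, so it can be tried on a small toy data frame first. The sketch below assumes only that the results table has a factor column named Cluster, as in the output above; the rows, the object name res, and the choice of CSV files in a temporary directory are made up for illustration and are not part of the workflow above.

```r
# Toy stand-in for as.data.frame(xx); the values are invented for illustration.
res <- data.frame(
  Cluster = factor(c("GeneList1", "GeneList1", "GeneList2", "GeneList3"),
                   levels = c("GeneList1", "GeneList2", "GeneList3")),
  ID      = c("GO:0007059", "GO:0051276", "GO:0051276", "GO:0007059"),
  NES     = c(2.79, 2.56, 2.55, -2.79),
  stringsAsFactors = FALSE
)

# split() returns a named list with one data frame per level of Cluster.
out.list <- split(res, res$Cluster)

# Number of rows that ended up in each group.
n.per.group <- vapply(out.list, nrow, integer(1))

# Write one CSV file per group (base R only), here into a temporary directory.
out.dir <- tempdir()
for (nm in names(out.list)) {
  write.csv(out.list[[nm]],
            file = file.path(out.dir, paste0(nm, ".csv")),
            row.names = FALSE)
}
```

If only one group is needed, out.list[["GeneList1"]] extracts it directly; writing CSV files instead of a workbook is simply an alternative when openxlsx is not installed.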

