YuLab-SMU / clusterProfiler

:bar_chart: A universal enrichment tool for interpreting omics data
https://yulab-smu.top/biomedical-knowledge-mining-book/
998 stars 252 forks source link

compareCluster GSEA + universal annotation = does not work #705

Open philousap opened 1 month ago

philousap commented 1 month ago

Hi, I am working with compareCluster, trying to do a GSEA analysis using homemade annotation. To do so I decide to use Universal enrichment analysis.

I have a TERM2GENE and TERM2NAME. I am trying to run the same a GSEA (with all my genes - not only differential expressed ones). I have an error: Error in check_gene_id(geneList, geneSets) : --> No gene can be mapped....

Here my code (let's say input1, input2 and input3 are subsets of input):

geneList1=input1$logFC names(geneList1)=input1$Symbol geneList1=sort(geneList1, decreasing = TRUE)

geneList2=input2$logFC names(geneList2)=input2$Symbol geneList2=sort(geneList2, decreasing = TRUE)

geneList3=input3$logFC names(geneList3)=input3$Symbol geneList3=sort(geneList3, decreasing = TRUE)

geneList=c("Gut"=geneList1, "Head"=geneList2, "Abdomen"=geneList3)

gsea=compareCluster(geneList, data=input, TERM2GENE=TERM2GENE, TERM2NAME =TERM2NAME, fun="GSEA" minGSSize = 10, nPermSimple = 100000, eps=0, pvalueCutoff = 0.1, pAdjustMethod = "BH" )

This is my error: Error in check_gene_id(geneList, geneSets) : --> No gene can be mapped....

However, when I run the same geneLists one by one (geneList1, then geneList2 and so one) using the following code, it works perfectly (see my pictures at the end)

GSEA(geneList1, TERM2GENE=TERM2GENE, TERM2NAME =TERM2NAME, minGSSize = 10, nPermSimple = 100000, eps=0, pvalueCutoff = 0.1, pAdjustMethod = "BH" )

Some one could help me here please? Thank you Gut Abdomen Head

guidohooiveld commented 1 month ago

Everything points to something not being OK with your aggregated input geneList. Please double-check it!

Please note that performing GSEA with user-defined gene sets through the compareCluster function works as expected. See my code below. Also note that nPermSimple is a legacy argument, and should not be used anymore! Only use the argument eps.

> ## load libraries
> library(clusterProfiler)
> library(enrichplot)
> library(org.Hs.eg.db)
> 
> ## load sample data
> data(geneList, package="DOSE")
> head(geneList)
    4312     8318    10874    55143    55388      991 
4.572613 4.514594 4.418218 4.144075 3.876258 3.677857 
> 
> ## using sample data, create list with 3 comparisons to be used as input for comparCluster
> ## note that 'Head' is the reverse of 'Gut' and 'Abdomen'.
> inputList <- list(Gut = geneList, Abdomen = geneList, Head = sort(-1*geneList, decreasing = TRUE) )
> 
> str(inputList)
List of 3
 $ Gut    : Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
  ..- attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
 $ Abdomen: Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
  ..- attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
 $ Head   : Named num [1:12495] 4.3 3.95 3.6 3.46 3.42 ...
  ..- attr(*, "names")= chr [1:12495] "4969" "57758" "79901" "79838" ...
> 
> 
> ## create TERM2GENE and TERM2NAME for KEGG
> ## note that when using the fuction gseKEGG this step is done automagically
> kegg.data <- download_KEGG(species="hsa", keggType = "KEGG", keyType = "kegg")
> term2gene.kegg <- kegg.data$KEGGPATHID2EXTID
> term2name.kegg <- kegg.data$KEGGPATHID2NAME
> 
> head(term2gene.kegg)
      from    to
1 hsa00010 10327
2 hsa00010   124
3 hsa00010   125
4 hsa00010   126
5 hsa00010   127
6 hsa00010   128
> head(term2name.kegg)
      from                              to
1 hsa01100              Metabolic pathways
2 hsa01200               Carbon metabolism
3 hsa01210 2-Oxocarboxylic acid metabolism
4 hsa01212           Fatty acid metabolism
5 hsa01230     Biosynthesis of amino acids
6 hsa01232           Nucleotide metabolism
> 
> 
> ## run compareCluster with generic function GSEA, and user-provied TERM2GENE and TERM2NAME
> res <- compareCluster(geneCluster = inputList,
+                       fun = "GSEA",
+                       eps = 0,
+                       pvalueCutoff = 0.05,
+                       pAdjustMethod = "BH",
+                       TERM2GENE = term2gene.kegg,
+                       TERM2NAME = term2name.kegg)
> 
> ## convert entrezids into symbols
> res <- setReadable(res, OrgDb = org.Hs.eg.db, keyType="ENTREZID")
>  
> ## check
> res
#
# Result of Comparing 3 gene clusters 
#
#.. @fun         GSEA 
#.. @geneClusters       List of 3
 $ Gut    : Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
  ..- attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
 $ Abdomen: Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
  ..- attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
 $ Head   : Named num [1:12495] 4.3 3.95 3.6 3.46 3.42 ...
  ..- attr(*, "names")= chr [1:12495] "4969" "57758" "79901" "79838" ...
#...Result      'data.frame':   172 obs. of  12 variables:
 $ Cluster        : Factor w/ 3 levels "Gut","Abdomen",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ ID             : chr  "hsa04110" "hsa03050" "hsa04657" "hsa05169" ...
 $ Description    : chr  "Cell cycle" "Proteasome" "IL-17 signaling pathway" "Epstein-Barr virus infection" ...
 $ setSize        : int  139 43 85 193 33 62 86 130 55 80 ...
 $ enrichmentScore: num  0.664 0.709 0.562 0.434 0.723 ...
 $ NES            : num  2.83 2.41 2.2 1.96 2.31 ...
 $ pvalue         : num  9.16e-21 2.97e-08 8.38e-08 1.52e-07 4.07e-07 ...
 $ p.adjust       : num  3.13e-18 5.08e-06 9.56e-06 1.30e-05 2.78e-05 ...
 $ qvalue         : num  2.30e-18 3.72e-06 7.00e-06 9.51e-06 2.04e-05 ...
 $ rank           : num  1155 2516 2880 2820 1905 ...
 $ leading_edge   : chr  "tags=36%, list=9%, signal=33%" "tags=65%, list=20%, signal=52%" "tags=49%, list=23%, signal=38%" "tags=39%, list=23%, signal=31%" ...
 $ core_enrichment: chr  "CDC45/CDC20/CCNB2/NDC80/CCNA2/CDK1/MAD2L1/CDT1/TTK/AURKB/CHEK1/TRIP13/CCNB1/MCM5/PTTG1/MCM2/CDC25A/CDC6/PLK1/BU"| __truncated__ "PSMA7/PSMD3/PSMB9/PSMB5/IFNG/PSMD7/ADRM1/PSME2/PSMB3/PSMA4/PSMB2/PSMA3/PSMA5/PSMB7/PSMD14/PSME4/SEM1/PSMB10/PSM"| __truncated__ "MMP1/S100A9/S100A8/S100A7/CXCL10/CXCL3/CCL20/FOSL1/MMP9/CXCL8/LCN2/CCL2/MUC5B/CEBPB/CCL7/IFNG/CCL17/CXCL5/CXCL1"| __truncated__ "CXCL10/CCNA2/TAP1/ISG15/CCNE1/CCNE2/SKP2/STAT1/HLA-DRB4/HLA-DOB/MYC/CD3G/PSMD3/E2F1/IRAK1/CD247/CD3D/LYN/OAS1/R"| __truncated__ ...
#.. number of enriched terms found for each gene cluster:
#..   Gut: 56 
#..   Abdomen: 58 
#..   Head: 58 
#
#...Citation
T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, 
W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu. 
clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. 
The Innovation. 2021, 2(3):100141 

> 
> ## prepare dotplot
> p <- dotplot(res, font.size=8, showCategory=8, title =("GSEA results"), split=".sign") + facet_grid(.~.sign)
> print(p)
> 
> 

image