YuLab-SMU / clusterProfiler

:bar_chart: A universal enrichment tool for interpreting omics data
https://yulab-smu.top/biomedical-knowledge-mining-book/

GO enrichment at WGCNA #578

Open Guerande29 opened 1 year ago

Guerande29 commented 1 year ago


Dear YuLab-SMU

I am trying to apply GO enrichment to the results obtained with WGCNA. I was not able to get the data directly into R, so I made the gene lists manually in Excel (as I have done before). I got this warning: "Warning message: In compareCluster(geneClusters = DEGs, enrichGO, OrgDb = org.At.tair.db, : No enrichment found in any of gene cluster, please check your input..." I suspect it may be due to the format of the input. For WGCNA I used TPM values, and my gene IDs carry their respective splice-variant (isoform) suffixes.

For example (one column per module):

paleturquoise  brown        yellow       royalblue
AT1G14200.1    AT3G10760.1  AT2G21580.1  AT4G35140.1
AT5G16150.2    AT2G32040.2  AT1G28395.6  AT3G03230.1
AT5G43670.1    AT5G21222.4  AT1G41880.1  AT5G44000.1
AT2G39705.1    AT3G10840.4  AT5G24690.1  AT4G36850.1
AT5G04740.1    AT4G35440.1  AT5G60670.1  AT1G50400.1
AT1G74680.1    AT1G53050.2  AT5G02450.1  AT1G14760.2
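The IDs in the example above are isoform-level identifiers: an AGI locus code followed by a `.N` splice-variant suffix. The fix suggested later in this thread strips those suffixes before calling `enrichGO` with `keyType = "TAIR"`, which indicates that the `TAIR` keys in `org.At.tair.db` are gene-level locus codes. A quick base-R check like the following can show how many IDs in a list are bare locus codes. This is a sketch: the regular expression for AGI codes (nuclear chromosomes 1-5 plus chloroplast `C` and mitochondrial `M`) is an assumption, not something taken from clusterProfiler.

```r
# Example IDs copied from the module table above (isoform-level)
ids <- c("AT1G14200.1", "AT3G10760.1", "AT5G16150.2", "AT1G28395.6")

# Assumed AGI locus pattern: "AT", chromosome (1-5, C or M), "G", five digits
agi_locus <- "^AT[1-5CM]G[0-9]{5}$"

# How many IDs are already bare locus codes? None here: all carry ".N"
sum(grepl(agi_locus, ids))

# Drop everything from the first "." onwards to get gene-level IDs
genes <- sub("\\..*$", "", ids)

# After stripping, every ID matches the locus pattern
all(grepl(agi_locus, genes))
```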

Do you have any advice or recommendation in this case?

Here is some related information:

MacBook Pro Processor: 2 GHz Quad-Core Intel Core i5 Graphics: Intel Iris Plus Graphics 1536 MB Memory: 32 GB 3733 MHz LPDDR4X macOS Ventura: Version 13.0 (22A380)

R version 4.2.3 (2023-03-15) -- "Shortstop Beagle", clusterProfiler v4.7.1.003

library(AnnotationDbi)
library(clusterProfiler)
library(org.At.tair.db)
library(forcats)
library(enrichplot)
library(pathview)
library(data.table)
library(ggplot2)
library(GOsummaries)
library(DOSE)

# loading universe
universe <- read.csv("universe.csv")
universe <- as.character(universe)
universe <- sort(universe, decreasing = TRUE)

# loading DEGs
DEGs <- read.delim("DEGs.txt", header = TRUE)

# Run ORA with clusterProfiler
ORA <- compareCluster(geneClusters = DEGs,
    enrichGO,
    OrgDb = org.At.tair.db,
    keyType = "TAIR",
    ont = "BP",
    pAdjustMethod = "BH",
    universe = universe,
    pvalueCutoff = 0.05,
    qvalueCutoff = 0.05,
    readable = FALSE)

Attached files: DEGs.txt, universe.csv
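Separately from the isoform suffixes, note that `read.csv()` returns a data frame, and calling `as.character()` on a whole data frame does not return the gene IDs: it deparses each column into a single string. The self-contained illustration below uses an in-memory data frame as a stand-in for `universe.csv`; its layout (an index column first, then the IDs) is an assumption inferred from the `[, 2]` used in the reply.

```r
# Stand-in for read.csv("universe.csv"): an index column plus an ID column
# (assumed layout; write.csv() writes such a column when row.names = TRUE)
universe_df <- data.frame(
  X    = 1:3,
  gene = c("AT1G14200.1", "AT3G10760.1", "AT5G16150.2")
)

# Pitfall: one deparsed string per column, not one element per gene
bad <- as.character(universe_df)
length(bad)       # 2: one string per column

# Extract the ID column itself as a character vector instead
universe <- universe_df[["gene"]]
length(universe)  # 3: one element per gene
```

With a vector like `bad`, the universe passed to `enrichGO` would contain no usable gene IDs, which is consistent with (though not proven to be the only cause of) the "No enrichment found" warning.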

Thanks in advance, Sincerely

huerqiang commented 1 year ago

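Your input needs some modification: convert the table into a named list with one vector of gene IDs per module, strip the isoform suffixes (`.1`, `.2`, ...) from the TAIR IDs, and apply the same suffix stripping to the universe: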

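library(clusterProfiler)
library(org.At.tair.db)

# Read the module table: one column per WGCNA module, one isoform ID per row
DEGs <- read.delim("DEGs.txt", header = TRUE)

# Build a named list with one character vector of gene IDs per module
deg_list <- vector("list", ncol(DEGs))
names(deg_list) <- colnames(DEGs)
for (i in 1:ncol(DEGs)) {
    gene <- unique(DEGs[, i])
    # Strip the isoform suffix (".1", ".2", ...) to get gene-level TAIR IDs
    gene <- gsub("\\..*", "", gene)
    deg_list[[i]] <- gene
}

# Take the ID column (second column of universe.csv) and strip suffixes too
universe <- read.csv("universe.csv")[, 2]
universe <- gsub("\\..*", "", universe)

# GO Biological Process over-representation analysis for each module
ORA <- compareCluster(geneClusters=deg_list,
    enrichGO,
    OrgDb = org.At.tair.db,
    keyType = "TAIR",
    ont = "BP",
    pAdjustMethod = "BH",
    universe = universe,
    pvalueCutoff = 0.05,
    qvalueCutoff = 0.05,
    readable = FALSE)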

The result:

> ORA
#
# Result of Comparing 4 gene clusters 
#
#.. @fun         enrichGO 
#.. @geneClusters       List of 4
 $ paleturquoise: chr [1:276] "AT1G15520" "AT1G55530" "AT5G03700" "AT3G06570" ...
 $ brown        : chr [1:257] "AT1G54500" "AT3G28455" "AT4G27240" "AT1G70950" ...
 $ yellow       : chr [1:427] "AT2G37600" "AT4G20440" "AT5G36950" "AT3G03450" ...
 $ royalblue    : chr [1:120] "AT5G38530" "AT1G02850" "AT5G58050" "AT4G18150" ...
#...Result      'data.frame':   62 obs. of  10 variables:
 $ Cluster    : Factor w/ 4 levels "paleturquoise",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ ID         : chr  "GO:0009620" "GO:0050832" "GO:0009642" "GO:0009415" ...
 $ Description: chr  "response to fungus" "defense response to fungus" "response to light intensity" "response to water" ...
 $ GeneRatio  : chr  "26/265" "21/265" "16/265" "27/265" ...
 $ BgRatio    : chr  "242/6048" "174/6048" "125/6048" "289/6048" ...
 $ pvalue     : num  1.71e-05 2.01e-05 9.87e-05 1.38e-04 1.82e-04 ...
 $ p.adjust   : num  0.00752 0.00752 0.02463 0.0253 0.0253 ...
 $ qvalue     : num  0.00747 0.00747 0.02447 0.02513 0.02513 ...
 $ geneID     : chr  "AT5G03700/AT5G26600/AT2G05380/AT5G52120/AT5G32450/AT1G02230/AT3G47090/AT5G61600/AT3G04210/AT5G19250/AT4G03450/A"| __truncated__ "AT5G26600/AT2G05380/AT5G32450/AT1G02230/AT3G47090/AT5G61600/AT3G04210/AT5G19250/AT4G03450/AT4G19660/AT3G29575/A"| __truncated__ "AT2G39705/AT2G25080/AT5G05965/AT3G13750/AT3G48390/AT2G27820/AT5G16120/AT1G58180/AT2G18010/AT4G18290/AT1G75460/A"| __truncated__ "AT1G15520/AT2G29670/AT1G78210/AT3G51990/AT3G54200/AT2G42620/AT1G75900/AT3G29575/AT4G30960/AT1G01620/AT4G23400/A"| __truncated__ ...
 $ Count      : int  26 21 16 27 21 27 23 26 14 24 ...
#.. number of enriched terms found for each gene cluster:
#..   paleturquoise: 11 
#..   brown: 6 
#..   yellow: 45 
#..   royalblue: 0 
#
#...Citation
T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, 
W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu. 
clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. 
The Innovation. 2021, 2(3):100141 
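The loop in the reply applies `unique()` before stripping the suffixes, so two isoforms of the same gene (for example `AT5G16150.1` and `AT5G16150.2`) can still leave a duplicate gene ID inside a module, and blank cells from columns of unequal length may be kept as well. A slightly more defensive variant, sketched below with base R only, deduplicates after stripping and drops empty entries. `to_gene_ids` is a hypothetical helper name, and the toy data frame (including the blank padding) is an assumed stand-in for `DEGs.txt`, not the real file.

```r
# Toy stand-in for read.delim("DEGs.txt"): modules as columns, the shorter
# column padded with "" (assumed; typical of spreadsheet exports)
DEGs <- data.frame(
  paleturquoise = c("AT1G14200.1", "AT5G16150.1", "AT5G16150.2"),
  brown         = c("AT3G10760.1", "AT2G32040.2", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: strip isoform suffixes, drop blanks/NA, then deduplicate
to_gene_ids <- function(x) {
  x <- sub("\\..*$", "", as.character(x))
  x <- x[!is.na(x) & nzchar(x)]
  unique(x)
}

# lapply() over a data frame iterates over its columns and keeps their names
deg_list <- lapply(DEGs, to_gene_ids)
deg_list
```

The resulting `deg_list` has the same shape as in the reply (a named list of character vectors), so it could be passed to `compareCluster(geneClusters = deg_list, ...)` unchanged; the universe vector could go through `to_gene_ids()` as well.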
Guerande29 commented 1 year ago

Thank you very much, it worked very well.

Just to make sure I understand: you created a list from the data, then used a for loop to collapse all isoforms of the same gene into a single gene ID, so that the IDs match the GO database properly?

Thank you very much.
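The mapping the loop relies on can be made visible by grouping isoform IDs under their suffix-stripped gene ID. A small base-R illustration follows; the `.1` isoform of AT5G16150 is hypothetical, and the other IDs come from the table in the question.

```r
ids <- c("AT5G16150.1", "AT5G16150.2", "AT1G28395.6", "AT3G10760.1")

# Same regex as in the reply: remove the first "." and everything after it
gene <- gsub("\\..*", "", ids)

# Isoforms grouped under the gene ID they collapse to
by_gene <- split(ids, gene)
by_gene

# Number of distinct gene IDs left after collapsing
length(unique(gene))  # 3
```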