YuLab-SMU / clusterProfiler

:bar_chart: A universal enrichment tool for interpreting omics data
https://yulab-smu.top/biomedical-knowledge-mining-book/

Fold enrichment #457

Open Guerande29 opened 2 years ago

Guerande29 commented 2 years ago

Hello! I am doing a GO term enrichment analysis for Arabidopsis. I analyzed the up- and down-regulated genes separately. For the GO enrichment analysis I use the Araport11 genes as the background, with the following call: compareCluster(geneClusters = DEG_Upgroups, fun = enrichGO, OrgDb = org.At.tair.db, keyType = "TAIR", ont = "BP", universe = universe, pAdjustMethod = "BH", pvalueCutoff = 0.05, qvalueCutoff = 0.05)

Using the guide from the paper (https://www.sciencedirect.com/science/article/pii/S2666675821000667) I got the results and I added the richFactor, but I couldn't get the fold enrichment.

Is there any way of doing this?

Thank you very much,

huerqiang commented 2 years ago

Please provide repeatable data and code

Guerande29 commented 2 years ago

Please provide repeatable data and code

I finally tried it the following way, and it produces a result for "Fold enrichment". To verify it, I calculated the fold enrichment manually for several random GO terms, and the values seem to match. But it would be great to have confirmation that the procedure is correct.

Script

Libraries

library(clusterProfiler)
library(org.At.tair.db)
library(forcats)
library(enrichplot)
library(pathview)
library(data.table)
library(ggplot2)
library(GOsummaries)
library(DOSE)

Universe: genes from Araport11 downloaded from https://www.arabidopsis.org

universe <- read.delim("At11_universe.txt")
universe <- as.character(universe[, 1])
universe <- sort(universe, decreasing = TRUE)

My file with all DEGs: groups of down-regulated genes

DEG_Dowmgrupos <- read.delim("ComparisonDowm.txt", header = TRUE)
gene <- as.character(DEG_Dowmgrupos[, 1])
gene <- sort(gene, decreasing = TRUE)

Dowm_3DAS Dowm_6DAS Dowm

Over-representation analysis (ORA) in clusterProfiler

Only biological process (BP) terms

pAdjustMethod = "BH": the Benjamini-Hochberg procedure, which controls the false discovery rate (FDR) across multiple hypothesis tests.

ORA_Dowmgrupos <- compareCluster(
  geneClusters = DEG_Dowmgrupos,
  fun = enrichGO,
  OrgDb = org.At.tair.db,
  keyType = "TAIR",
  ont = "BP",
  universe = universe,
  pAdjustMethod = "BH",
  pvalueCutoff = 0.05,
  qvalueCutoff = 0.05
)
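To illustrate what the "BH" adjustment does to a set of p-values, here is a minimal sketch using base R's p.adjust on made-up values. The p-values and the 0.05 cutoff are illustrative assumptions, and this has not been checked against how clusterProfiler applies the adjustment internally.

```r
# Hypothetical raw p-values for four GO terms (not real results).
p_raw <- c(0.001, 0.01, 0.04, 0.5)

# Benjamini-Hochberg adjustment with base R.
p_bh <- p.adjust(p_raw, method = "BH")

# Terms that would pass an adjusted p-value cutoff of 0.05.
keep <- p_bh < 0.05

print(data.frame(p_raw = p_raw, p_bh = p_bh, keep = keep))
```

In this toy example the third term has a raw p-value below 0.05 but does not pass the cutoff after adjustment.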

Calculate and add the rich factor: the rich factor is defined as the ratio of input genes (e.g., DEGs) that are annotated in a term to all genes that are annotated in this term.

x <- mutate(ORA_Dowmgrupos, richFactor = Count / as.numeric(sub("/\\d+", "", BgRatio)))
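As a small check of the rich factor arithmetic, the sketch below applies the same idea with base R to made-up Count and BgRatio values. The numbers are hypothetical; only the "numerator/denominator" string format follows the result table. Note that inside an R string the regex digit class is written with a doubled backslash.

```r
# Hypothetical values in the same format as the enrichment result columns.
Count   <- c(12, 5)
BgRatio <- c("120/27000", "40/27000")

# Drop "/<digits>" to keep the numerator of BgRatio, i.e. the number of
# background genes annotated to the term.
term_size <- as.numeric(sub("/\\d+", "", BgRatio))

# Rich factor: input genes in the term divided by all genes in the term.
richFactor <- Count / term_size
print(richFactor)
```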

Save "x" as a data.table

result <- as.data.table(x)

Calculate and add the fold enrichment: the fold enrichment is defined as the ratio of the frequency of input genes annotated in a term to the frequency of all genes annotated to that term, and it is easy to calculate by dividing GeneRatio by BgRatio.
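Written out, with symbols introduced here only for illustration (they are not from the thread): let k be the number of input genes annotated to the term, n the input gene count used in GeneRatio, M the number of background genes annotated to the term, and N the background gene count used in BgRatio, so that GeneRatio = k/n and BgRatio = M/N.

```latex
\text{FoldEnrichment} = \frac{\text{GeneRatio}}{\text{BgRatio}} = \frac{k/n}{M/N},
\qquad
\text{richFactor} = \frac{k}{M}
```

Under these definitions the two measures differ only by the constant factor N/n within a single gene list.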

z <- mutate(x, FoldEnrichment = parse_ratio(GeneRatio) / parse_ratio(BgRatio))

result_1<-as.data.table(z)
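The manual check of a few terms can be sketched in base R as follows. The data frame and the helper ratio_to_numeric are hypothetical stand-ins (made-up numbers, not real output), meant only to show the arithmetic independently of parse_ratio.

```r
# Hypothetical rows mimicking the GeneRatio and BgRatio columns.
res <- data.frame(
  Term      = c("term_A", "term_B"),
  GeneRatio = c("12/300", "5/300"),
  BgRatio   = c("120/27000", "40/27000"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn "a/b" strings into the numeric value a / b.
ratio_to_numeric <- function(x) {
  parts <- strsplit(x, "/", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) / as.numeric(p[2]), numeric(1))
}

# Fold enrichment: GeneRatio divided by BgRatio.
res$FoldEnrichment <- ratio_to_numeric(res$GeneRatio) / ratio_to_numeric(res$BgRatio)
print(res)
```

Comparing such hand-computed values with the FoldEnrichment column for a handful of terms is one way to confirm that the ratio strings were parsed as intended.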

Apply simplify to remove redundant GO terms

ORA_Dowmgrupos_1 <- simplify(z, cutoff = 0.5, by = "p.adjust", select_fun = min, measure = "Wang", semData = NULL)

Save in R

clust_results<-as.data.table(ORA_Dowmgrupos_1)
