YuLab-SMU / clusterProfiler

:bar_chart: A universal enrichment tool for interpreting omics data
https://yulab-smu.top/biomedical-knowledge-mining-book/

Fold enrichment #457

Open Guerande29 opened 2 years ago

Guerande29 commented 2 years ago

Hello! I am running a GO term analysis for Arabidopsis. I have my data, with the up- and down-regulated genes entered as separate groups. For the GO enrichment analysis I use the Araport11 background and the following call: compareCluster(geneClusters=DEG_Upgroups,enrichGO, OrgDb = org.At.tair.db, keyType = "TAIR", ont = "BP", universe = universe, pAdjustMethod = "BH", pvalueCutoff = 0.05, qvalueCutoff = 0.05)

Following the guide in the paper (https://www.sciencedirect.com/science/article/pii/S2666675821000667), I obtained the results and added the richFactor column, but I could not work out how to add the fold enrichment.

Is there any way of doing this?

Thank you very much,

huerqiang commented 2 years ago

Please provide reproducible data and code.

Guerande29 commented 2 years ago


I finally tried the approach below, and it does produce a "Fold enrichment" column. To verify it, I calculated the fold enrichment by hand for several randomly chosen GO terms, and the values seem to match. Still, it would be great to have confirmation that the procedure is correct.

Script

Libraries

library(clusterProfiler)
library(org.At.tair.db)
library(forcats)
library(enrichplot)
library(pathview)
library(data.table)
library(ggplot2)
library(GOsummaries)
library(DOSE)

Universe: genes from Araport11 downloaded from https://www.arabidopsis.org

universe <- read.delim("At11_universe.txt")
universe <- as.character(universe[, 1])
universe <- sort(universe, decreasing = TRUE)

My file with all DEGs: down-regulated groups

DEG_Dowmgrupos = read.delim("ComparisonDowm.txt", header = TRUE)
gene <- as.character(DEG_Dowmgrupos[, 1])
gene <- sort(gene, decreasing = TRUE)

Dowm_3DAS  Dowm_6DAS  Dowm_10DAS
AT2G39510  AT1G52070  AT2G28270
AT1G19900  AT5G60770  AT1G13650
AT3G49580  AT4G31470  AT5G36140
AT2G48080  AT2G18800  AT1G71000
AT5G47450  AT2G33790  AT1G77520
AT5G46900  AT3G20210  AT5G24820
AT3G08860  AT5G46900  AT3G25820
AT5G04950  AT5G46890  AT3G13840
AT1G51470  AT1G27580  AT3G20210
AT4G17340  AT1G52060  AT4G29340
AT3G16440  AT4G22666  AT2G28490
AT4G15290  AT1G64590  AT5G60770
AT2G02990  AT4G26320  AT4G13800
AT2G01520  AT3G01260  AT4G11320
AT5G15180  AT3G21380  AT1G08100
AT1G70850  AT4G15290  AT1G61270
AT2G15620  AT5G36150  AT5G51720
AT1G55430  AT5G55410  AT1G64590
AT3G21340  AT2G15370  AT2G01520
AT4G26220  AT5G56540  AT2G35300
AT2G28780  AT5G53250  AT5G03545
AT5G10230  AT1G06120  AT5G36150
AT4G13890  AT1G06090  AT4G10540
AT1G14960  AT5G19560  AT1G77530
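compareCluster() expects geneClusters to be a named list of gene vectors. A data frame is itself a list of columns, so passing the read.delim() result appears to work, but shorter columns may come back padded with empty strings or NA. A minimal base-R sketch, assuming the tab-delimited layout shown above; deg_table_to_list is a hypothetical helper, not a clusterProfiler function:

```r
# Hypothetical helper (not part of clusterProfiler): convert a wide DEG table,
# one column per time point, into a named list of clean gene-ID vectors.
deg_table_to_list <- function(df) {
  genes <- lapply(df, function(col) {
    col <- trimws(as.character(col))        # normalise whitespace
    unique(col[!is.na(col) & col != ""])    # drop padding cells and duplicates
  })
  genes[lengths(genes) > 0]                 # drop columns that end up empty
}

# Tiny example with IDs from the first rows of the table above; the last cell
# of the second data row is left empty to mimic unequal column lengths.
txt <- "Dowm_3DAS\tDowm_6DAS\tDowm_10DAS
AT2G39510\tAT1G52070\tAT2G28270
AT1G19900\tAT5G60770\t
"
df <- read.delim(text = txt, header = TRUE)
gene_list <- deg_table_to_list(df)
str(gene_list)
```

The resulting list could then be passed as geneClusters = gene_list in the compareCluster() call.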

ORA in clusterProfiler

Only the biological process (BP) ontology

pAdjustMethod = "BH": p-values are adjusted with the Benjamini-Hochberg procedure, which controls the false discovery rate (FDR) across the many GO terms tested.
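For reference, the BH adjustment can be written out by hand. This sketch mirrors what base R's p.adjust(p, method = "BH") computes; the p-values are made up for illustration:

```r
# Benjamini-Hochberg adjustment written out by hand
bh_adjust <- function(p) {
  m <- length(p)
  o <- order(p, decreasing = TRUE)   # indices from largest to smallest p
  ranks <- m:1                       # ascending rank of each of those p-values
  adj <- cummin(p[o] * m / ranks)    # p * m / rank, forced to be monotone
  pmin(1, adj)[order(o)]             # cap at 1 and restore the input order
}

p <- c(0.01, 0.04, 0.03, 0.20)       # made-up raw p-values
bh_adjust(p)                         # approx. 0.040 0.053 0.053 0.200
all.equal(bh_adjust(p), p.adjust(p, method = "BH"))
```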

ORA_Dowmgrupos <- compareCluster(geneClusters=DEG_Dowmgrupos,enrichGO, OrgDb = org.At.tair.db, keyType = "TAIR", ont = "BP", universe = universe, pAdjustMethod = "BH", pvalueCutoff = 0.05, qvalueCutoff = 0.05)
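As far as I understand, enrichGO() scores each GO term with a one-sided hypergeometric test (implemented in DOSE). The sketch below reproduces that test for a single term from the quantities behind GeneRatio = k/n and BgRatio = M/N; ora_pvalue and the counts are illustrative assumptions, not package code:

```r
# One-sided hypergeometric p-value for a single term (roughly):
#   k = input genes annotated to the term       (numerator of GeneRatio)
#   n = annotated input genes                   (denominator of GeneRatio)
#   M = background genes annotated to the term  (numerator of BgRatio)
#   N = annotated background genes              (denominator of BgRatio)
ora_pvalue <- function(k, n, M, N) {
  # P(X >= k), X ~ Hypergeometric with M "successes" and N - M "failures"
  # in the population and n draws
  phyper(k - 1, M, N - M, n, lower.tail = FALSE)
}

# Made-up counts for illustration
p <- ora_pvalue(k = 5, n = 50, M = 100, N = 20000)

# The same upper tail obtained by summing the density directly
p_check <- sum(dhyper(5:50, m = 100, n = 19900, k = 50))
all.equal(p, p_check)
```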

Calculate and add the rich factor. The rich factor is the number of input genes (e.g., DEGs) annotated to a term divided by the number of all background genes annotated to that term.
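Applied to a toy table, the rich factor formula looks like this. The rows are hypothetical and only mimic the GeneRatio, BgRatio and Count columns of a compareCluster() result:

```r
# Hypothetical rows shaped like the GeneRatio / BgRatio / Count columns
res <- data.frame(
  ID        = c("term_A", "term_B"),
  GeneRatio = c("5/50", "3/50"),
  BgRatio   = c("100/20000", "30/20000"),
  Count     = c(5, 3),
  stringsAsFactors = FALSE
)

# Numerator of BgRatio = background genes annotated to the term.
# In an R string the regex \d+ must be written with a doubled backslash.
bg_term_size <- as.numeric(sub("/\\d+", "", res$BgRatio))
res$richFactor <- res$Count / bg_term_size
res$richFactor   # 0.05 0.10
```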

x <- mutate(ORA_Dowmgrupos, richFactor = Count / as.numeric(sub("/\\d+", "", BgRatio)))

Convert "x" to a data.table

result<-as.data.table(x)

Calculate and add the fold enrichment. The fold enrichment is the ratio of the frequency of input genes annotated to a term to the frequency of all (background) genes annotated to that term, so it can be calculated by dividing GeneRatio by BgRatio.
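Written out, FoldEnrichment = (k/n) / (M/N), where GeneRatio = k/n and BgRatio = M/N. A base-R sketch of the same division, with ratio_to_num as a hypothetical stand-in for DOSE::parse_ratio() and made-up ratios:

```r
# Hypothetical stand-in for DOSE::parse_ratio(): turns "a/b" strings into a/b
ratio_to_num <- function(x) {
  parts <- strsplit(as.character(x), "/", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) / as.numeric(p[2]), numeric(1))
}

GeneRatio <- c("5/50", "3/50")            # made-up k/n values
BgRatio   <- c("100/20000", "30/20000")   # made-up M/N values

FoldEnrichment <- ratio_to_num(GeneRatio) / ratio_to_num(BgRatio)
FoldEnrichment   # 20 40, up to floating-point rounding
```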

z <-mutate(x, FoldEnrichment = parse_ratio(GeneRatio) / parse_ratio(BgRatio))

result_1<-as.data.table(z)
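To automate the manual spot checks described above, every row can be recomputed from the raw columns. Because richFactor = k/M, the two derived columns are linked: FoldEnrichment = richFactor * N / n. A sketch, where check_enrichment is a hypothetical helper that assumes result_1 keeps the GeneRatio, BgRatio, Count, richFactor and FoldEnrichment columns:

```r
# Hypothetical helper: recompute richFactor and FoldEnrichment from the raw
# ratio columns and return the rows whose stored values disagree.
check_enrichment <- function(df, tol = 1e-8) {
  # i = 1 gives the numerator, i = 2 the denominator of an "a/b" string
  part <- function(x, i) {
    as.numeric(vapply(strsplit(as.character(x), "/", fixed = TRUE), `[`, "", i))
  }
  k <- part(df$GeneRatio, 1); n <- part(df$GeneRatio, 2)
  M <- part(df$BgRatio, 1);   N <- part(df$BgRatio, 2)
  ok_count <- df$Count == k
  ok_rich  <- abs(df$richFactor - k / M) < tol
  # FoldEnrichment = (k/n) / (M/N) = richFactor * N / n
  ok_fold  <- abs(df$FoldEnrichment - (k / n) / (M / N)) < tol
  df[!(ok_count & ok_rich & ok_fold), , drop = FALSE]
}

# Toy rows (hypothetical values) with correctly derived columns
res <- data.frame(
  GeneRatio      = c("5/50", "3/50"),
  BgRatio        = c("100/20000", "30/20000"),
  Count          = c(5L, 3L),
  richFactor     = c(5 / 100, 3 / 30),
  FoldEnrichment = c(20, 40),
  stringsAsFactors = FALSE
)
nrow(check_enrichment(res))   # 0: every row is consistent

bad <- res
bad$FoldEnrichment[2] <- 4    # simulate a wrong value
check_enrichment(bad)         # returns the second row
```

On the real results, check_enrichment(as.data.frame(result_1)) should return zero rows if both columns were computed as intended.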

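Apply simplify() to remove redundant GO terms: among terms whose semantic similarity exceeds the cutoff, the one with the lowest p.adjust is kept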

ORA_Dowmgrupos_1 <-simplify(z, cutoff = 0.5, by = "p.adjust", select_fun = min, measure = "Wang", semData = NULL)
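Conceptually, simplify() groups terms whose semantic similarity exceeds cutoff and keeps the one chosen by select_fun applied to the by column (here the lowest p.adjust). The sketch below illustrates that selection rule with a hand-made similarity matrix; it is a simplified reading of the idea, not clusterProfiler's implementation, and drop_redundant is hypothetical (simplify() itself derives the similarities from GO semantic similarity via GOSemSim, here with measure = "Wang"):

```r
# Simplified, hypothetical illustration of redundancy removal.
# `sim` is a symmetric similarity matrix whose rows/columns follow `ids`.
drop_redundant <- function(ids, padj, sim, cutoff = 0.5, select_fun = min) {
  to_remove <- character(0)
  for (i in seq_along(ids)) {
    group <- which(sim[, ids[i]] > cutoff)   # terms similar to term i, itself included
    if (length(group) < 2) next              # nothing redundant with term i
    keep <- group[padj[group] == select_fun(padj[group])]
    to_remove <- union(to_remove, ids[setdiff(group, keep)])
  }
  ids[!ids %in% to_remove]
}

ids  <- c("term_A", "term_B", "term_C")   # hypothetical terms
padj <- c(0.001, 0.01, 0.02)              # hypothetical p.adjust values
sim  <- matrix(c(1.0, 0.8, 0.1,
                 0.8, 1.0, 0.2,
                 0.1, 0.2, 1.0),
               nrow = 3, dimnames = list(ids, ids))
drop_redundant(ids, padj, sim)   # "term_A" "term_C"
```

Here term_B is dropped because it is similar to term_A (0.8 > 0.5) and has the larger p.adjust, while term_C is not similar enough to either.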

Convert to a data.table

clust_results<-as.data.table(ORA_Dowmgrupos_1)
