gexijin / shinygo


Enrichment FDR analysis, Fold Change #38

Open poojaparameswaran99 opened 10 months ago

poojaparameswaran99 commented 10 months ago

I am attempting to replicate ShinyGO's results for one specific pathway, mmu05012 (Parkinson's disease), and I have 266 genes correlated with it. I want to check whether my gene set is enriched in this KEGG pathway. However, even after performing a hypergeometric test, I cannot reproduce the enrichment FDR value calculated by ShinyGO. Is there any way you could outline the statistical steps ShinyGO takes? My hypergeometric test uses the overlap, total_genes_in_my_set, genes_in_parkinsons, and genes_in_Set_of_interest.
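For reference, the one-sided hypergeometric test behind this kind of over-representation analysis is usually written as below. The symbols are introduced here for illustration: N background genes (n_total), K of them annotated to the pathway (n_parkinsons), n genes in the query list (n_module), and k genes in the overlap (n_overlap).

$$
p = P(X \ge k) = \sum_{i=k}^{\min(n,K)} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}},
\qquad
\text{fold enrichment} = \frac{k/n}{K/N}
$$

Fold enrichment is a ratio of proportions rather than a probability, so it is not expected to track the output of a single hypergeometric PMF value.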

I also calculate a probability, but it does not appear to be analogous to the fold change score. I really appreciate any pointers.

```python
from scipy.stats import hypergeom

fold_enrichment = (n_overlap / n_module) / (n_parkinsons / n_total)
probability = hypergeom.pmf(n_overlap, n_total, n_parkinsons, n_module)  # P(X == n_overlap)
p_value = hypergeom.sf(n_overlap - 1, n_total, n_parkinsons, n_module)  # P(X >= n_overlap); 1 - cdf(n_overlap) would drop the observed overlap
```
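A dependency-free sketch of the same test can help when comparing numbers by hand. It assumes the standard one-sided over-representation p-value P(X >= k); the helper names (`hypergeom_pmf`, `enrichment_p_value`, `fold_enrichment`) are hypothetical, and the counts in the usage lines are toy values, not the mmu05012 data. Exact fractions are used so the tail sum carries no floating-point doubt.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(k: int, n_total: int, n_pathway: int, n_list: int) -> Fraction:
    """P(X == k): exactly k of the n_list query genes are in the pathway."""
    return Fraction(
        comb(n_pathway, k) * comb(n_total - n_pathway, n_list - k),
        comb(n_total, n_list),
    )


def enrichment_p_value(k_overlap: int, n_total: int, n_pathway: int, n_list: int) -> Fraction:
    """One-sided over-representation p-value P(X >= k_overlap).

    The observed overlap itself is included in the tail.
    """
    upper = min(n_pathway, n_list)
    return sum(
        (hypergeom_pmf(i, n_total, n_pathway, n_list) for i in range(k_overlap, upper + 1)),
        Fraction(0),
    )


def fold_enrichment(k_overlap: int, n_total: int, n_pathway: int, n_list: int) -> float:
    """Share of query genes in the pathway divided by share of background genes in it."""
    return (k_overlap / n_list) / (n_pathway / n_total)


# Toy counts (not the mmu05012 data): 20 background genes, 5 in the pathway,
# 4 in the query list, 2 in the overlap.
print(enrichment_p_value(2, 20, 5, 4))  # 241/969
print(fold_enrichment(2, 20, 5, 4))     # 2.0
```

With these toy counts the inclusive tail P(X >= 2) is 241/969, while the strict tail P(X > 2) is only 31/969; the gap is exactly the probability of the observed overlap, which is what `1 - hypergeom.cdf(n_overlap, ...)` leaves out.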

Perugolate commented 5 days ago

The ShinyGO paper states:

Enrichment analysis can also be conducted through API access to STRING (Szklarczyk et al., 2015)

But I'm having a hard time tracking down how STRING implements the test.

Perugolate commented 5 days ago

For users that query the STRING database with a set of proteins (as opposed to a single query protein only), the website computes a functional enrichment analysis in the background; this can then be inspected and browsed by the user, and includes interactive projections of the results onto the user's protein network. This functionality has been available since version 9.1, and is based on straightforward over-representation analysis using hypergeometric tests.
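Over-representation analysis yields one hypergeometric p-value per pathway, so a reported FDR is typically that p-value adjusted for the number of pathways tested. The excerpt above does not name the correction; the sketch below assumes Benjamini-Hochberg, a common choice, and the function name `benjamini_hochberg` is hypothetical. Under that assumption, the FDR for mmu05012 depends on the p-values of every other pathway tested, which would explain why a single hypergeometric test does not reproduce it.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Walk from the largest p-value down to the smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every adjusted value at 1
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending p-value order
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01]))
print(benjamini_hochberg([0.01, 0.5, 0.5, 0.5, 0.5]))
```

A lone p-value of 0.01 stays at 0.01, but the same 0.01 becomes about 0.05 once five pathways are tested, so matching ShinyGO's FDR would require the full list of pathway p-values and the same background gene set.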

Szklarczyk D, Gable AL, Lyon D, Junge A, Wyder S, Huerta-Cepas J, Simonovic M, Doncheva NT, Morris JH, Bork P, Jensen LJ. STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Research. 2019 Jan 8;47(D1):D607-13.