Explain how the statistics are done

MaayanLab / enrichr_issues


Explain how the statistics are done #3

Closed tim-peterson closed 3 years ago

tim-peterson commented 3 years ago

Thanks so much for producing Enrichr. It's an amazing tool.

Every tab provides an odds ratio and a p-value. It would be good in the FAQ to explain what type of statistics go into those calculations. It would make Enrichr less of a black box.

Thanks again!

EidrianGM commented 3 years ago

Thanks @tim-peterson, I share the same concern. I was also wondering what background set is used in the Fisher's exact test.

malcook commented 3 years ago

FWIW, the answer can be found in https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-14-128#Sec2 (unless it's changed since then), which begins:

Computing enrichment: Enrichr implements three approaches to compute enrichment. The first one is a standard method implemented within most enrichment analysis tools: the Fisher exact test.

@EidrianGM - Presumably the background is the entire set against which enrichment is being calculated

EidrianGM commented 3 years ago

@EidrianGM - Presumably the background is the entire set against which enrichment is being calculated

Presumably yes indeed, @malcook, but in detail, what is the entire set? All the genes from UniProt? Ensembl? HGNC? NCBI? Only protein-coding genes? Does it cover the whole genome, or only the genes annotated in each database/gene set?

tim-peterson commented 3 years ago

I agree with @EidrianGM. It would be good to include information on the background gene sets for each of the tests. Better yet I’d love to get access to the gene sets so we could adjust them in our own way.

AviMaayan commented 3 years ago

We used 20K as a hard-coded value for the background for the Fisher Exact Test. All the libraries are available for download from here: https://maayanlab.cloud/Enrichr/#stats
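For anyone who wants to reproduce this kind of calculation on a downloaded library, here is a minimal sketch of a Fisher exact test with a fixed background of 20,000 genes. It is not Enrichr's actual code. The right-tailed (enrichment-only) alternative, the plain 2x2 odds ratio, and the function names `fisher_right_tail` and `odds_ratio` are all assumptions made for illustration; only the 20K background comes from this thread.

```python
from math import comb

BACKGROUND = 20_000  # hard-coded background size mentioned in this thread


def fisher_right_tail(overlap, list_size, set_size, background=BACKGROUND):
    """One-sided Fisher exact p-value (assumed alternative: enrichment).

    Returns P(X >= overlap), where X is the number of input genes that fall
    in the gene set when list_size genes are drawn without replacement from
    a background that contains set_size annotated genes.
    """
    max_overlap = min(list_size, set_size)
    total = comb(background, list_size)
    tail = sum(
        comb(set_size, k) * comb(background - set_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def odds_ratio(overlap, list_size, set_size, background=BACKGROUND):
    """Sample odds ratio of the 2x2 table (hypothetical helper)."""
    a = overlap                                       # in list, in set
    b = list_size - overlap                           # in list, not in set
    c = set_size - overlap                            # not in list, in set
    d = background - list_size - set_size + overlap   # in neither
    return (a * d) / (b * c) if b * c else float("inf")


if __name__ == "__main__":
    # Illustrative numbers only: 100 input genes, a 200-gene set, 10 shared.
    print(fisher_right_tail(10, 100, 200))
    print(odds_ratio(10, 100, 200))
```

Passing a different `background` value shows how much the p-value and odds ratio depend on that choice, which is the concern raised above.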