MaayanLab / enrichr_issues


How is background gene list used and Adjusted.P.value computed? #99

Open xsun1229 opened 3 hours ago

xsun1229 commented 3 hours ago

Hi,

Could you provide more details on how the background gene list is handled? Does this mean the background gene list is adjustable? If I specify a custom background gene list, does the algorithm find the overlap among the input gene list, the background genes, and the gene set from the library, and then run Fisher's exact test on that overlap?

I also have a question about how the q-values are computed. The help page (https://maayanlab.cloud/Enrichr/help#background&q=4) states:

"The q-value is an adjusted p-value using the Benjamini-Hochberg method for correction for multiple hypotheses testing. You can read more about this method, and why it is needed here [2]."

Were these values obtained by correcting the Fisher’s exact test (FET) p-values across all gene sets in the library? Or did you filter the FET p-values first and then correct them?

The reason I ask is that I used my own Fisher's exact test code to calculate enrichment p-values. Despite having the exact same overlapping genes and background totals, my raw p-values are quite similar to yours, yet my adjusted p-values are significantly higher. This leads me to wonder if any filtering was applied before adjusting the p-values. Could you clarify if there’s any pre-adjustment filtering step involved?

Thanks, Xiaotong

AviMaayan commented 2 hours ago

We apply multipletests(p_vals, method="fdr_bh") from statsmodels (https://github.com/statsmodels/statsmodels/blob/main/statsmodels/stats/multitest.py) to all p-values.
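For anyone comparing results, here is a minimal sketch of the Benjamini-Hochberg step-up adjustment, which is the procedure that method="fdr_bh" selects. It is written in plain Python for illustration and is not Enrichr's code; the function name benjamini_hochberg and the p-values are made up. It shows that the adjusted values depend on how many p-values enter the correction, so correcting a filtered subset gives different numbers than correcting all of them.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values, one per gene set in a library.
p_all = [0.001, 0.008, 0.039, 0.041, 0.6, 0.9]

# Correct across every gene set.
adj_all = benjamini_hochberg(p_all)

# Correct only the p-values that pass a filter (here p < 0.05).
p_filtered = [p for p in p_all if p < 0.05]
adj_filtered = benjamini_hochberg(p_filtered)

for p, a, f in zip(p_filtered, adj_all, adj_filtered):
    print(f"raw={p:.3f}  adjusted over all={a:.4f}  adjusted over filtered={f:.4f}")
```

With the same raw p-values, the adjustment over all six tests yields larger values than the adjustment over the four filtered tests, so a mismatch in the number of tested gene sets is one thing to check when adjusted p-values disagree.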

Could you provide more details on how the background gene list is handled? Does this mean the background gene list is adjustable? If I specify a custom background gene list, does the algorithm find the overlap among the input gene list, the background genes, and the gene set from the library, and then run Fisher's exact test on that overlap?

Yes.
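As a rough illustration of the procedure described in the question, the sketch below restricts both the input list and the library gene set to the background and then computes a one-sided Fisher's exact test p-value as the upper tail of the hypergeometric distribution. This is not Enrichr's implementation: the function name enrichment_p_value, the gene names, and the choice of a one-sided test are assumptions made for the example.

```python
from math import comb


def enrichment_p_value(query, gene_set, background):
    """One-sided Fisher's exact test p-value for over-representation.

    Genes outside the background are dropped from both lists first
    (assumed behavior, as described in the question above).
    """
    background = set(background)
    query = set(query) & background
    gene_set = set(gene_set) & background

    N = len(background)        # background size
    K = len(gene_set)          # gene-set members within the background
    n = len(query)             # query genes within the background
    k = len(query & gene_set)  # observed overlap

    # P(X >= k) for X ~ Hypergeometric(N, K, n).
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return k, tail / comb(N, n)


# Hypothetical data: a 20-gene background; X1 and X2 fall outside it.
background = [f"G{i}" for i in range(1, 21)]
query = ["G1", "G2", "G3", "G4", "G5", "X1"]
gene_set = ["G1", "G2", "G3", "G4", "G10", "X2"]

overlap, p_value = enrichment_p_value(query, gene_set, background)
print(overlap, p_value)
```

Because the background fixes N and trims both lists, changing the background changes every cell of the contingency table, not only the total.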

xsun1229 commented 1 hour ago

Thank you for your prompt response! Could you clarify which files contain the Fisher's exact test implementation and which ones are responsible for calling the FDR correction functions?