ftwkoopmans / goat

GOAT: efficient and robust identification of gene set enrichment
https://ftwkoopmans.github.io/goat/
Apache License 2.0

How to rank gene lists #5


Sirin24 commented 1 week ago

Hello, thanks for the great tool.

I have run Seurat's FindMarkers() on fibroblasts (diseased vs. control) and have a list of genes with the corresponding p-value, adjusted p-value and avg_log2FC. I want to use GOAT, but I cannot work out whether it uses both the p-value and avg_log2FC or only one of the two. Seurat recently changed its formula for finding DEGs, and it seems the p-value matters and avg_log2FC alone cannot be used. So I want to ask: what does GOAT do in the initial step of ranking the genes?

ftwkoopmans commented 1 week ago

You can choose whether to use p-values or log2FC values. Generally, one finds more significant gene sets when ranking genes/proteins by their effect size (e.g. log2FC) than by p-values. This is because p-values carry no information on up- or down-regulation, while in practice most pathways are co-regulated in a single direction, either up or down (see Figure 4 in the GOAT paper).
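To make the direction argument concrete, here is a small base-R sketch on made-up values (not real data). It only illustrates how the two orderings differ. It is not GOAT's internal scoring procedure, and the column names mimic Seurat's FindMarkers() output (p_val, avg_log2FC).

```r
# Toy differential expression results (illustrative values, not real data)
de <- data.frame(
  gene       = c("G1", "G2", "G3", "G4", "G5", "G6"),
  avg_log2FC = c(2.1, -1.8, 0.3, -0.2, 1.5, -2.4),
  p_val      = c(1e-6, 1e-5, 0.2, 0.4, 1e-4, 1e-7),
  stringsAsFactors = FALSE
)

# Rank by p-value: most significant first; the sign of the change is ignored,
# so strongly up- and down-regulated genes end up interleaved at the top
by_pvalue <- de$gene[order(de$p_val)]

# Rank by effect size: most up-regulated first, most down-regulated last
by_effectsize <- de$gene[order(de$avg_log2FC, decreasing = TRUE)]

print(by_pvalue)      # "G6" "G1" "G2" "G5" "G3" "G4"
print(by_effectsize)  # "G1" "G5" "G3" "G4" "G2" "G6"
```

In the p-value ordering, the top four genes alternate between down (G6, G2) and up (G1, G5). In the effect-size ordering, a pathway whose members are all up-regulated clusters at one end of the list.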

As described in the documentation (this GitHub repository's main page), you need to put the gene effect sizes (in your case, the log2FC values) in a column named "effectsize" and their unadjusted p-values in a column named "pvalue". With this input data set, you can choose how to rank your genes using the "score_type" parameter of the function test_genesets():

a. rank by p-value: test_genesets( ... , score_type = "pvalue")
b. rank by effectsize: test_genesets( ... , score_type = "effectsize")
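A minimal base-R sketch of reshaping FindMarkers()-style output into the "effectsize" and "pvalue" columns described above. The gene names and values are hypothetical placeholders, and the "symbol" column name is an assumption. GOAT may require further columns (e.g. gene identifiers), so check the GOAT documentation for the exact required input before calling test_genesets().

```r
# Hypothetical FindMarkers()-style result: gene names as row names,
# other FindMarkers columns (pct.1, pct.2) omitted for brevity
markers <- data.frame(
  p_val      = c(1e-7, 1e-5, 0.2),
  avg_log2FC = c(-2.4, 1.8, 0.3),
  p_val_adj  = c(3e-3, 0.3, 1),
  row.names  = c("GeneA", "GeneB", "GeneC")
)

# Build the gene list with the column names GOAT expects for ranking
genelist <- data.frame(
  symbol     = rownames(markers),  # assumed identifier column name
  effectsize = markers$avg_log2FC, # log2 fold-change used as effect size
  pvalue     = markers$p_val,      # unadjusted p-value, NOT p_val_adj
  stringsAsFactors = FALSE
)

# Basic sanity check: no missing values in the ranking columns
stopifnot(!anyNA(genelist$effectsize), !anyNA(genelist$pvalue))
print(genelist)
```

Note that the unadjusted p_val column is used, not p_val_adj, matching the instruction to supply unadjusted p-values.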