ctlab / fgsea

Fast Gene Set Enrichment Analysis

Issue with ranking #22

Closed ktroule closed 6 years ago

ktroule commented 6 years ago

Hi

I started using fgsea today and found something I can't explain.

I have a ranking of 293 genes (it's just a test) from a DEG experiment. I've ordered the genes by their p-values in two different ways:

A) sign(logFC) * abs(log10(pvalue))
B) Ranks from 1 to 293, where 1 is the most significant down-regulated gene and 293 is the most significant up-regulated gene; 2 is the second most significant down-regulated gene, 292 is the second most significant up-regulated gene, and so on for the rest of the genes.
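
For readers who want to reproduce the setup, here is a minimal sketch of how the two rankings could be built in R. The thread does not show the original code, so the data frame `deg`, its column names (`gene`, `logFC`, `pvalue`), and all values are hypothetical.

```r
# Hypothetical differential-expression table; names and values are made up.
deg <- data.frame(
  gene   = c("g1", "g2", "g3", "g4", "g5", "g6"),
  logFC  = c(-2.1, -0.8, -0.1, 0.2, 1.1, 2.5),
  pvalue = c(1e-8, 1e-3, 0.6, 0.5, 1e-4, 1e-10)
)

# Ranking A: signed significance, sign(logFC) * |log10(p-value)|.
rankA <- setNames(sign(deg$logFC) * abs(log10(deg$pvalue)), deg$gene)

# Ranking B: the rank of each gene within ranking A, so 1 is the most
# significant down-regulated gene and n the most significant up-regulated one.
rankB <- setNames(rank(rankA, ties.method = "first"), deg$gene)

print(rankA)
print(rankB)
```

Both rankings order the genes identically, but ranking B is strictly positive while ranking A is centered around zero, so any method that weights genes by the absolute value of the statistic treats the two differently.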

I took the top 20 up/down genes from this ranking as gene sets and performed a fgsea analysis.

fgseaRes <- fgsea(pathways = rnk.sig[1:2], stats = rnk.list[[1]], minSize=5, maxSize=500, nperm=10000, nproc = 15)

The ES were as expected with both rankings: for the UP gene set the ES was 1 and for the DN gene set the ES was -1.

The problem appears when checking the p-values. With ranking A, both gene sets (UP and DN) are significant, with similar p-values. With ranking B, the UP gene set is significant but the DN gene set is not: its p-value seems to range between 0.3 and 1.
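
The thread does not establish the cause of this asymmetry. One thing that may be worth checking is the sign balance of the null distribution under a strictly positive ranking. The sketch below is a simplified re-implementation of a weighted running-sum enrichment score, not fgsea's own code; `enrichment_score`, `null_scores`, and `gsea_param` are names chosen for illustration, and the 1..293 ranking only mimics ranking B.

```r
# Illustrative weighted running-sum score (NOT fgsea's implementation).
# gsea_param plays the role of the exponent applied to the gene statistics.
enrichment_score <- function(stats, members, gsea_param = 1) {
  stats  <- sort(stats, decreasing = TRUE)
  in_set <- names(stats) %in% members
  w      <- abs(stats)^gsea_param
  # Step up by the normalized weight on a hit, step down uniformly on a miss.
  step    <- ifelse(in_set, w / sum(w[in_set]), -1 / sum(!in_set))
  running <- cumsum(step)
  # The score is the running-sum value with the largest absolute deviation.
  running[[which.max(abs(running))]]
}

# Scores of random gene sets of a given size (a simple permutation null).
null_scores <- function(stats, set_size, nperm = 2000, gsea_param = 1) {
  replicate(nperm,
            enrichment_score(stats, sample(names(stats), set_size), gsea_param))
}

set.seed(1)
n     <- 293
genes <- paste0("gene", seq_len(n))

# Strictly positive ranking, similar in spirit to ranking B.
rank_b <- setNames(as.numeric(seq_len(n)), genes)
# Same gene order, but centered so the weights are symmetric around zero.
rank_centered <- rank_b - mean(rank_b)

frac_neg_b        <- mean(null_scores(rank_b, 20) < 0)
frac_neg_centered <- mean(null_scores(rank_centered, 20) < 0)

cat("Fraction of negative null scores, positive ranking:", frac_neg_b, "\n")
cat("Fraction of negative null scores, centered ranking:", frac_neg_centered, "\n")
```

If negative null scores turn out to be rare, an empirical p-value that compares a negative ES only against same-sign permutations would rest on very few of them, which could make the DN p-value coarse and unstable. Whether this is what happened in the case above is an assumption, not something the thread confirms.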

I've checked the plots for each analysis using the function plotEnrichment(), and in all cases they look fine, with the genes completely skewed towards one of the extremes of the ranking.

I’m not sure what is happening.

Thanks for your help.

assaron commented 6 years ago

Hi, can you attach the pathways and rankings that you use (for example, in rda format), so that I can reproduce your problem?

ktroule commented 6 years ago

It seems that I was doing something wrong (I've not yet identified what). I've created another dummy data set and it seems to be working as expected.

Thanks

ktroule commented 6 years ago

I forgot to mention that I do still see differences in the NES, even though the gene set size and the absolute ES are the same. I attach 2 files containing the 2 gene sets and the ranking.

gs <- readRDS("/home/Desktop/GeneSet.RDS")
rank <- readRDS("/home/Desktop/Ranking.RDS")
fgsea(pathways = gs, stats = rank, minSize = 5, maxSize = 500, nperm = 10000, nproc = 15)

NES scores are 1.978 and -2.687.

fgseaFiles.zip

ktroule commented 6 years ago

Solved: I've now seen the gseaParam parameter.
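
To illustrate why `gseaParam` matters here: in fgsea the gene-level statistics are raised to the power of `gseaParam` (default 1) before the enrichment score is computed, so the magnitudes of the ranking, not only the order, shape the null distribution. The sketch below is a simplified re-implementation under stated assumptions, not fgsea's code. It assumes the NES is the ES divided by the mean absolute score of same-sign random gene sets; `enrichment_score`, `normalized_score`, and the asymmetric ranking `stats` are hypothetical.

```r
# Illustrative weighted running-sum score (NOT fgsea's implementation).
enrichment_score <- function(stats, members, gsea_param = 1) {
  stats  <- sort(stats, decreasing = TRUE)
  in_set <- names(stats) %in% members
  w      <- abs(stats)^gsea_param
  step    <- ifelse(in_set, w / sum(w[in_set]), -1 / sum(!in_set))
  running <- cumsum(step)
  running[[which.max(abs(running))]]
}

# ES plus a normalized score: ES divided by the mean absolute score of
# random gene sets whose score has the same sign (an assumed normalization).
normalized_score <- function(stats, members, gsea_param = 1, nperm = 2000) {
  es <- enrichment_score(stats, members, gsea_param)
  null_es <- replicate(
    nperm,
    enrichment_score(stats, sample(names(stats), length(members)), gsea_param)
  )
  same_sign <- null_es[sign(null_es) == sign(es)]
  c(ES = es, NES = es / mean(abs(same_sign)))
}

set.seed(42)
n     <- 200
genes <- paste0("gene", seq_len(n))
# Hypothetical asymmetric ranking: the up tail reaches 6, the down tail -4.
stats <- setNames(c(seq(6, 0.1, length.out = 100),
                    -seq(0.1, 4, length.out = 100)), genes)
up   <- genes[1:20]     # top 20 genes
down <- genes[181:200]  # bottom 20 genes

# gsea_param = 1: weights follow |stats|, so the two tails are treated differently.
weighted <- rbind(up   = normalized_score(stats, up, 1),
                  down = normalized_score(stats, down, 1))
# gsea_param = 0: every gene has weight 1, so only the order matters.
unweighted <- rbind(up   = normalized_score(stats, up, 0),
                    down = normalized_score(stats, down, 0))

print(weighted)
print(unweighted)
```

In both settings the ES is 1 for the top set and -1 for the bottom set, matching what was reported above. With `gsea_param = 0` the null distribution is symmetric, so the two NES magnitudes should come out close to each other, while with `gsea_param = 1` they can differ because the positive and negative null scores are no longer on the same scale. In fgsea itself the corresponding experiment would be to pass a different `gseaParam` value to `fgsea()`.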