Closed by zhangguy 1 year ago
Hi,
exampleRanks is a moderated t-statistic, not log2FC, but it has the same direction.
ES means "A positive ES indicates gene set enrichment at the top of the ranked list; a negative ES indicates gene set enrichment at the bottom of the ranked list."
In GSEA the list is ranked in descending order. I think the more intuitive description of ES does not rely on the ranking directly: a high positive ES means enrichment in genes with positive gene-level statistics, and a strongly negative ES means enrichment in genes with negative gene-level statistics.
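To make the sign convention concrete, here is a minimal sketch of a GSEA-style running-sum enrichment score on toy data. It is an illustration only, not fgsea's actual implementation: the function name `enrichment_score`, the weight parameter `p`, and the gene names and values are all made up for this example, and it assumes the gene set has at least one hit and one miss.

```python
def enrichment_score(stats, gene_set, p=1.0):
    """Signed maximum deviation of a GSEA-style running sum (illustrative only)."""
    # Rank genes by their statistic in descending order, as described above.
    ranked = sorted(stats.items(), key=lambda kv: kv[1], reverse=True)
    hits = [gene in gene_set for gene, _ in ranked]

    # Assumes at least one hit and one miss, so neither denominator is zero.
    n_miss = len(ranked) - sum(hits)
    hit_total = sum(abs(value) ** p for (_, value), hit in zip(ranked, hits) if hit)

    running, best = 0.0, 0.0
    for (_, value), hit in zip(ranked, hits):
        if hit:
            running += abs(value) ** p / hit_total  # step up on a gene-set member
        else:
            running -= 1.0 / n_miss                 # step down otherwise
        if abs(running) > abs(best):
            best = running                          # keep the signed extreme
    return best


# Hypothetical gene-level statistics (e.g. moderated t-statistics).
toy_stats = {"A": 3.0, "B": 2.0, "C": 1.0, "D": -1.0, "E": -2.0, "F": -3.0}
up_set = {"A", "B"}    # genes with positive statistics
down_set = {"E", "F"}  # genes with negative statistics

print("ES for up_set:  ", enrichment_score(toy_stats, up_set))
print("ES for down_set:", enrichment_score(toy_stats, down_set))
```

In this toy case the set made of genes with positive statistics gets a positive score and the set made of genes with negative statistics gets a negative one, which matches the "positive ES means enrichment in genes with positive gene-level statistics" reading.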
It seems that gsea does an internal ranking based on the values of the vector provided, and it doesn't matter if we pre-rank the gene list or not as long as the values are provided?
Yes, the order of the genes in the input vector does not matter.
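As a rough illustration of why the input order is irrelevant: once a method sorts the named values internally, any shuffle of the input yields the same ranked list. The sketch below uses hypothetical gene names and a made-up helper `ranked_genes`; it is not fgsea code. Ties are broken by gene name here purely so that the toy example is deterministic.

```python
import random


def ranked_genes(stats):
    """Return gene names sorted by statistic, descending (ties broken by name)."""
    return [gene for gene, _ in sorted(stats.items(), key=lambda kv: (-kv[1], kv[0]))]


# Hypothetical gene-level statistics, deliberately given in no particular order.
toy_stats = {"C": 1.0, "F": -3.0, "A": 3.0, "E": -2.0, "B": 2.0, "D": -1.0}

# Shuffle the (gene, value) pairs with a fixed seed and rebuild the mapping.
items = list(toy_stats.items())
random.Random(0).shuffle(items)
shuffled_stats = dict(items)

# The ranking depends only on the values, not on the order they were supplied in.
print(ranked_genes(toy_stats))
print(ranked_genes(shuffled_stats))
```

Both calls print the same ranking, so anything computed from that ranking is unchanged by shuffling the input.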
If so, it might be more intuitive to provide exampleRanks as a shuffled vector and document the internal ranking behavior (ascending / descending) somewhere in the manual or vignette.
Makes sense. We'll think about it.
Hi @assaron, I have been looking at this from YuLab-SMU/DOSE# and was wondering if there is any update on this bit:
If so, it might be more intuitive to provide exampleRanks as a shuffled vector and document the internal ranking behavior (ascending / descending) somewhere in the manual or vignette.
If the order of the input genes does not matter, does this mean fgsea is always performing GSEAPreranked?
EDIT: Got it, it's always preranked as the docs state.
Thank you, Nelson
Hi, thanks so much for developing this package. I'm struggling to understand the direction of change for the significant pathways in the results. According to the Broad GSEA website, ES means "A positive ES indicates gene set enrichment at the top of the ranked list; a negative ES indicates gene set enrichment at the bottom of the ranked list." With this in mind, I ran the example in the vignette.
Assuming the values in exampleRanks are log2FC from RNA-Seq data, I would naively think the "5990979_Cell_Cycle,_Mitotic" pathway is down-regulated: ES/NES > 0 means the pathway genes are enriched at the top (beginning) of the ranked list, the list is ranked in ascending order, and so negative values (down-regulated genes) are at the beginning of the list. But as a matter of fact the leading edge genes are all up-regulated, which means "5990979_Cell_Cycle,_Mitotic" is perhaps up-regulated:
Now I'm really confused. Furthermore if we shuffle the order of exampleRanks, the gsea results are the same:
It seems that gsea does an internal ranking based on the values of the vector provided, and it doesn't matter if we pre-rank the gene list or not as long as the values are provided? If so, it might be more intuitive to provide exampleRanks as a shuffled vector and document the internal ranking behavior (ascending / descending) somewhere in the manual or vignette. Or I'm completely wrong here.
Thanks in advance for your answer