ctlab / fgsea

Fast Gene Set Enrichment Analysis

problem of fgseaMultilevel #112

Closed FionaMoon closed 2 years ago

FionaMoon commented 2 years ago

Hi, I ran into a weird problem. I have a list that contains 10 DEG (differentially expressed gene) results from 10 different samples. However, every time I run fgseaMultilevel, it gets stuck on the 7th sample without any warning or result. I've checked the 7th sample but found nothing unusual.

$ NK :'data.frame': 1141 obs. of 7 variables:
 ..$ p_val        : num [1:1141] 3.13e-191 7.95e-269 3.41e-237 6.09e-140 4.85e-206 ...
 ..$ log2FC       : num [1:1141] 5.32 4.83 4.05 4.04 4 ...
 ..$ pct.1        : num [1:1141] 0.961 0.961 0.877 1 0.948 0.71 0.729 0.948 0.929 0.548 ...
 ..$ pct.2        : num [1:1141] 0.131 0.068 0.062 0.255 0.107 0.077 0.039 0.149 0.152 0.058 ...
 ..$ adjusted.pval: num [1:1141] 4.30e-187 1.09e-264 4.67e-233 8.35e-136 6.65e-202 ...
 ..$ cluster      : Factor w/ 9 levels "Naive CD4 T",..: 7 7 7 7 7 7 7 7 7 7 ...
 ..$ Gene         : chr [1:1141] "GNLY" "GZMB" "FGFBP2" "NKG7" ...

The parameters I use are as follows:

fgseaRes <- fgseaMultilevel(pathways = my.db, 
                   stats = prerank.genes,
                   minSize=5,
                   maxSize=2500 ,
                   eps = 0)

I wonder what causes this?

Thank you in advance. Fiona

FionaMoon commented 2 years ago

When I use the default value of eps, I get a result along with warnings.

fgseaRes <- fgseaMultilevel(pathways = my.db, 
                   stats = prerank.genes,
                   minSize=5,
                   maxSize=2500
                )

Warning messages:
1: In fgseaMultilevel(pathways = my.db, stats = prerank.genes, minSize = 5, :
  For some of the pathways the P-values were likely overestimated. For such pathways log2err is set to NA.
2: In fgseaMultilevel(pathways = my.db, stats = prerank.genes, minSize = 5, :
  For some pathways, in reality P-values are less than 1e-10. You can set the eps argument to zero for better estimation.

So, how should I choose a proper eps?
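
For readers with the same question, here is a minimal sketch of how eps changes a call. As far as the fgsea documentation describes it, eps is a lower bound on the P-values that the multilevel procedure tries to estimate, and eps = 0 removes that bound at the cost of more computation. The gene names, statistics and gene sets below are invented toy data, not the data from this issue, and the fgsea calls only run if the package is installed.

```r
# Toy inputs, invented purely for illustration
set.seed(42)
genes <- paste0("gene", 1:2000)
stats <- setNames(sort(rnorm(2000), decreasing = TRUE), genes)

pathways <- list(
  top_set    = genes[1:40],        # concentrated at the top of the ranking
  random_set = sample(genes, 40)   # no expected enrichment
)

if (requireNamespace("fgsea", quietly = TRUE)) {
  # Bounded estimation: P-values below eps are not resolved further
  res_bounded <- fgsea::fgseaMultilevel(pathways = pathways, stats = stats,
                                        minSize = 5, maxSize = 500, eps = 1e-50)
  # Unbounded estimation: slower, but tries to estimate arbitrarily small P-values
  res_unbounded <- fgsea::fgseaMultilevel(pathways = pathways, stats = stats,
                                          minSize = 5, maxSize = 500, eps = 0)
  print(as.data.frame(res_bounded)[, c("pathway", "pval", "log2err")])
  print(as.data.frame(res_unbounded)[, c("pathway", "pval", "log2err")])
}
```

If only the ordering of pathways matters, a bounded eps is probably sufficient; eps = 0 is mainly useful when the very small P-values themselves need to be reported.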

assaron commented 2 years ago

@FionaMoon what kind of data do you use as input? It looks like too few genes: fgsea should be run on all expressed genes, not just the differentially expressed ones.

Additionally, which version are you using? There have been a number of fixes that made the behavior more stable on such unexpected inputs, so you can try installing the version from GitHub.
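
To illustrate the point about running on all expressed genes, here is a minimal sketch of building the ranked statistics vector from an unfiltered DE table. The table `de_all` and the gene set `example_set` are hypothetical toy data; the column names `Gene` and `log2FC` mirror the structure shown earlier in the thread. With Seurat, getting such an unfiltered table would presumably mean relaxing the `logfc.threshold` and `min.pct` filters of `FindMarkers`, which is an assumption about the upstream workflow rather than something stated in this issue.

```r
# Hypothetical DE table covering ALL tested genes, not only significant ones
set.seed(1)
de_all <- data.frame(
  Gene   = paste0("gene", 1:3000),
  log2FC = rnorm(3000),
  stringsAsFactors = FALSE
)

# Drop missing statistics and duplicated gene names before ranking
de_all <- de_all[!is.na(de_all$log2FC) & !duplicated(de_all$Gene), ]

# Named numeric vector: names are genes, values are the ranking statistic
prerank.genes <- setNames(de_all$log2FC, de_all$Gene)
prerank.genes <- sort(prerank.genes, decreasing = TRUE)

# Placeholder gene set collection (a named list of gene-name vectors)
my.db <- list(example_set = sample(names(prerank.genes), 50))

if (requireNamespace("fgsea", quietly = TRUE)) {
  fgseaRes <- fgsea::fgseaMultilevel(pathways = my.db,
                                     stats    = prerank.genes,
                                     minSize  = 5,
                                     maxSize  = 2500)
  print(as.data.frame(fgseaRes)[, c("pathway", "pval", "padj", "NES")])
}
```

Ranking on a filtered DEG list leaves only genes with strong effects, so the statistic no longer has a background of unchanged genes to compare against.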

FionaMoon commented 2 years ago

Thank you for your answer. My data is scRNA-seq, which contains fewer genes than bulk RNA-seq. Here's my data:

library(scater)
library(Seurat)
# install SeuratDisk from GitHub using the remotes package:
# remotes::install_github(repo = 'mojaveazure/seurat-disk', ref = 'develop')
library(SeuratDisk)
library(SeuratData)
library(patchwork)
# download and save PBMC3K from SeuratData
InstallData("pbmc3k")
pbmc <- LoadData(ds = "pbmc3k", type = "pbmc3k.final")

FionaMoon commented 2 years ago

I've solved this problem by using another algorithm to calculate the DEGs. Thank you.