GSEA takes too long to execute

nfancy commented 3 years ago

Hi,

Thank you for this great package. I am trying to run GSEA with custom gene sets from a .gmt file. My input gene list contains roughly 18k genes. Each GSEA run takes a very long time, even though I am running on 24 cores. My command is as follows:

system.time(
  gsea_res <- WebGestaltR::WebGestaltR(
    enrichMethod = "GSEA",
    enrichDatabaseFile = "./database.gmt",
    enrichDatabaseType = "genesymbol",
    interestGene = ranked_gene,
    interestGeneType = "genesymbol",
    minNum = 5,
    maxNum = 500,
    isOutput = FALSE,
    nThreads = future::availableCores(),
    projectName = database_name
  )
)

Loading the functional categories...
Loading the ID list...
Performing the enrichment analysis...
1000 permutations of score complete...
    user   system  elapsed 
1003.650   86.859 2555.592

data.zip

I am also attaching my ranked list and the .gmt file that I used. Any suggestions would be appreciated.

Best wishes, Nurun

yxngl commented 3 years ago

Hi Nurun,

I could not read your RDS file; it gives me an "unknown input format" error. That said, you have a lot of input genes, and GO BP is indeed slow because of its large size. I wonder how fast the original Java implementation would be on your data, but I would guess it is similar. You could try fgsea when you have a large number of gene sets. It could at least serve as a first filtering step, since the significance test for each gene set is independent of the others. We were considering incorporating fgsea, but it may not happen soon.
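In case it helps, here is a rough sketch of what a first-pass fgsea run could look like. It assumes the Bioconductor package fgsea is installed, and it uses made-up toy data (the gene names, scores, and the two gene sets are hypothetical) rather than the attached files. For a real run, fgsea::gmtPathways() can read a .gmt file into the same named-list structure, and the scores need to be a named numeric vector.

```r
# Sketch only: assumes the Bioconductor package 'fgsea' is installed.
# The toy data below is hypothetical and stands in for the real ranked
# list and .gmt file.
set.seed(1)

# Named numeric vector of per-gene scores (names are gene symbols).
stats <- setNames(rnorm(1000), paste0("GENE", seq_len(1000)))

# Gene sets as a named list of character vectors. For a real .gmt file,
# fgsea::gmtPathways("./database.gmt") returns this kind of list.
pathways <- list(
  set_random = sample(names(stats), 50),
  set_top    = names(sort(stats, decreasing = TRUE))[1:50]
)

# Run fgsea only if the package is available, using the same size
# limits as minNum/maxNum in the WebGestaltR call.
if (requireNamespace("fgsea", quietly = TRUE)) {
  res <- fgsea::fgsea(pathways = pathways, stats = stats,
                      minSize = 5, maxSize = 500)
  res <- as.data.frame(res)
  print(res[order(res$padj), c("pathway", "NES", "pval", "padj")])
}
```

Gene sets that pass a loose cutoff here could then be written back to a smaller .gmt file and passed to WebGestaltR, which would cut down the number of sets it has to permute.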

cycle20 commented 3 years ago

Hi @yxngl, a minor note: I also checked the attached data file. It was not saved with saveRDS; it is a workspace file, so you can read it with the load function.
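A small base R sketch of the difference, using a temporary file and a hypothetical two-gene data frame in place of the attached data: a file written with save() cannot be read by readRDS(), but load() restores the saved objects under their original names.

```r
# Hypothetical stand-in for the attached ranked list.
ranked_gene <- data.frame(gene = c("TP53", "BRCA1"), score = c(2.1, -1.3))

# Write it the way a workspace file is written, with save().
tmp <- tempfile(fileext = ".RData")
save(ranked_gene, file = tmp)

# readRDS() expects a single serialized object and errors on this file;
# the error message is captured here instead of stopping the script.
rds_attempt <- tryCatch(readRDS(tmp), error = function(e) conditionMessage(e))
print(rds_attempt)

# load() restores the saved objects into the given environment and
# returns their names as a character vector.
env <- new.env()
restored <- load(tmp, envir = env)
print(restored)
print(get("ranked_gene", envir = env))
```

Loading into a separate environment, as above, avoids overwriting objects that already exist in the current session.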

bzhanglab / WebGestaltR

GSEA takes too long to execute #7