ctlab / fgsea

Fast Gene Set Enrichment Analysis
Other
379 stars 67 forks source link

fgseaMultilevel Error in result[[njob]] <- value : attempt to select less than one element in OneIndex #105

Closed Brawni closed 3 years ago

Brawni commented 3 years ago

Hi! thanks a lot for this great package!

I have been using fgsea to run in loop on different set of genes that I rank by -log10(pvalue) * sign (logFC). These lists of genes are of different sizes (as genes not expressed sufficiently in the groups or not passing a logFC are removed): 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 804 1360 2130 615 986 2796 900 951 923 2144 1766 1867 1427 1057 1787 1393

The code used is:

degCluster = degClusters[degClusters$cluster == i,]
        fgsea_ranks = -log10 (degCluster$p_val + 1e-300) + sign (degCluster$avg_log2FC)
        fgsea_ranks = setNames (fgsea_ranks, degCluster$gene)       
        fgsea_ranks = fgsea_ranks[fgsea_ranks != 0] # remove -log10(pval = 1)
        fgseaRes = fgseaMultilevel (pathways, 
                fgsea_ranks,
                minSize=15, 
                maxSize=1500, 
                #BPPARAM = SerialParam(),
                scoreType = 'std'
                )       

However I get the following error in this case when I run fgseaMultilevel for group '3', although this doesn't happen ALL the times:

Error in result[[njob]] <- value : 
  attempt to select less than one element in OneIndex
In addition: Warning messages:
1: In preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam,  :
  All values in the stats vector are greater than zero and scoreType is "std", maybe you should switch to scoreType = "pos".
2: In parallel::mccollect(wait = FALSE, timeout = 1) :
  1 parallel job did not deliver a result
> fgseaResCol = collapsePathways (fgseaRes, stats = fgsea_ranks, pathway = pathways)
Warning message:
In fgseaSimple(pathways = pathways[pathwaysUp], stats = stats[u2],  :
  There were 1 pathways for which P-values were not calculated properly due to unbalanced gene-level statistic values

Please find attached the offending list of genes fgsea_ranks_group3.txt

I think this is a similar problem to this post from biostars: https://www.biostars.org/p/9478106/

Thanks!

SessionInfo: sessionInfo.txt

assaron commented 3 years ago

Hi @Brawni

i see you have commented out line with SerialParam(), does the problem happens with bpparam set to SerialParam as well?

Brawni commented 3 years ago

with SerialParam() I get a segfault error, which seemed to be not an improvement, as follows:

 *** caught segfault ***
address 0x250, cause 'memory not mapped'

Traceback:
 1: fgseaMultilevelCpp(x[, ES], stats, unique(x[, size]), sampleSize,     seed, eps, sign)
 2: FUN(...)
 3: doTryCatch(return(expr), name, parentenv, handler)
 4: tryCatchOne(expr, names, parentenv, handlers[[1L]])
 5: tryCatchList(expr, classes, parentenv, handlers)
 6: tryCatch({    FUN(...)}, error = handle_error)
 7: withCallingHandlers({    tryCatch({        FUN(...)    }, error = handle_error)}, warning = handle_wa
rning)
 8: FUN(...)
 9: doTryCatch(return(expr), name, parentenv, handler)
10: tryCatchOne(expr, names, parentenv, handlers[[1L]])
11: tryCatchList(expr, classes, parentenv, handlers)
12: tryCatch(FUN(...), error = identity)
13: FUN(X[[i]], ...)
14: lapply(X, FUN_, ...)
15: bplapply(multilevelPathwaysList, function(x) {    eps <- eps * min(x$denomProb)    return(fgseaMultil
evelCpp(x[, ES], stats, unique(x[, size]),         sampleSize, seed, eps, sign))}, BPPARAM = BPPARAM)
16: bplapply(multilevelPathwaysList, function(x) {    eps <- eps * min(x$denomProb)    return(fgseaMultil
evelCpp(x[, ES], stats, unique(x[, size]),         sampleSize, seed, eps, sign))}, BPPARAM = BPPARAM)
17: multilevelImpl(multilevelPathwaysList, stats, sampleSize, seed,     eps, sign = sign, BPPARAM = BPPAR
AM)
18: fgseaMultilevel(pathways, fgsea_ranks, minSize = 15, maxSize = 1500,     BPPARAM = SerialParam(), sco
reType = "std")
19: eval(ei, envir)
20: eval(ei, envir)
21: withVisible(eval(ei, envir))
22: source("/ahg/regevdata/projects/ICA_Lung/Bruno/scripts/scrna_pipeline/DEG_standard.R")
assaron commented 3 years ago

Oh, that's really interesting. Can you attach the pathways you are using?

assaron commented 3 years ago

Also, in the list you've attached there are just 615 genes. Are these all the genes you consider?

assaron commented 3 years ago

Actually, please upgrade fgsea to the latest version (using devtools::install_github('ctlab/fgsea')). There was a bug we fixed that could cause a similar problem.

Brawni commented 3 years ago

converting the pathways file to txt is too big (>1GB) and I cant upload .rds on here. Do you know other means for me to upload it? Yes these genes are coming from a differential expression which filter out those not expressed in both groups in terms of percentage of cells expressing it (scRNA) and those with logFC lower than 0.25. Should I have a more inclusive list to start with? I will update to the last version and report back!

Thanks!!

Brawni commented 3 years ago

Hi! After updating to the last version I get a different error:

Error in `rownames<-`(`*tmp*`, value = sig_terms) : 
  attempt to set 'rownames' on an object with no dimensions
In addition: Warning messages:
1: In preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam,  :
  All values in the stats vector are greater than zero and scoreType is "std", maybe you should switch to scoreType = "pos".
2: In fgseaMultilevel(pathways, fgsea_ranks, minSize = 15, maxSize = 1500,  :
  There were 3 pathways for which P-values were not calculated properly due to unbalanced (positive and negative) gene-level statistic values. For such pathways pval, padj, NES, log2err are set to NA. You can try to increase the value of the argument nPermSimple (for example set it nPermSimple = 10000)
vdsukhov commented 3 years ago

Hi @Brawni ! I saw that your pathway file is quite large. But without a reproducible example, it is quite difficult for us to understand what is going wrong. I can suggest to choose a subset of pathways that lead to errors in the fgseaMultilevel function. To do this, you can try the following example

pathwaysToCheck <- list()
for (gsn in names(pathways)){
    tryCatch({
        set.seed(1)
        fr <- fgseaMultilevel(pathways[gsn],
                              fgsea_ranks,
                              minSize = 15,
                              maxSize = 1500)
    }, error = function(e){
        tmp <- get("pathwaysToCheck", env=globalenv())
        assign("pathwaysToCheck", 
               c(tmp, pathways[gsn]), 
               envir = globalenv())
    })
}

To determine only those pathways on which fgesaMultilevel crashes with an error. If after that the file is still too large, then you can select several pathways from pathwaysToCheck.

Brawni commented 3 years ago

HI @vdsukhov! I tried to reproduce the error several times but now it seems to work. I guess the last error I reported was due to some issues on my side and updating to the latest version fixed the initial problem, sorry not be more specific. If I encounter the error again I will run the code you posted and report back!

Thanks a lot!

assaron commented 3 years ago

Yes these genes are coming from a differential expression which filter out those not expressed in both groups in terms of percentage of cells expressing it (scRNA) and those with logFC lower than 0.25. Should I have a more inclusive list to start with?

@Brawni You should use all expressed genes for GSEA, don't filter the results by logFC.