bzhanglab / WebGestaltR

R package for WebGestalt
https://bzhanglab.github.io/WebGestaltR/
34 stars 13 forks source link

GSEA error, n_all_rand_min missing #13

Open cmatKhan opened 2 years ago

cmatKhan commented 2 years ago

I am using the function WebGestaltRBatch. Code is provided below. But, the error I am receiving is this:

Process file: /home/oguzkhan/projects/eds1/eds1_mimic_cc/data/webgestalt_input/minus_lys/LYS14.rnk
Loading the functional categories...
Loading the ID list... 

Summarizing the uploaded ID list by GO Slim data... 

Performing the enrichment analysis...  
1000 permutations of score complete...   
Error in if (n_all_rand_min == 0) { :
  missing value where TRUE/FALSE needed

Which seems to imply that n_all_rand_min is not set. My other results tables do work, for what it is worth.

library(tidyverse)
library(DESeq2)
library(WebGestaltR)
library(here)

PADJ_THRES = .05
LFC_THRES = 1
MIN_NUM_GENES_IN_GO_CATEGORY = 5
MAX_NUM_GENES_IN_GO_CATEGORY = 500

PROJECT_NAME = "mimic_cc"
ENRICH_DB = listGeneSet(organism = "scerevisiae")[c(2,4,6,7,8,9,10,11,12), 'name']

# create batch input ----------

res_list = readRDS(here("data/shrunken_res_lists.rds"))

webgestalt_input = here("data/webgestalt_input")

dir.create(here(webgestalt_input))

for(res_list_name in names(res_list)){

  tp_res = res_list[[res_list_name]]

  for(geno in names(tp_res)){

    geno_at_tp_res = tp_res[[geno]] %>%
      as_tibble(rownames = 'gene') %>%
      filter(padj < PADJ_THRES, abs(log2FoldChange) > LFC_THRES) %>%
      select(gene, log2FoldChange)

    filename = paste0(geno, ".rnk")

    tp_dir = file.path(webgestalt_input, res_list_name)
    dir.create(tp_dir)

    outpath = file.path(tp_dir, filename)

    write_tsv(geno_at_tp_res, outpath, col_names = FALSE)

  }
}

# create batch output ----------

for(res_list_name in names(res_list)){

  message(res_list_name)

  interest_gene_folder =  file.path(webgestalt_input, res_list_name)

  output_dir = file.path(here("data/webgestalt_out"), res_list_name)
  dir.create(output_dir, recursive = TRUE)

  WebGestaltRBatch(
    interestGeneFolder = interest_gene_folder,
    enrichMethod = "GSEA",
    organism = "scerevisiae",
    interestGeneType="ensembl_gene_id",
    # note: set this way for testing. couldn't figure out which table kicked up the error when running in parallel
    isParallel = FALSE,
    # nThreads = 10,
    enrichDatabase = ENRICH_DB,
    collapseMethod = "mean",
    minNum = MIN_NUM_GENES_IN_GO_CATEGORY,
    maxNum = MAX_NUM_GENES_IN_GO_CATEGORY,
    sigMethod = "fdr",
    fdrMethod = "BH",
    fdrThr = 0.05,
    topThr = 10,
    reportNum = 20,
    perNum = 1000,
    gseaP = 1,
    isOutput = TRUE,
    outputDirectory = output_dir,
    projectName = PROJECT_NAME,
    dagColor = "continuous",
    saveRawGseaResult = FALSE,
    gseaPlotFormat = 'png',
    setCoverNum = 10
  )
}

The rnk list in question looks like this:

YDR034C -9.617973436482961
YDR276C -1.0796356705189585
YEL017C-A   -1.015907877435956
YEL069C -1.134240186420286
YIL094C -1.3918892176597528
YLL057C -1.2047930491540484
YMR251W-A   -1.0800316005655233
YNCG0010W   -1.0260881336089769
YNR050C -1.2539886195196162
YOL052C-A   -1.055971979770595

I suspect that what is happening is that I simply shouldn't pass a list so short to webgestalt -- I didn't realize how short it was before the error. If that is the case, some error handling would be helpful -- maybe notify the user that a given table is too small to be considered and skip it.

edit: also happening with a longer list, so maybe not.

> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] here_1.0.1                  WebGestaltR_0.4.4           DESeq2_1.34.0               SummarizedExperiment_1.24.0 Biobase_2.54.0             
 [6] MatrixGenerics_1.6.0        matrixStats_0.61.0          GenomicRanges_1.46.1        GenomeInfoDb_1.30.0         IRanges_2.28.0             
[11] S4Vectors_0.32.3            BiocGenerics_0.40.0         forcats_0.5.1               stringr_1.4.0               dplyr_1.0.7                
[16] purrr_0.3.4                 readr_2.1.1                 tidyr_1.1.4                 tibble_3.1.6                ggplot2_3.3.5              
[21] tidyverse_1.3.1            

loaded via a namespace (and not attached):
  [1] colorspace_2.0-2         rjson_0.2.21             ellipsis_0.3.2           rprojroot_2.0.2          XVector_0.34.0           fs_1.5.2                
  [7] rstudioapi_0.13          DT_0.20                  bit64_4.0.5              AnnotationDbi_1.56.2     fansi_1.0.2              lubridate_1.8.0         
 [13] xml2_1.3.3               codetools_0.2-18         splines_4.1.2            doParallel_1.0.16        cachem_1.0.6             geneplotter_1.72.0      
 [19] knitr_1.37               jsonlite_1.7.3           apcluster_1.4.9          Rsamtools_2.10.0         broom_0.7.11             annotate_1.72.0         
 [25] dbplyr_2.1.1             png_0.1-7                compiler_4.1.2           httr_1.4.2               backports_1.4.1          assertthat_0.2.1        
 [31] Matrix_1.4-0             fastmap_1.1.0            cli_3.1.1                htmltools_0.5.2          tools_4.1.2              igraph_1.2.11           
 [37] gtable_0.3.0             glue_1.6.1               GenomeInfoDbData_1.2.7   doRNG_1.8.2              Rcpp_1.0.8               cellranger_1.1.0        
 [43] vctrs_0.3.8              Biostrings_2.62.0        svglite_2.0.0            rtracklayer_1.54.0       iterators_1.0.13         xfun_0.29               
 [49] rvest_1.0.2              lifecycle_1.0.1          restfulr_0.0.13          rngtools_1.5.2           XML_3.99-0.8             zlibbioc_1.40.0         
 [55] scales_1.1.1             vroom_1.5.7              hms_1.1.1                parallel_4.1.2           RColorBrewer_1.1-2       curl_4.3.2              
 [61] yaml_2.2.1               memoise_2.0.1            stringi_1.7.6            RSQLite_2.2.9            genefilter_1.76.0        BiocIO_1.4.0            
 [67] foreach_1.5.1            BiocParallel_1.28.3      systemfonts_1.0.3        rlang_0.4.12             pkgconfig_2.0.3          bitops_1.0-7            
 [73] evaluate_0.14            lattice_0.20-45          GenomicAlignments_1.30.0 htmlwidgets_1.5.4        bit_4.0.4                tidyselect_1.1.1        
 [79] plyr_1.8.6               magrittr_2.0.1           R6_2.5.1                 generics_0.1.1           DelayedArray_0.20.0      DBI_1.1.2               
 [85] whisker_0.4              pillar_1.6.4             haven_2.4.3              withr_2.4.3              survival_3.2-13          KEGGREST_1.34.0         
 [91] RCurl_1.98-1.5           modelr_0.1.8             crayon_1.4.2             utf8_1.2.2               tzdb_0.2.0               rmarkdown_2.11          
 [97] locfit_1.5-9.4           grid_4.1.2               readxl_1.3.1             blob_1.2.2               reprex_2.0.1             digest_0.6.29           
[103] xtable_1.8-4             munsell_0.5.0