bzhanglab / WebGestaltR

R package for WebGestalt
https://bzhanglab.github.io/WebGestaltR/

latest cran version is failing at devtools::build() step #22

Closed nfancy closed 1 year ago

nfancy commented 1 year ago

Hi, thanks for the package. I am using it for functional analysis in our own package. However, the latest version fails while loading the gene sets. The error is below; I have pasted the start and the end of the trace. I got this error while building the Docker image. Locally, the previous version (0.4.4) works fine, but 0.4.5 fails.

It would be great if you could look into the issue.

#14 2359.9  ----------- FAILURE REPORT -------------- 
#14 2359.9  --- failure: length > 1 in coercion to logical ---
#14 2359.9  --- srcref --- 
#14 2359.9 : 
#14 2359.9  --- package (from environment) --- 
#14 2359.9 WebGestaltR
#14 2359.9  --- call from context --- 
#14 2359.9 FUN(X[[i]], ...)
#14 2359.9  --- call from argument --- 
#14 2359.9 !is.na(data) && !is.null(data)
#14 2359.9  --- R stacktrace ---
#14 2359.9 where 1: FUN(X[[i]], ...)
#14 2359.9 where 2: lapply(data, .toList)
#14 2359.9 where 3: readGmt(gmtUrl, cache = cache)
#14 2359.9 where 4: loadGeneSet(organism = organism, enrichDatabase = enrichDatabase, 
#14 2359.9     enrichDatabaseFile = enrichDatabaseFile, enrichDatabaseType = enrichDatabaseType, 
#14 2359.9     enrichDatabaseDescriptionFile = enrichDatabaseDescriptionFile, 
#14 2359.9     cache = cache, hostName = hostName)
...
...
#14 2360.0  --- value of length: 129 type: logical ---
#14 2360.0   [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#14 2360.0  [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#14 2360.0  [31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#14 2360.0  [46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#14 2360.0  [61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#14 2360.0  [76] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#14 2360.0  [91] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#14 2360.0 [106] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#14 2360.0 [121] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#14 2360.0  --- function from context --- 
#14 2360.0 function (data) 
#14 2360.0 {
#14 2360.0     if (length(data) > 2) {
#14 2360.0         data <- data[!is.na(data) && !is.null(data)]
#14 2360.0         data1 <- cbind(rep(gsub("%", "_", data[1], fixed = TRUE), 
#14 2360.0             length(data) - 2), rep(data[2], length(data) - 2), 
#14 2360.0             data[c(-1, -2)])
#14 2360.0         return(data1)
#14 2360.0     }
#14 2360.0     else {
#14 2360.0         return(NULL)
#14 2360.0     }
#14 2360.0 }

Thanks in advance.

yxngl commented 1 year ago

The error happens in a function call, not while building the R package, right? Do you get the error outside of Docker? Could you share the enrichment database you are using?

nfancy commented 1 year ago

Hi, thank you very much for your quick reply. I found the cause of the error. The latest version produces the message "length > 1 in coercion to logical" after the function runs. It does not fail when I just run the function because it is only a warning. However, devtools::check() fails because it is supposed to fail if there is any warning. Do you have any idea how to suppress this warning message? I was just using the default databases available for WebGestaltR:

"geneontology_Biological_Process_noRedundant",
"geneontology_Cellular_Component_noRedundant",
"geneontology_Molecular_Function_noRedundant",
"pathway_KEGG",
"pathway_Reactome",
"pathway_Wikipathway"

nfancy commented 1 year ago

These are the warnings generated in the RStudio console.

Warning messages:
1: In !is.na(data) && !is.null(data) :
  'length(x) = 31 > 1' in coercion to 'logical(1)'
2: In !is.na(data) && !is.null(data) :
  'length(x) = 111 > 1' in coercion to 'logical(1)'
3: In !is.na(data) && !is.null(data) :
  'length(x) = 218 > 1' in coercion to 'logical(1)'
4: In !is.na(data) && !is.null(data) :
  'length(x) = 257 > 1' in coercion to 'logical(1)'
5: In !is.na(data) && !is.null(data) :
  'length(x) = 83 > 1' in coercion to 'logical(1)'
6: In !is.na(data) && !is.null(data) :
  'length(x) = 161 > 1' in coercion to 'logical(1)'
7: In !is.na(data) && !is.null(data) :
  'length(x) = 43 > 1' in coercion to 'logical(1)'
8: In !is.na(data) && !is.null(data) :
  'length(x) = 334 > 1' in coercion to 'logical(1)'
9: In !is.na(data) && !is.null(data) :
  'length(x) = 373 > 1' in coercion to 'logical(1)'
10: In !is.na(data) && !is.null(data) :
  'length(x) = 337 > 1' in coercion to 'logical(1)'
11: In !is.na(data) && !is.null(data) :
  'length(x) = 489 > 1' in coercion to 'logical(1)'
12: In !is.na(data) && !is.null(data) :
  'length(x) = 35 > 1' in coercion to 'logical(1)'
13: In !is.na(data) && !is.null(data) :
  'length(x) = 22 > 1' in coercion to 'logical(1)'
14: In !is.na(data) && !is.null(data) :
  'length(x) = 77 > 1' in coercion to 'logical(1)'
15: In !is.na(data) && !is.null(data) :
  'length(x) = 91 > 1' in coercion to 'logical(1)'
16: In !is.na(data) && !is.null(data) :
  'length(x) = 328 > 1' in coercion to 'logical(1)'
17: In !is.na(data) && !is.null(data) :
  'length(x) = 383 > 1' in coercion to 'logical(1)'
18: In !is.na(data) && !is.null(data) :
  'length(x) = 347 > 1' in coercion to 'logical(1)'
19: In !is.na(data) && !is.null(data) :
  'length(x) = 198 > 1' in coercion to 'logical(1)'
20: In !is.na(data) && !is.null(data) :
  'length(x) = 153 > 1' in coercion to 'logical(1)'
21: In !is.na(data) && !is.null(data) :
  'length(x) = 31 > 1' in coercion to 'logical(1)'
22: In !is.na(data) && !is.null(data) :
  'length(x) = 258 > 1' in coercion to 'logical(1)'
23: In !is.na(data) && !is.null(data) :
  'length(x) = 420 > 1' in coercion to 'logical(1)'
24: In !is.na(data) && !is.null(data) :
  'length(x) = 159 > 1' in coercion to 'logical(1)'
25: In !is.na(data) && !is.null(data) :
  'length(x) = 27 > 1' in coercion to 'logical(1)'
26: In !is.na(data) && !is.null(data) :
  'length(x) = 33 > 1' in coercion to 'logical(1)'
27: In !is.na(data) && !is.null(data) :
  'length(x) = 482 > 1' in coercion to 'logical(1)'
28: In !is.na(data) && !is.null(data) :
  'length(x) = 208 > 1' in coercion to 'logical(1)'
29: In !is.na(data) && !is.null(data) :
  'length(x) = 111 > 1' in coercion to 'logical(1)'
30: In !is.na(data) && !is.null(data) :
  'length(x) = 94 > 1' in coercion to 'logical(1)'
31: In !is.na(data) && !is.null(data) :
  'length(x) = 68 > 1' in coercion to 'logical(1)'
32: In !is.na(data) && !is.null(data) :
  'length(x) = 39 > 1' in coercion to 'logical(1)'
33: In !is.na(data) && !is.null(data) :
  'length(x) = 332 > 1' in coercion to 'logical(1)'
34: In !is.na(data) && !is.null(data) :
  'length(x) = 115 > 1' in coercion to 'logical(1)'
35: In !is.na(data) && !is.null(data) :
  'length(x) = 384 > 1' in coercion to 'logical(1)'
36: In !is.na(data) && !is.null(data) :
  'length(x) = 34 > 1' in coercion to 'logical(1)'
37: In !is.na(data) && !is.null(data) :
  'length(x) = 33 > 1' in coercion to 'logical(1)'
38: In !is.na(data) && !is.null(data) :
  'length(x) = 174 > 1' in coercion to 'logical(1)'
39: In !is.na(data) && !is.null(data) :
  'length(x) = 22 > 1' in coercion to 'logical(1)'
40: In !is.na(data) && !is.null(data) :
  'length(x) = 46 > 1' in coercion to 'logical(1)'
41: In !is.na(data) && !is.null(data) :
  'length(x) = 192 > 1' in coercion to 'logical(1)'
42: In !is.na(data) && !is.null(data) :
  'length(x) = 498 > 1' in coercion to 'logical(1)'
43: In !is.na(data) && !is.null(data) :
  'length(x) = 51 > 1' in coercion to 'logical(1)'
44: In !is.na(data) && !is.null(data) :
  'length(x) = 240 > 1' in coercion to 'logical(1)'
45: In !is.na(data) && !is.null(data) :
  'length(x) = 27 > 1' in coercion to 'logical(1)'
46: In !is.na(data) && !is.null(data) :
  'length(x) = 498 > 1' in coercion to 'logical(1)'
47: In !is.na(data) && !is.null(data) :
  'length(x) = 156 > 1' in coercion to 'logical(1)'
48: In !is.na(data) && !is.null(data) :
  'length(x) = 64 > 1' in coercion to 'logical(1)'
49: In !is.na(data) && !is.null(data) :
  'length(x) = 129 > 1' in coercion to 'logical(1)'
50: In !is.na(data) && !is.null(data) :
  'length(x) = 418 > 1' in coercion to 'logical(1)'

Any pointers will be much appreciated. Thanks.
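
On the question of suppressing this warning: a minimal sketch of muffling only the matching warning while letting every other warning through. Here noisy_call() and muffle_matching() are hypothetical names introduced for illustration, not WebGestaltR functions, and the warning is raised by hand so the snippet behaves the same on any R version. This hides the symptom only; from R 4.3.0 onward a length > 1 operand to && is an error rather than a warning, so it cannot be muffled there and the expression itself has to change.

```r
# Hypothetical stand-in for a call that emits the warning shown above;
# it is NOT part of WebGestaltR.
noisy_call <- function() {
  warning("'length(x) = 31 > 1' in coercion to 'logical(1)'")
  "result"
}

# Hypothetical helper: muffle warnings whose message contains `pattern`
# and let all other warnings propagate as usual.
muffle_matching <- function(expr, pattern) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl(pattern, conditionMessage(w), fixed = TRUE)) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

res <- muffle_matching(noisy_call(), "in coercion to 'logical(1)'")
```

Matching on the message text keeps unrelated warnings visible, which a blanket suppressWarnings() would not.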

yxngl commented 1 year ago

Hi, this does not look right and should not happen. Could you send an example that I can reproduce? I tried

enrichDatabase = c("geneontology_Biological_Process_noRedundant",
                   "geneontology_Cellular_Component_noRedundant",
                   "geneontology_Molecular_Function_noRedundant",
                   "pathway_KEGG",
                   "pathway_Reactome",
                   "pathway_Wikipathway")

and it is fine. The expression !is.na(data) && !is.null(data) seems to appear only in readGmt, but if you are using the standard gene sets from our server, I don't see how the latest version affects it, besides updating the URL to HTTPS.

nfancy commented 1 year ago

bug_fix.zip

Attached are my function and the gene list. My function call is:

enrichment_result <- pathway_analysis_webgestaltr(sig_de,
                                                  enrichment_method = "ORA")

The session info is as follows:

> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        LC_COLLATE=C.UTF-8    
 [5] LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8    LC_PAPER=C.UTF-8       LC_NAME=C             
 [9] LC_ADDRESS=C           LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

What's the structure of the data that is used in the readGmt function?

yxngl commented 1 year ago

I tested with your code and input. No warnings and got several nice plots.

> enrichment_result <- pathway_analysis_webgestaltr(sig_de, enrichment_method = "ORA")
Using genome_protein-coding as background gene list
Loading the functional categories...
Loading the ID list...
Loading the reference list...
Performing the enrichment analysis...
i Output is returned as a list!

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    
system code page: 936

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_1.0.5   ggplot2_3.3.3

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9        pillar_1.4.6      compiler_4.0.2    rngtools_1.5      iterators_1.0.12  tools_4.0.2       digest_0.6.25    
 [8] jsonlite_1.7.0    lifecycle_1.0.0   tibble_3.0.3      gtable_0.3.0      lattice_0.20-41   pkgconfig_2.0.3   rlang_0.4.10     
[15] doRNG_1.8.2       igraph_1.2.5      Matrix_1.2-18     foreach_1.5.0     DBI_1.1.1         cli_2.0.2         rstudioapi_0.11  
[22] curl_4.3          parallel_4.0.2    stringr_1.4.0     httr_1.4.2        apcluster_1.4.8   withr_2.2.0       systemfonts_0.2.3
[29] gdtools_0.2.2     hms_0.5.3         generics_0.0.2    vctrs_0.3.6       cowplot_1.1.1     grid_4.0.2        tidyselect_1.1.0 
[36] svglite_1.2.3.2   glue_1.4.1        R6_2.4.1          fansi_0.4.1       farver_2.1.0      whisker_0.4       readr_1.3.1      
[43] purrr_0.3.4       WebGestaltR_0.4.5 magrittr_1.5      scales_1.1.1      codetools_0.2-16  ellipsis_0.3.1    assertthat_0.2.1 
[50] colorspace_2.0-0  labeling_0.4.2    stringi_1.7.8     munsell_0.5.0     doParallel_1.0.15 crayon_1.3.4 

nfancy commented 1 year ago

Could it be the R version? I have 4.2.1 and you have 4.0.2.

yxngl commented 1 year ago

No. I just tested on a newer version, on macOS. I also removed suppressWarnings and suppressMessages from your code, but still didn't see anything wrong.

> sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.6

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_3.4.1 dplyr_1.0.10 

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.10       pillar_1.8.1      compiler_4.2.2    iterators_1.0.14  tools_4.2.2       rngtools_1.5.2    bit_4.0.5        
 [8] digest_0.6.31     jsonlite_1.8.4    lifecycle_1.0.3   tibble_3.1.8      gtable_0.3.1      lattice_0.20-45   pkgconfig_2.0.3  
[15] rlang_1.0.6       doRNG_1.8.6       igraph_1.3.5      Matrix_1.5-3      foreach_1.5.2     DBI_1.1.3         cli_3.6.0        
[22] rstudioapi_0.14   curl_5.0.0        parallel_4.2.2    stringr_1.5.0     httr_1.4.4        apcluster_1.4.10  withr_2.5.0      
[29] systemfonts_1.0.4 hms_1.1.2         generics_0.1.3    vctrs_0.5.2       cowplot_1.1.1     bit64_4.0.5       grid_4.2.2       
[36] tidyselect_1.2.0  svglite_2.1.1     glue_1.6.2        R6_2.5.1          fansi_1.0.3       vroom_1.6.1       farver_2.1.1     
[43] whisker_0.4.1     tzdb_0.3.0        readr_2.1.3       WebGestaltR_0.4.5 magrittr_2.0.3    ellipsis_0.3.2    scales_1.2.1     
[50] codetools_0.2-18  assertthat_0.2.1  colorspace_2.1-0  utf8_1.2.2        stringi_1.7.12    munsell_0.5.0     doParallel_1.0.17
[57] crayon_1.5.2   

nfancy commented 1 year ago

Thanks, what's the structure of the data in the function?

yxngl commented 1 year ago

The messages you saw probably come from https://github.com/bzhanglab/WebGestaltR/blob/master/R/readGmt.R#L45, which is a utility function used by the readGmt function. A GMT file is basically a file of gene sets/enrichment databases: each line records one gene set and its genes, separated by tabs (https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#GMT:_Gene_Matrix_Transposed_file_format_.28.2A.gmt.29).

But these functions and data did not change in the latest version. I am not sure what is going wrong for you.
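
To make the GMT layout described above concrete, here is a rough sketch of turning tab-separated lines into one row per (gene set, gene) pair. The two input lines, the example.org links, and the name parse_gmt_line() are made up for illustration; this is not the package's readGmt implementation. It assumes the usual GMT convention of set name, then description, then gene IDs.

```r
# Two made-up GMT lines: set name, description, then gene IDs (tab-separated).
gmt_lines <- c(
  "SET_A\thttp://example.org/a\tGENE1\tGENE2\tGENE3",
  "SET_B\thttp://example.org/b\tGENE2\tGENE4"
)

# Hypothetical parser: split one line on tabs and return a data frame with
# one row per gene, or NULL when the line carries no genes.
parse_gmt_line <- function(line) {
  fields <- strsplit(line, "\t", fixed = TRUE)[[1]]
  if (length(fields) <= 2) {
    return(NULL)
  }
  data.frame(
    geneSet = fields[1],
    description = fields[2],
    gene = fields[-(1:2)],
    stringsAsFactors = FALSE
  )
}

# rbind() drops the NULL entries, leaving a long table of set/gene pairs.
gmt_long <- do.call(rbind, lapply(gmt_lines, parse_gmt_line))
```

Each parsed line is therefore a character vector whose length varies with the size of the gene set, which matches the varying lengths reported in the warnings.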

nfancy commented 1 year ago

Hi, I figured it out: it's & vs &&. The && operator expects length-one operands and returns a single logical value, whereas & is vectorised and returns a logical vector. I get the warning message for the second command below but not for the first. Any chance you could update the function? Thanks in advance.

> test <- letters
> test1 <- test[!is.na(test) & !is.null(test)]
> test1 <- test[!is.na(test) && !is.null(test)]
Warning message:
In !is.na(test) && !is.null(test) :
  'length(x) = 26 > 1' in coercion to 'logical(1)'

https://cran.r-project.org/doc/manuals/r-release/NEWS.html
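
Applied to the helper printed in the failure report, the change could look roughly like the sketch below. The name to_list_fixed() is hypothetical, and this is an illustration of the idea rather than the code in the PR. Since is.null(data) is a single FALSE for any non-NULL vector, the element-wise mask reduces to !is.na(data).

```r
# Hypothetical rewrite of the helper from the failure report, using a
# vectorised mask instead of the scalar && operator.
to_list_fixed <- function(data) {
  if (length(data) > 2) {
    # is.na() returns one logical per element, so this drops NA fields.
    data <- data[!is.na(data)]
    n <- length(data) - 2
    cbind(
      rep(gsub("%", "_", data[1], fixed = TRUE), n),  # gene set name
      rep(data[2], n),                                # description
      data[c(-1, -2)]                                 # genes
    )
  } else {
    NULL
  }
}

# Made-up fields from one GMT line, including an NA to be dropped.
fields <- c("SET%A", "desc", "GENE1", NA, "GENE2")
out <- to_list_fixed(fields)
```

With the original &&, the subscript was a single TRUE that was recycled, so nothing was ever filtered out; the vectorised mask is what actually removes NA entries.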

nfancy commented 1 year ago

Hi, I created a PR in case that's helpful. I would appreciate it if you could accept the PR and update your master branch.

Thanks, Nurun