egeulgen / pathfindR

pathfindR: Enrichment Analysis Utilizing Active Subnetworks
https://egeulgen.github.io/pathfindR/

Failed in ##Annotating involved genes and visualizing enriched terms #31

Closed qingzhou1 closed 4 years ago

qingzhou1 commented 4 years ago

Describe the bug

The run failed during the step "Annotating involved genes and visualizing enriched terms" with this error:

Error in (function (cl, name, valueClass) : assignment of an object of class “matrix” is not valid for @‘Rkeys’ in an object of class “AnnDbBimap”; is(value, "character") is not TRUE

To Reproduce

Steps to reproduce the behavior:

  1. Prepare input as:

             Gene_symbol     logFC FDR_p
     CLPS           CLPS -2.748643     0
     CPA1           CPA1 -2.416595     0
     PRSS1         PRSS1 -2.262417     0
     PLA2G1B     PLA2G1B -2.568676     0
     SYCN           SYCN -2.823795     0

  2. Run the following function:

     output <- run_pathfindR(pathfind.input, p_val_threshold = 0.01, gene_sets = "KEGG")

  3. See error:

     Annotating involved genes and visualizing enriched terms
     Error in (function (cl, name, valueClass) : assignment of an object of class “matrix” is not valid for @‘Rkeys’ in an object of class “AnnDbBimap”; is(value, "character") is not TRUE

Desktop (please complete the following information):

R Session Information:

R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=de_AT.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=de_AT.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=de_AT.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=de_AT.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] pathfindR_1.4.2

loaded via a namespace (and not attached):
[1] ggrepel_0.8.1 Rcpp_1.0.3 lattice_0.20-38 tidyr_1.0.2
[5] assertthat_0.2.1 digest_0.6.23 foreach_1.4.8 ggforce_0.3.1
[9] R6_2.4.1 GenomeInfoDb_1.20.0 stats4_3.6.1 RSQLite_2.2.0
[13] evaluate_0.14 ggplot2_3.2.1 pillar_1.4.3 zlibbioc_1.30.0
[17] rlang_0.4.4 lazyeval_0.2.2 rstudioapi_0.11 blob_1.2.1
[21] S4Vectors_0.22.1 Matrix_1.2-18 rmarkdown_2.1 BiocParallel_1.18.1
[25] bit_1.1-15.2 igraph_1.2.4.2 RCurl_1.98-1.1 polyclip_1.10-0
[29] munsell_0.5.0 DelayedArray_0.10.0 compiler_3.6.1 xfun_0.12
[33] pkgconfig_2.0.3 BiocGenerics_0.30.0 htmltools_0.4.0 tidyselect_1.0.0
[37] SummarizedExperiment_1.14.1 tibble_2.1.3 gridExtra_2.3 GenomeInfoDbData_1.2.1
[41] IRanges_2.18.3 codetools_0.2-16 matrixStats_0.55.0 graphlayouts_0.5.0
[45] fansi_0.4.1 viridisLite_0.3.0 crayon_1.3.4 dplyr_0.8.4
[49] MASS_7.3-51.5 bitops_1.0-6 grid_3.6.1 DBI_1.1.0
[53] gtable_0.3.0 lifecycle_0.1.0 magrittr_1.5 scales_1.1.0
[57] cli_2.0.1 farver_2.0.3 XVector_0.24.0 viridis_0.5.1
[61] doParallel_1.0.15 vctrs_0.2.2 org.Hs.eg.db_3.8.2 iterators_1.0.12
[65] tools_3.6.1 bit64_0.9-7 Biobase_2.44.0 glue_1.3.1
[69] tweenr_1.0.1 purrr_0.3.3 ggraph_2.0.1 parallel_3.6.1
[73] yaml_2.2.1 AnnotationDbi_1.46.1 colorspace_1.4-1 BiocManager_1.30.10
[77] GenomicRanges_1.36.1 tidygraph_1.1.2 memoise_1.1.0 knitr_1.28


egeulgen commented 4 years ago

hello @qingzhou1

Do you mind sharing the input data (pathfind.input) as an RDS file?

qingzhou1 commented 4 years ago

pathfind.rds.zip

Sure, here is the RDS file. Thanks a lot for your quick reply.

egeulgen commented 4 years ago

Hello again. The problem seems to be in the function visualize_hsa_KEGG(), where the command below is used to look up the Entrez Gene (EG) IDs of the input genes:

tmp <- AnnotationDbi::mget(input_processed$GENE,
                           AnnotationDbi::revmap(org.Hs.eg.db::org.Hs.egSYMBOL),
                           ifnotfound = NA)

On my end, I cannot reproduce your issue: run_pathfindR() works without errors. Then again, the first 6 rows are different from the ones you shared, but you may have just reordered them:

> head(input_df)
  Gene_symbol  logFC     FDR_p
1      MARCH3 1.2822 4.17e-103
2      MARCH4 2.0000 2.52e-134
3       SEPT1 3.0694 5.24e-286
4       SEPT3 1.8696 1.29e-152
5       SEPT6 1.0471  3.53e-89
6       A2ML1 2.6705 1.77e-150

Can you try and see if the code below works OK for your input data frame (If so, the problem may lie somewhere else)? I gave an example of what the output should look like.

library(pathfindR)
input_df <- readRDS("pathfind.RDS")
input_processed <- input_processing(input_df)
eg_ids_list <- AnnotationDbi::mget(input_processed$GENE,
                                   AnnotationDbi::revmap(org.Hs.eg.db::org.Hs.egSYMBOL),
                                   ifnotfound = NA)
head(eg_ids_list,2)
$MARCH3
[1] "115123"

$MARCH4
[1] "57574"

Hope this helps, -E

qingzhou1 commented 4 years ago

input_processed <- input_processing(input_df, pin_name_path = "KEGG", p_val_threshold = 0.01)
Number of genes provided in input: 3024
Number of genes in input after p-value filtering: 3024
pathfindR cannot handle p values < 1e-13. These were changed to 1e-13
Could not find any interactions for 1977 (65.38%) genes in the PIN
Final number of genes in input: 1033

This worked with no problem, but the next command did not. See below:

eg_ids_list <- AnnotationDbi::mget(input_processed$GENE,
                                   AnnotationDbi::revmap(org.Hs.eg.db::org.Hs.egSYMBOL),
                                   ifnotfound = NA)
Error in (function (cl, name, valueClass) : assignment of an object of class “matrix” is not valid for @‘Rkeys’ in an object of class “AnnDbBimap”; is(value, "character") is not TRUE

It seems that AnnotationDbi has a problem. I am not sure whether the version of this package plays a role, since pathfindR also worked without problems until yesterday, when I updated this package.

Thanks a lot for your quick reply. Best

qingzhou1 commented 4 years ago

Interesting, I just found out that the class of input_processed$GENE is a matrix:

class(input_processed$GENE)
[1] "matrix"

Is this how it should be?
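
For anyone hitting the same message, here is a minimal base-R sketch of checking for a matrix-valued column and flattening it to a plain character vector before passing it to a lookup. The data frame is a made-up stand-in for input_processed (only the gene symbols and rounded logFC values come from this issue), and whether a matrix column is the actual root cause here is an assumption, not something confirmed in this thread.

```r
# Hypothetical stand-in for input_processed: GENE ends up as a one-column matrix
input_processed <- data.frame(CHANGE = c(-2.75, -2.42, -2.26))
input_processed$GENE <- matrix(c("CLPS", "CPA1", "PRSS1"), ncol = 1)

# Detect the problem: the column has dimensions, so it is not a plain vector
gene_is_matrix <- is.matrix(input_processed$GENE)

# Flatten to a plain character vector (as.character() drops the dim attribute)
genes <- as.character(input_processed$GENE)

# A plain character vector has no dimensions
genes_ok <- is.character(genes) && is.null(dim(genes))
```

The flattened genes vector could then be used in place of input_processed$GENE in the mget() call quoted above.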

qingzhou1 commented 4 years ago

Anyway, I think I am going to re-install everything in case there is some miscommunication between the packages on my Linux machine. However, I also tried on my Mac and got a new error:

PE <- run_pathfindR(input_df, p_val_threshold = 0.01, gene_sets = "KEGG")
n_processes is set to iterations because iterations < n_processes
There is already a directory named "pathfindR_Results". Writing the result to "pathfindR_Results(4)" not to overwrite any previous results.

Testing input

The input looks OK

Processing input. Converting gene symbols, if necessary (and if human gene symbols provided)

Number of genes provided in input: 3024
Number of genes in input after p-value filtering: 3024
pathfindR cannot handle p values < 1e-13. These were changed to 1e-13
Could not find any interactions for 340 (11.24%) genes in the PIN
Final number of genes in input: 2673

Performing Active Subnetwork Search and Enrichment

Processing the enrichment results over all iterations

Annotating involved genes and visualizing enriched terms

Error in `$<-.data.frame`(`*tmp*`, "EG_ID", value = c(CLPS = "1208", CPA1 = "1357", : replacement has 2675 rows, data has 2673
In addition: Warning message:
In pathfindR::input_processing(input, p_val_threshold, pin_path, : The gene column was turned into character from factor.

Sorry for so many problems, and I really appreciate your helpful reply. Best
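
One way a length mismatch like "replacement has 2675 rows, data has 2673" can arise is when a per-symbol lookup returns more than one ID for some symbols and the resulting list is flattened with unlist(). That this is what happened here is an assumption; the thread does not confirm the cause. The base-R sketch below uses a hand-built list shaped like a symbol-to-ID lookup result. The IDs for CLPS and CPA1 are taken from the error message above, while GENE_X and GENE_Y and their values are invented for illustration.

```r
# Hypothetical lookup result: a named list with one element per gene symbol.
# GENE_X maps to two IDs and GENE_Y was not found (NA); both are made up.
eg_ids_list <- list(
  CLPS   = "1208",
  CPA1   = "1357",
  GENE_X = c("111", "222"),
  GENE_Y = NA
)

# Flattening yields one entry per ID, not per symbol, so the lengths differ
n_symbols   <- length(eg_ids_list)
n_flattened <- length(unlist(eg_ids_list))

# Keep exactly one ID per symbol: take the first and coerce NA to character
eg_ids <- vapply(eg_ids_list,
                 function(ids) as.character(ids[1]),
                 character(1))
```

Taking the first ID is only one possible policy; the point is that the result has one entry per symbol, so it can be assigned as a data frame column without a row-count error.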

egeulgen commented 4 years ago

I think it's related to an issue with AnnotationDbi. For the problem on your mac (and linux too), can you try with the latest development version of pathfindR? You can install it via:

install.packages("pak") # if you have not installed "pak"
pak::pkg_install("egeulgen/pathfindR")

qingzhou1 commented 4 years ago

Thanks a lot. I re-installed all the packages and now it works. Best