egeulgen / pathfindR

pathfindR: Enrichment Analysis Utilizing Active Subnetworks
https://egeulgen.github.io/pathfindR/

I am having an issue with pin_name_path #157

Closed DrJoshVandenbrink closed 1 year ago

DrJoshVandenbrink commented 1 year ago

While trying to load a custom PIN file, I am receiving the error:

"The second column of the PIN file must all be "pp"

However

Steps to reproduce the behavior:

Download the PIN String file

URL <- "https://stringdb-static.org/download/protein.links.v11.5/3702.protein.links.v11.5.txt.gz"
path2file <- file.path(tempdir(check = TRUE), "STRING.txt.gz")
download.file(URL, path2file)

Loading the string file

ath_string_df <- read.table(path2file, header = TRUE)

ath_string_df <- ath_string_df[ath_string_df$combined_score >= 400, ]

Removing the excess around the gene names

ath_string_pin <- data.frame(Interactor_A = sub("^3702\\.", "", ath_string_df$protein1),
                             Interactor_B = sub("^3702\\.", "", ath_string_df$protein2))

ath_string_pin <- data.frame(Interactor_A = sub("\\..$", "", ath_string_pin$Interactor_A),
                             Interactor_B = sub("\\..$", "", ath_string_pin$Interactor_B))
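As a quick illustration of what the two `sub()` calls above do, here is a minimal sketch on made-up STRING-style identifiers (the IDs are illustrative assumptions, not values taken from the downloaded file). Note that the backslash has to be doubled inside an R string literal.

```r
# Illustrative STRING-style IDs (assumed format):
# taxon prefix + TAIR locus + one-character variant suffix
ids <- c("3702.AT1G01010.1", "3702.AT5G67640.2")

# Step 1: drop the leading taxon prefix "3702."
no_prefix <- sub("^3702\\.", "", ids)

# Step 2: drop a trailing "." followed by exactly one character (e.g. ".1")
loci <- sub("\\..$", "", no_prefix)

print(loci)
```

The second pattern only removes a suffix of exactly one character after the final dot, so a longer suffix would be left in place.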

Getting the Gene Symbols

library(org.At.tair.db) # provides org.At.tairSYMBOL
Kegg <- org.At.tairSYMBOL
mapped_genes <- mappedkeys(Kegg)

symbols <- as.data.frame(Kegg[mapped_genes])

Replacing TAIR IDs with Symbols

ath_string_pin$Interactor_A <- symbols$symbol[match(ath_string_pin$Interactor_A, symbols$gene_id)]
ath_string_pin$Interactor_B <- symbols$symbol[match(ath_string_pin$Interactor_B, symbols$gene_id)]
ath_string_pin <- ath_string_pin[!is.na(ath_string_pin$Interactor_A) & !is.na(ath_string_pin$Interactor_B), ]
ath_string_pin <- ath_string_pin[ath_string_pin$Interactor_A != "" & ath_string_pin$Interactor_B != "", ]

self_intr_cond <- ath_string_pin$Interactor_A == ath_string_pin$Interactor_B
ath_string_pin <- ath_string_pin[!self_intr_cond, ]

ath_string_pin <- unique(t(apply(ath_string_pin, 1, sort))) # this will return a matrix object
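The row-wise sort followed by `unique()` is what collapses an interaction listed in both directions into a single undirected edge. A minimal sketch on a made-up edge list (the single-letter gene names are placeholders, not real symbols):

```r
# Toy edge list: rows 1 and 2 are the same undirected interaction
# written in opposite order
edges <- data.frame(Interactor_A = c("B", "A", "C"),
                    Interactor_B = c("A", "B", "A"),
                    stringsAsFactors = FALSE)

# Sort the two names within each row so that A-B and B-A become identical.
# apply() returns one column per input row, hence the t() to restore the
# row layout; unique() then drops the duplicated rows.
dedup <- unique(t(apply(edges, 1, sort)))

print(dedup)
print(is.matrix(dedup)) # the result is a character matrix, not a data.frame
```

Because the result is a matrix, it has to be turned back into a data frame before the "pp" column can be added.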

Adding the "pp" in the center column

ath_string_pin <- data.frame(A = ath_string_pin[, 1], pp = "pp", B = ath_string_pin[, 2])

Saving the SIF file

path2SIF <- file.path(tempdir(), "PIN.sif")
write.table(ath_string_pin,
            file = path2SIF,
            col.names = FALSE,
            row.names = FALSE,
            sep = "\t",
            quote = FALSE)

path2SIF <- normalizePath(path2SIF)

Error_Screenshot.pdf

Running PathfindR

output_df <- run_pathfindR(input = Ler,
                           convert2alias = FALSE,
                           gene_sets = "Custom",
                           custom_genes = ath_kegg_genes,
                           custom_descriptions = ath_kegg_descriptions,
                           pin_name_path = "/tmp/RtmpqdjAO7/PIN.sif")

Expected behavior:

I expected pathfindR to run. Instead I get the "pp" error, even though every value in the center column of both my SIF file and my data.frame IS "pp".
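One way to double-check a SIF file independently of pathfindR is to read it back and inspect the middle column directly. The sketch below is only an illustration under stated assumptions: `check_sif` is a hypothetical helper (not part of pathfindR), the gene names are made up, and it makes no claim about how pathfindR itself validates the file.

```r
# Hypothetical helper (not part of pathfindR): read a tab-separated SIF file
# and report its column count and whether the middle column is all "pp"
check_sif <- function(path) {
  sif <- read.delim(path, header = FALSE, quote = "",
                    colClasses = "character", stringsAsFactors = FALSE)
  list(n_columns = ncol(sif),
       all_pp = ncol(sif) == 3 && all(sif[[2]] == "pp"))
}

# Tiny made-up PIN, written with the same write.table() settings as above
toy_pin <- data.frame(A = c("GENE1", "GENE2"),
                      pp = "pp",
                      B = c("GENE2", "GENE3"))
toy_path <- file.path(tempdir(), "toy_PIN.sif")
write.table(toy_pin, file = toy_path, col.names = FALSE, row.names = FALSE,
            sep = "\t", quote = FALSE)

result <- check_sif(toy_path)
print(result)
```

Running the same helper on the real file (for example `check_sif(path2SIF)`) would show whether the file on disk has the three-column layout described in the error message.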

Screenshots image image image

Desktop (please complete the following information):

R Session Information:

R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.1 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] rTRM_1.36.0 igraph_1.4.1 KEGGREST_1.38.0 org.At.tair.db_3.16.0 AnnotationDbi_1.60.0 IRanges_2.32.0
[7] S4Vectors_0.36.1 Biobase_2.58.0 BiocGenerics_0.44.0 readxl_1.4.2 lubridate_1.9.2 forcats_1.0.0
[13] stringr_1.5.0 dplyr_1.1.0 purrr_1.0.1 readr_2.1.4 tidyr_1.3.0 tibble_3.1.8
[19] ggplot2_3.4.1 tidyverse_2.0.0 pathfindR_1.6.4.9000 pathfindR.data_1.1.3

loaded via a namespace (and not attached):
[1] bitops_1.0-7 bit64_4.0.5 doParallel_1.0.17 httr_1.4.4 GenomeInfoDb_1.34.9 tools_4.2.2
[7] utf8_1.2.3 R6_2.5.1 DBI_1.1.3 colorspace_2.1-0 withr_2.5.0 tidyselect_1.2.0
[13] gridExtra_2.3 curl_5.0.0 bit_4.0.5 compiler_4.2.2 cli_3.6.0 scales_1.2.1
[19] digest_0.6.31 rmarkdown_2.20 XVector_0.38.0 pkgconfig_2.0.3 htmltools_0.5.4 fastmap_1.1.0
[25] rlang_1.0.6 rstudioapi_0.14 RSQLite_2.3.0 farver_2.1.1 generics_0.1.3 RCurl_1.98-1.10
[31] magrittr_2.0.3 GenomeInfoDbData_1.2.9 Rcpp_1.0.10 munsell_0.5.0 fansi_1.0.4 viridis_0.6.2
[37] lifecycle_1.0.3 stringi_1.7.12 ggraph_2.1.0 MASS_7.3-58.2 zlibbioc_1.44.0 grid_4.2.2
[43] blob_1.2.3 parallel_4.2.2 ggrepel_0.9.3 crayon_1.5.2 graphlayouts_0.8.4 Biostrings_2.66.0
[49] hms_1.1.2 knitr_1.42 pillar_1.8.1 codetools_0.2-19 glue_1.6.2 evaluate_0.20
[55] png_0.1-8 vctrs_0.5.2 tzdb_0.3.0 tweenr_2.0.2 foreach_1.5.2 cellranger_1.1.0
[61] gtable_0.3.1 polyclip_1.10-4 cachem_1.0.6 xfun_0.37 ggforce_0.4.1 tidygraph_1.2.3
[67] viridisLite_0.4.1 iterators_1.0.14 memoise_2.0.1 timechange_0.2.0 ellipsis_0.3.2

Additional context:

openjdk 11.0.17 2022-10-18
OpenJDK Runtime Environment (build 11.0.17+8-post-Ubuntu-1ubuntu222.04)
OpenJDK 64-Bit Server VM (build 11.0.17+8-post-Ubuntu-1ubuntu222.04, mixed mode, sharing)

Thanks in advance!

DrJoshVandenbrink commented 1 year ago

image image

Attached the same picture 3 times above!

egeulgen commented 1 year ago

can you kindly share the SIF file?

DrJoshVandenbrink commented 1 year ago

Thanks for getting back to me so fast!

Here is the sif file, converted to txt so it would upload. athPIN.sif.txt

egeulgen commented 1 year ago

hello again, I just pushed a fix addressing this issue. You may install the latest dev version via:

install.packages("devtools") # if you have not installed "devtools" 
devtools::install_github("egeulgen/pathfindR")

DrJoshVandenbrink commented 1 year ago

Works great now! Thanks for your help!