egeulgen / pathfindR

pathfindR: Enrichment Analysis Utilizing Active Subnetworks
https://egeulgen.github.io/pathfindR/

KEGG pathway diagram 1. UnableToReadFont `helvetica' 2. proteins not showing the proper color #169

Closed Huan-Jui-Chang closed 11 months ago

Huan-Jui-Chang commented 11 months ago

Describe the bug Hi! I'm using the visualize_terms() function to map the enriched proteins onto KEGG pathway diagrams. However, I've encountered the following two issues:

  1. UnableToReadFont `helvetica'

    • Here is the code that produces the error message:
      output_df <- run_pathfindR(geneList, pin_name_path = "IntAct") 
      input_processed <- input_processing(geneList)
      visualize_terms(
      result_df = output_df,
      input_processed = input_processed,
      hsa_KEGG = TRUE
      )
    • Here is the error message:

      Downloading pathway diagrams of 10 KEGG pathways
        |====================================================================================| 100%
      Saving colored pathway diagrams of 10 KEGG pathways
        |                                                                                    |   0%
      Error: rsession-arm64: UnableToReadFont `helvetica' @ error/annotate.c/RenderFreetype/1396

    - Judging from the error message, the problem seems to be that the `h` in the font name is not capitalized. The resolved discussion in #117 suggests installing the font from the web. However, I'm working on macOS, where Helvetica is a system font, so I didn't try removing and reinstalling it in case that broke the system.
    - Instead, I asked ChatGPT, which suggested installing the showtext package to manually override magick's font selection. This did get around the Helvetica font issue! Here is the code and its output:

      > library(showtext)
      Loading required package: sysfonts
      Loading required package: showtextdb
      > showtext_auto()
      > visualize_terms(
      +   result_df = output_df,
      +   input_processed = input_processed,
      +   hsa_KEGG = TRUE
      + )
      Downloading pathway diagrams of 10 KEGG pathways
        |====================================================================================| 100%
      Saving colored pathway diagrams of 10 KEGG pathways
        |====================================================================================| 100%

    
    - But after this modification, the second issue emerged.
  2. KEGG proteins not showing the proper color

Desktop (please complete the following information):

R Session Information:

R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.4.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Chicago
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] showtext_0.9-6       showtextdb_3.0       sysfonts_0.8.8       KEGGREST_1.41.0     
[5] pathfindR_2.1.0      pathfindR.data_2.0.0

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0        viridisLite_0.4.2       dplyr_1.1.2            
 [4] farver_2.1.1            blob_1.2.4              viridis_0.6.4          
 [7] Biostrings_2.68.1       bitops_1.0-7            ggraph_2.1.0           
[10] fastmap_1.1.1           RCurl_1.98-1.12         tweenr_2.0.2           
[13] XML_3.99-0.14           digest_0.6.33           lifecycle_1.0.3        
[16] KEGGgraph_1.60.0        RSQLite_2.3.1           magrittr_2.0.3         
[19] compiler_4.3.1          rlang_1.1.1             tools_4.3.1            
[22] igraph_1.5.0.1          utf8_1.2.3              knitr_1.43             
[25] labeling_0.4.2          graphlayouts_1.0.0      bit_4.0.5              
[28] curl_5.0.1              withr_2.5.0             purrr_1.0.1            
[31] BiocGenerics_0.46.0     grid_4.3.1              polyclip_1.10-4        
[34] stats4_4.3.1            fansi_1.0.4             colorspace_2.1-0       
[37] ggplot2_3.4.2           scales_1.2.1            iterators_1.0.14       
[40] MASS_7.3-60             cli_3.6.1               rmarkdown_2.23         
[43] crayon_1.5.2            generics_0.1.3          rstudioapi_0.15.0      
[46] httr_1.4.6              DBI_1.1.3               cachem_1.0.8           
[49] ggforce_0.4.1           zlibbioc_1.46.0         parallel_4.3.1         
[52] AnnotationDbi_1.62.2    XVector_0.40.0          vctrs_0.6.3            
[55] IRanges_2.34.1          S4Vectors_0.38.1        bit64_4.0.5            
[58] ggrepel_0.9.3           Rgraphviz_2.44.0        magick_2.7.4           
[61] foreach_1.5.2           tidyr_1.3.0             glue_1.6.2             
[64] codetools_0.2-19        gtable_0.3.3            GenomeInfoDb_1.36.1    
[67] munsell_0.5.0           tibble_3.2.1            pillar_1.9.0           
[70] htmltools_0.5.5         graph_1.78.0            GenomeInfoDbData_1.2.10
[73] R6_2.5.1                doParallel_1.0.17       tidygraph_1.2.3        
[76] evaluate_0.21           Biobase_2.60.0          png_0.1-8              
[79] memoise_2.0.1           Rcpp_1.0.11             gridExtra_2.3          
[82] org.Hs.eg.db_3.17.0     xfun_0.39               pkgconfig_2.0.3

Additional context Thanks in advance for your help!

egeulgen commented 11 months ago

I think the workaround implemented for this second issue (which simply colors all unaffected nodes white) cannot work when the number of nodes in the pathway is larger than a certain number. I'll try to check and confirm this over the weekend. See https://github.com/egeulgen/pathfindR/issues/125#issuecomment-1182004840

egeulgen commented 11 months ago

Hello again. As I commented above, KEGG unfortunately restricts the number of nodes that can be colored. In the function here: https://github.com/egeulgen/pathfindR/blob/master/R/visualization.R#L658 I had to discard the colors of background (unaffected, white) genes if the number is larger than 60, as KEGG refuses to handle more. Sorry for the inconvenience. I'll keep this in mind and try to figure out a solution, but it seems unlikely.

Meanwhile, my suggestion is to change the following argument when calling visualize_terms():

#' @param node_cols low, middle and high color values for coloring the pathway nodes
#' (default = \code{NULL}). If \code{node_cols=NULL}, the low, middle and high color
#' are set as "green", "gray" and "red"...

i.e. something like:

visualize_terms(
  result_df = output_df,
  input_processed = input_processed,
  hsa_KEGG = TRUE,
  node_cols = c("purple", "gray", "red")
)

lucasrocmoreira commented 2 months ago

Hi,

I'm reopening this issue because I'm still having this problem. Even after changing the color gradient, all plots are still only colored as a light green. See below.

[Image: hsa00010_pathfindR]

egeulgen commented 2 months ago

Hi @lucasrocmoreira

I suspect my comment above is (part of) the issue here:

KEGG unfortunately restricts the number of nodes to color.

Main issue:

In pathfindR, we use the KEGG REST API to color each input node (gene). We try to color the backgrounds of all non-input genes white (some of these are already light green in KEGG; for your example, see here: https://www.kegg.jp/pathway/hsa00010). However, KEGG restricts the number of nodes we can color through their API. As such, if the number of nodes to color (the input genes + background genes that may already be green) exceeds that limit, we discard any color that is white (here in the source code). Currently, we don't have a good workaround, as we have to rely on KEGG for this functionality. I will keep this issue updated if a resolution becomes possible.
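The discarding step described above can be sketched roughly as follows. This is a minimal illustration, not pathfindR's actual code: the helper name `drop_background_colours`, the toy gene IDs, and the choice to apply the limit of 60 to the total number of nodes are all assumptions made for the example.

```r
# Hypothetical helper (NOT pathfindR's implementation): drop white
# (background) colours when too many nodes would be sent for colouring.
# Assumption: the limit of 60 applies to the total number of nodes.
drop_background_colours <- function(node_colours, max_nodes = 60,
                                    background = "white") {
  # node_colours: named character vector (names = gene IDs, values = colours)
  if (length(node_colours) > max_nodes) {
    node_colours <- node_colours[node_colours != background]
  }
  node_colours
}

# Toy data: 3 input genes plus 70 background genes (made-up IDs)
cols <- c(
  setNames(c("red", "green", "red"), c("geneA", "geneB", "geneC")),
  setNames(rep("white", 70), paste0("bg", 1:70))
)
kept <- drop_background_colours(cols)
length(kept)  # 3: only the input genes keep a colour
```

With the white entries dropped, background nodes keep whatever colour KEGG gives them by default, which is why they can still appear light green.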

Second issue:

The input genes that you provide should still come up as colored in the final image. I can't be sure why that isn't happening, so it would be good to debug further. Can you provide the input data/code so I can try to reproduce the issue and hopefully resolve it?

lucasrocmoreira commented 2 months ago

Thank you for the quick response, @egeulgen

This is my input file: input_df.csv

I ran the following commands:

output_df.KEGG.NvsH <- run_pathfindR(input_df.NvsH,
                                     gene_sets = "KEGG",
                                     p_val_threshold = 0.05,
                                     output_dir = "pathfindR_KEGG.37vs41")

input_processed.NvsH <- input_processing(input_df.NvsH)
visualize_terms(
  result_df = output_df.KEGG.NvsH,
  input_processed = input_processed.NvsH,
  hsa_KEGG = TRUE
)

egeulgen commented 2 months ago

It seems that this is a more general problem.

The cause is how KEGGREST (the R package that pathfindR depends on for this functionality) interacts with KEGG: its KEGGREST::color.pathway.by.objects function no longer works properly. I'm not sure if it's resolvable, but I will report the bug to the maintainers.

egeulgen commented 2 months ago

It looks like KEGG no longer allows fetching the colored pathway diagrams directly through POST requests, so I don't think it's fixable. I'll look into alternative solutions and keep you updated.

egeulgen commented 2 months ago

@lucasrocmoreira thanks again for reporting this. The latest dev version of pathfindR now uses ggkegg instead of KEGGREST for getting coloured KEGG pathway diagrams. You can download the latest dev version via:

install.packages("devtools") # if you have not installed "devtools"
devtools::install_github("egeulgen/pathfindR")

Note that visualize_terms() will now return a list of ggraph objects (essentially ggplot objects) that contain the visualisations. You can save each visualisation in your desired format (example provided for pdf format):

gg_list <- visualize_terms(
  result_df = output_df.KEGG.NvsH,
  input_processed = input_processed.NvsH,
  hsa_KEGG = TRUE
)

ggplot2::ggsave(
  "pathway_vis.pdf",   # path to output, format is determined by extension
  gg_list$hsa00010,    # what to plot
  width = 5,           # adjust width
  height = 5           # adjust height
)

This will be part of the next release but until then is available in the dev version.
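If you want to save every diagram in the returned list rather than a single one, a loop over the list names is one way to do it. The sketch below is only an illustration: `save_pathway_plots` is a hypothetical helper (not part of pathfindR), and it assumes the list is named by KEGG pathway ID, as `gg_list$hsa00010` above suggests. Stand-in ggplot objects are used so that it runs without pathfindR results.

```r
library(ggplot2)

# Hypothetical helper (not part of pathfindR): save each plot in a named
# list to "<name>.pdf" inside out_dir and return the file paths invisibly.
save_pathway_plots <- function(gg_list, out_dir = ".", width = 5, height = 5) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  out_paths <- file.path(out_dir, paste0(names(gg_list), ".pdf"))
  for (i in seq_along(gg_list)) {
    ggsave(out_paths[i], plot = gg_list[[i]], width = width, height = height)
  }
  invisible(out_paths)
}

# Stand-in plots named like KEGG IDs, in place of real visualize_terms() output
demo_list <- list(
  hsa00010 = ggplot(mtcars, aes(wt, mpg)) + geom_point(),
  hsa00020 = ggplot(mtcars, aes(hp, mpg)) + geom_point()
)
paths <- save_pathway_plots(demo_list, out_dir = tempdir())
```

With real results, you would pass the list returned by visualize_terms() in place of `demo_list`.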