egeulgen / pathfindR

pathfindR: Enrichment Analysis Utilizing Active Subnetworks
https://egeulgen.github.io/pathfindR/

Deciding the most useful result output #114

Closed YunielFM closed 2 years ago

YunielFM commented 2 years ago

Hi Ege, first of all, CONGRATS on this fantastic package! I have been using it since before you officially released it on CRAN! I have gone through the most recent documentation (again, great stuff!), but I cannot find the relationship between the default top 10 terms shown by enrichment_chart and the values calculated by the wrapper run_pathfindR. What are the criteria for selecting those top 10 terms? They do not seem to correspond to a ranking by any of the columns Fold_Enrichment, occurrence, support, lowest_p, or highest_p. Additionally, I cannot find a description of what the support metric refers to.

Default output when visualizing the enrichment_chart result: [image]

Which matches the default order of the run_pathfindR result: [image]

As you can see, these terms are not top-ranked by any metric, hence it would be great to correct it for users that do not go under the hood to extract the data.

Desktop (please complete the following information): Windows 10 x64 (build 22000)

R Session Information:
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 LC_MONETARY=English_Canada.1252 LC_NUMERIC=C
[5] LC_TIME=English_Canada.1252

attached base packages:
[1] parallel stats4 grid stats graphics grDevices utils datasets methods base

other attached packages:
[1] BiocStyle_2.20.2 org.Hs.eg.db_3.13.0 AnnotationDbi_1.54.1 IRanges_2.26.0 S4Vectors_0.30.0 Biobase_2.52.0
[7] BiocGenerics_0.38.0 ggVennDiagram_1.2.0 pathfindR_1.6.3 pathfindR.data_1.1.2 conflicted_1.1.0 kableExtra_1.3.4
[13] KEGGREST_1.32.0 KEGGgraph_1.52.0 readxl_1.3.1 ggpubr_0.4.0 RColorBrewer_1.1-2 cowplot_1.1.1
[19] wesanderson_0.3.6 ggsci_2.9 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.8 readr_2.1.2
[25] tidyr_1.2.0 tibble_3.1.6 ggplot2_3.3.5 tidyverse_1.3.1 ComplexHeatmap_2.8.0 purrr_0.3.4
[31] biomaRt_2.48.3

loaded via a namespace (and not attached):
[1] backports_1.4.1 circlize_0.4.14 BiocFileCache_2.0.0 systemfonts_1.0.4 igraph_1.2.11 lazyeval_0.2.2
[7] crosstalk_1.2.0 GenomeInfoDb_1.28.4 digest_0.6.29 foreach_1.5.2 htmltools_0.5.2 magick_2.7.3
[13] viridis_0.6.2 fansi_1.0.2 magrittr_2.0.1 memoise_2.0.1 cluster_2.1.2 doParallel_1.0.17
[19] tzdb_0.2.0 graphlayouts_0.8.0 Biostrings_2.60.2 modelr_0.1.8 matrixStats_0.61.0 vroom_1.5.7
[25] svglite_2.1.0 prettyunits_1.1.1 RVenn_1.1.0 colorspace_2.0-2 ggrepel_0.9.1 blob_1.2.2
[31] rvest_1.0.2 rappdirs_0.3.3 haven_2.4.3 xfun_0.29 crayon_1.4.2 RCurl_1.98-1.6
[37] jsonlite_1.7.3 graph_1.70.0 iterators_1.0.14 glue_1.6.1 polyclip_1.10-0 gtable_0.3.0
[43] zlibbioc_1.38.0 XVector_0.32.0 webshot_0.5.2 GetoptLong_1.0.5 car_3.0-12 Rgraphviz_2.36.0
[49] shape_1.4.6 abind_1.4-5 scales_1.1.1 DBI_1.1.2 rstatix_0.7.0 Rcpp_1.0.8
[55] viridisLite_0.4.0 progress_1.2.2 units_0.8-0 clue_0.3-60 proxy_0.4-26 bit_4.0.4
[61] DT_0.20 htmlwidgets_1.5.4 httr_1.4.2 ellipsis_0.3.2 farver_2.1.0 pkgconfig_2.0.3
[67] XML_3.99-0.8 sass_0.4.0 dbplyr_2.1.1 utf8_1.2.2 labeling_0.4.2 tidyselect_1.1.1
[73] rlang_1.0.1 munsell_0.5.0 cellranger_1.1.0 tools_4.1.2 cachem_1.0.6 cli_3.1.1
[79] generics_0.1.2 RSQLite_2.2.9 broom_0.7.12 evaluate_0.14 fastmap_1.1.0 yaml_2.2.2
[85] knitr_1.37 bit64_4.0.5 fs_1.5.2 tidygraph_1.2.0 ggraph_2.0.5 xml2_1.3.3
[91] compiler_4.1.2 rstudioapi_0.13 plotly_4.10.0 filelock_1.0.2 curl_4.3.2 png_0.1-7
[97] e1071_1.7-9 ggsignif_0.6.3 reprex_2.0.1 tweenr_1.0.2 bslib_0.3.1 stringi_1.7.5
[103] highr_0.9 classInt_0.4-3 vctrs_0.3.8 pillar_1.7.0 lifecycle_1.0.1 BiocManager_1.30.16
[109] jquerylib_0.1.4 GlobalOptions_0.1.2 data.table_1.14.2 bitops_1.0-7 R6_2.5.1 KernSmooth_2.23-20
[115] gridExtra_2.3 codetools_0.2-18 MASS_7.3-54 assertthat_0.2.1 rjson_0.2.21 withr_2.4.3
[121] GenomeInfoDbData_1.2.6 hms_1.1.1 class_7.3-19 rmarkdown_2.11 carData_3.0-5 Cairo_1.5-14
[127] sf_1.0-6 ggforce_0.3.3 lubridate_1.8.0

egeulgen commented 2 years ago

Hey @YunielFM,

Thanks for using the package and your kind words!

By default, the terms are ranked by lowest_p in increasing order, and enrichment_chart() uses this ranking within the wrapper to plot the top 10 terms.
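To make that ranking concrete, here is a minimal base R sketch on a toy data frame. The values are made up, and the data frame only borrows the column names mentioned in this thread; it is not actual run_pathfindR() output, and the code is not the package's internal implementation.

```r
# Toy stand-in for a result table (made-up values, for illustration only).
result_df <- data.frame(
  Term_Description = c("Term A", "Term B", "Term C", "Term D"),
  Fold_Enrichment  = c(2.1, 5.4, 1.3, 3.8),
  lowest_p         = c(1e-4, 3e-2, 1e-6, 5e-3),
  stringsAsFactors = FALSE
)

# Rank by lowest_p in increasing order, as described above.
ranked_by_p <- result_df[order(result_df$lowest_p), ]

# Keep the top n terms (the chart uses 10 by default; 2 here for brevity).
top_n_terms <- head(ranked_by_p, n = 2)
print(top_n_terms$Term_Description)

# For comparison: the same table ranked by Fold_Enrichment, decreasing.
ranked_by_fe <- result_df[order(result_df$Fold_Enrichment, decreasing = TRUE), ]
print(head(ranked_by_fe$Term_Description, n = 2))
```

The two rankings select different terms, which is why the top terms in the chart need not look top-ranked when the table is read by Fold_Enrichment alone.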

For each iteration of run_pathfindR() (10 by default), the active subnetwork search yields hundreds of subnetworks, each of which is then used for enrichment analysis. support is the median, over all iterations, of the proportion of active subnetworks within an iteration that led to enrichment of the term. Thus, a higher support indicates a pathway/term that is supported by a larger proportion of subnetworks.
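As a rough illustration of that definition (not the package's actual code), the sketch below computes support for a single term from hypothetical per-iteration results. The object `enriched_by_iteration` and its values are assumptions made up for this example: each logical vector marks which subnetworks of one iteration led to enrichment of the term.

```r
# Hypothetical input for ONE term: one logical vector per iteration,
# TRUE where a subnetwork's enrichment analysis returned the term.
enriched_by_iteration <- list(
  iter1 = c(TRUE, TRUE, FALSE, FALSE),
  iter2 = c(TRUE, FALSE, FALSE, FALSE),
  iter3 = c(TRUE, TRUE, TRUE, FALSE)
)

# Proportion of subnetworks leading to enrichment, within each iteration.
proportions <- vapply(enriched_by_iteration, mean, numeric(1))

# Support: the median of those proportions over all iterations.
support <- median(proportions)
print(support)
```

With only 3 iterations of 4 subnetworks each, the numbers are far smaller than in a real run, but the two-step structure (proportion within an iteration, then median across iterations) is the same as in the description above.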

Hope this helps, Best, -E