egeulgen / pathfindR

pathfindR: Enrichment Analysis Utilizing Active Subnetworks
https://egeulgen.github.io/pathfindR/

Duplicated KEGG term descriptions caused plot_scores error. #87

Closed zhuhenan closed 3 years ago

zhuhenan commented 3 years ago

Hi,

First of all, thank you so much for developing this tool. It is very helpful for my project. However, I found a potential issue that stops the plot_scores function: KEGG contains terms that share the same description under different KEGG IDs. Here is an example:

hsa04210 - Apoptosis - Homo sapiens (human)
hsa04215 - Apoptosis - multiple species - Homo sapiens (human)

The score matrix then contains two rows with the same name:

               Veh-A       Veh-B       Veh-C
Apoptosis -0.3600243  -0.3819357  -0.4138681
Apoptosis -0.3409219  -0.2428533  -0.4071128

The plot_scores function then throws an error at this step:

var_names[["Term"]] <- factor(rownames(score_matrix), levels = rev(rownames(score_matrix)))

Error in `levels<-`(`*tmp*`, value = as.character(levels)) : factor level [20] is duplicated
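The failure can be reproduced in base R without running pathfindR. The following is a minimal sketch, assuming only that the score matrix carries the two "Apoptosis" rows shown above. The matrix is built by hand here and is not produced by score_terms(); the reported level index will therefore differ from the [20] in the original error.

```r
# Toy score matrix with a duplicated row name, mirroring the two "Apoptosis" rows.
# Built by hand for illustration; not the output of pathfindR::score_terms().
score_matrix <- matrix(
  c(-0.3600243, -0.3819357, -0.4138681,
    -0.3409219, -0.2428533, -0.4071128),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Apoptosis", "Apoptosis"), c("Veh-A", "Veh-B", "Veh-C"))
)

# Same factor construction as the failing line quoted above.
# factor() rejects duplicated levels, so the error message is captured instead.
result <- tryCatch(
  factor(rownames(score_matrix), levels = rev(rownames(score_matrix))),
  error = function(e) conditionMessage(e)
)
print(result)
```

A matrix accepts duplicated row names without complaint, which is why the problem only surfaces once the names are used as factor levels.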

Best, Henan

egeulgen commented 3 years ago

Hey @zhuhenan, thanks for pointing this out! I've implemented 2 fixes for this issue:

1) score_terms() now appends the ID of the term if the term description is duplicated
2) I've updated the KEGG data (both human and mouse) in pathfindR.data so that no term description is duplicated
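The idea behind the first fix can be sketched in a few lines of base R. This is an illustration only and not the actual code in score_terms(): the helper name disambiguate_terms, the "description (ID)" label format, and the example vectors are all assumptions made for this sketch.

```r
# Hypothetical helper, NOT part of pathfindR: append the term ID to any
# description that occurs more than once, leaving unique descriptions alone.
disambiguate_terms <- function(ids, descriptions) {
  # Flag every copy of a repeated description (first occurrence included)
  is_dup <- duplicated(descriptions) | duplicated(descriptions, fromLast = TRUE)
  # Assumed label format: "description (ID)"
  descriptions[is_dup] <- paste0(descriptions[is_dup], " (", ids[is_dup], ")")
  descriptions
}

# Illustrative input: two IDs sharing one description, plus a unique term
ids <- c("hsa04210", "hsa04215", "hsa04110")
descriptions <- c("Apoptosis", "Apoptosis", "Cell cycle")

labels <- disambiguate_terms(ids, descriptions)

# With unique labels, the factor construction used for plotting no longer fails
term_factor <- factor(labels, levels = rev(labels))
print(levels(term_factor))
```

Flagging duplicates from both directions matters here: duplicated() on its own marks only the second and later copies, which would leave the first "Apoptosis" row without an ID.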

To re-perform your analysis with these fixes, please update both pathfindR and pathfindR.data to their latest development versions:

install.packages("devtools") # if you have not installed "devtools" 
devtools::install_github("egeulgen/pathfindR")
devtools::install_github("egeulgen/pathfindR.data")