GangLiLab / genekitr

🧬 Gene analysis toolkit based on R
https://www.genekitr.fun
GNU General Public License v3.0

ORA result plotting error because of duplicated terms #19

aj-kozik closed this issue 1 year ago

aj-kozik commented 1 year ago

Describe the bug

I am trying to create bar plots of my ORA results but keep getting an error in dplyr::mutate().

To Reproduce

Steps to reproduce the behavior: using the attached test file 'testgenelist.csv', the following code should reproduce the error:

library(genekitr)
library(geneset)
gs3 <- getReactome(org = "human")
testgenes <- read.csv(file = "data/testgenelist.csv", header = TRUE, sep = ",")
## ORA Analysis
id <- testgenes$GeneID
test_ego <- genORA(id,
                        geneset = gs3,
                        p_cutoff = 0.05,
                        q_cutoff = 0.10
)

#plot
plotEnrich(test_ego, plot_type = "bar")
The following error was raised (screenshot included):

    plotEnrich(test_ego, plot_type = "bar")
    Error in dplyr::mutate():
    ℹ In argument: Description = factor(.$Description, levels = .$Description, ordered = T).
    Caused by error in levels<-:
    ! factor level [20] is duplicated
    Run rlang::last_trace() to see where the error occurred.

When rlang::last_trace() is run:

    Error in dplyr::mutate():
    ℹ In argument: Description = factor(.$Description, levels = .$Description, ordered = T).
    Caused by error in levels<-:
    ! factor level [20] is duplicated

    Backtrace:
         ▆
      1. ├─genekitr::plotEnrich(test_ego, plot_type = "bar")
      2. │ └─... %>% ...
      3. ├─dplyr::mutate(...)
      4. ├─dplyr:::mutate.data.frame(...)
      5. │ └─dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
      6. │   ├─base::withCallingHandlers(...)
      7. │   └─dplyr:::mutate_col(dots[[i]], data, mask, new_columns)
      8. │     └─mask$eval_all_mutate(quo)
      9. │       └─dplyr (local) eval()
     10. ├─base::factor(.$Description, levels = .$Description, ordered = T)
     11. └─base::.handleSimpleError(...)
     12.   └─dplyr (local) h(simpleError(msg, call))
     13.     └─rlang::abort(message, class = error_class, parent = parent, call = error_call)

Expected behavior

I expected the bar plot to be generated as normal. I haven't had this issue with any other datasets I have analyzed, and the test_ego result itself doesn't appear to be affected. A screenshot of the ORA result data frame (test_ego) is included.

Screenshots

testgenelist.csv (attached), plus two screenshots: the error message and the test_ego data frame.

reedliu commented 1 year ago

Hi, this issue is due to the duplicated terms in test_ego.

library(genekitr)
library(geneset)
library(dplyr)
gs3 <- getReactome(org = "human")
testgenes <- read.csv(file = "~/Downloads/for_R_test/testgenelist.csv", header = TRUE, sep = ",")
## ORA Analysis
id <- testgenes$GeneID
test_ego <- genORA(id,
                   geneset = gs3,
                   p_cutoff = 0.05,
                   q_cutoff = 0.10
)

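# check duplicated term
description <- test_ego$Description
# unique() keeps each repeated term once, even if it occurs three or more times
dup_term <- unique(description[duplicated(description)]) # "Maturation of nucleoprotein"

# get IDs (%in% also works when more than one term is duplicated)
dup_term_ids <- test_ego %>% filter(Description %in% dup_term) %>% pull(ID)
# "R-HSA-9683610" "R-HSA-9694631"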

You can see that R-HSA-9694631 and R-HSA-9683610 both have the description "Maturation of nucleoprotein". The y-axis of the plot does not support duplicated terms, because plotEnrich() converts Description into a factor whose levels must be unique.
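To see the failure in isolation, here is a minimal base-R sketch that needs neither genekitr nor the test file. The vector `desc` is made-up illustrative data. Only the shape of the `factor()` call is taken from the error message above.

```r
# Toy vector of term descriptions with one repeated entry (illustrative data).
desc <- c("Some pathway",
          "Maturation of nucleoprotein",
          "Maturation of nucleoprotein")

# Same call shape as in the error message: the values double as the levels.
err_msg <- tryCatch(
  {
    factor(desc, levels = desc, ordered = TRUE)
    NA_character_  # reached only if no error is raised
  },
  error = function(e) conditionMessage(e)
)
print(err_msg)  # an error message of the form "factor level [k] is duplicated"

# With de-duplicated levels the same call succeeds.
ok <- factor(desc, levels = unique(desc), ordered = TRUE)
print(levels(ok))
```

Note that passing `unique(desc)` as the levels would merge the two Reactome pathways into a single bar, which is why renaming the duplicated descriptions is the safer fix here.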

So the solution is to make the duplicated terms unique, for example by appending the Reactome ID to each "Maturation of nucleoprotein" description.

# modify duplicated description
test_ego <- test_ego %>% mutate(Description = if_else(Description %in% dup_term,
                                          paste0(Description,'_',ID),
                                          Description))
plotEnrich(test_ego, plot_type = "bar")
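The same renaming can be checked on a toy data frame without running the enrichment. This is only a sketch in base R: the data frame `ego` stands in for the ORA result, and "R-HSA-0000001" with "Some other pathway" is a made-up placeholder row, not a real Reactome entry.

```r
# Stand-in for the ORA result: only the two columns that matter here.
ego <- data.frame(
  ID = c("R-HSA-9683610", "R-HSA-9694631", "R-HSA-0000001"),  # last ID is a placeholder
  Description = c("Maturation of nucleoprotein",
                  "Maturation of nucleoprotein",
                  "Some other pathway"),
  stringsAsFactors = FALSE
)

# Flag every row whose description occurs more than once.
dup_term <- unique(ego$Description[duplicated(ego$Description)])
is_dup <- ego$Description %in% dup_term

# Append the ID to the flagged descriptions only.
ego$Description[is_dup] <- paste0(ego$Description[is_dup], "_", ego$ID[is_dup])

# The factor() call that failed before now works, since all levels are distinct.
desc_factor <- factor(ego$Description, levels = ego$Description, ordered = TRUE)
print(levels(desc_factor))
```

Rows with a unique description are left untouched, so only the affected axis labels change.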


Anyway, I have updated the function in v1.2.3 so that it automatically checks whether duplicated terms exist and gives users a warning instead of an error.
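For readers on an older version, a check of this kind could look roughly like the sketch below. This is not the code used in genekitr v1.2.3. `dedup_description()` is a hypothetical helper written for illustration, and it assumes the result has `ID` and `Description` columns as in the thread above.

```r
# Hypothetical helper (NOT part of genekitr): warn about duplicated
# descriptions and make them unique by appending the ID.
dedup_description <- function(df) {
  dup_term <- unique(df$Description[duplicated(df$Description)])
  is_dup <- df$Description %in% dup_term
  if (any(is_dup)) {
    warning("Duplicated terms found and made unique by appending the ID: ",
            paste(dup_term, collapse = ", "),
            call. = FALSE)
    df$Description[is_dup] <- paste0(df$Description[is_dup], "_", df$ID[is_dup])
  }
  df
}

# Toy input; "R-HSA-0000001" / "Some other pathway" is a made-up placeholder row.
toy <- data.frame(
  ID = c("R-HSA-9683610", "R-HSA-9694631", "R-HSA-0000001"),
  Description = c("Maturation of nucleoprotein",
                  "Maturation of nucleoprotein",
                  "Some other pathway"),
  stringsAsFactors = FALSE
)

# Collect the warning text instead of printing it, so the example runs quietly.
msgs <- character()
fixed <- withCallingHandlers(
  dedup_description(toy),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(fixed$Description)
print(msgs)
```

Calling such a helper on the ORA result before plotEnrich() would turn the hard error into a warning while keeping both pathways on the plot.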