egeulgen / pathfindR

pathfindR: Enrichment Analysis Utilizing Active Subnetworks
https://egeulgen.github.io/pathfindR/
Other
178 stars 25 forks source link

get_gene_sets_list returns empty list with "zmo" #72

Closed ryandward closed 3 years ago

ryandward commented 3 years ago

Describe the bug Retrieving "zmo" KEGG gene set results in empty list.

Digging into the code, I see that get_gene_sets_list depends on get_kegg_gsets, which has the following code:

get_kegg_gsets <- function(org_code = "hsa") {
  # created named list, eg:  path:map00010: "Glycolysis / Gluconeogenesis"
  pathways_list <- KEGGREST::keggList("pathway", org_code)

  # make them into KEGG-style pathway identifiers
  pathway_codes <- sub("path:", "", names(pathways_list))

  # parse pathway genes
  genes_by_pathway <- lapply(pathway_codes, function(pwid) {
    pw <- KEGGREST::keggGet(pwid)
    pw <- pw[[1]]$GENE[c(FALSE, TRUE)] ## <--------BUG IS HERE
    pw <- sub(";.+", "", pw) ## discard any description
    pw <- pw[grep("^[A-Za-z0-9_-]+(\\@)?$", pw)] ## remove mistaken lines
    pw <- unique(pw) ## keep unique genes
    pw
  })

  names(genes_by_pathway) <- pathway_codes

  # remove empty gene sets (metabolic pathways)
  kegg_genes <- genes_by_pathway[vapply(genes_by_pathway, length, 1) != 0]

  kegg_descriptions <- pathways_list
  names(kegg_descriptions) <- sub("path:", "", names(kegg_descriptions))
  kegg_descriptions <- sub(" -.*\\(.*\\)$", "", kegg_descriptions)
  kegg_descriptions <- kegg_descriptions[names(kegg_descriptions) %in% names(kegg_genes)]

  result <- list(gene_sets = kegg_genes, descriptions = kegg_descriptions)
  return(result)
}

To Reproduce Steps to reproduce the behavior: get_gene_sets_list("zmo")

Expected behavior I expect some output, instead I get nothing.

Desktop (please complete the following information): Arch Linux

To fix it

In the block of code here, just reverse the c(FALSE,TRUE) with c(TRUE,FALSE)

  # parse pathway genes
  genes_by_pathway <- lapply(pathway_codes, function(pwid) {
    pw <- KEGGREST::keggGet(pwid)
    pw <- pw[[1]]$GENE[c(FALSE, TRUE)] ## <-------- change to c(TRUE,FALSE)
    pw <- sub(";.+", "", pw) ## discard any description
    pw <- pw[grep("^[A-Za-z0-9_-]+(\\@)?$", pw)] ## remove mistaken lines
    pw <- unique(pw) ## keep unique genes
    pw
  })
egeulgen commented 3 years ago

hey @ryandward, thank you for raising this issue. I implemented a more generalized fix, should be working fine for all organisms now. You can install the latest dev version (including this fix) via:

devtools::install_github("egeulgen/pathfindR")
ryandward commented 3 years ago

Thanks it's working now!