YuLab-SMU / ProjectYulab

Small coding tasks that enable you to participate in our development

add sub-category information of the KEGG pathways #5

Open GuangchuangYu opened 1 year ago

GuangchuangYu commented 1 year ago

KEGG pathways can be divided into 7 top-level categories, see https://www.genome.jp/kegg/pathway.html.

It should be easy to incorporate this information into the enrichKEGG() and gseKEGG() results, so that it can be used to filter the results or to distinguish pathways by category in visualizations.

reference: https://mp.weixin.qq.com/s/17ujVhcrkX1DLsUJBtUGEw.
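Once a result table carries a category column, filtering and ordering by category is ordinary data-frame work. A minimal sketch, assuming a `category` column has already been added (the column name, the helper `filter_by_category()` and the toy table are hypothetical, not part of clusterProfiler); the seven level names are the top-level categories listed on the KEGG pathway page.

```r
# Top-level KEGG pathway categories, in the order used on the KEGG pathway page
kegg_levels <- c(
  "Metabolism",
  "Genetic Information Processing",
  "Environmental Information Processing",
  "Cellular Processes",
  "Organismal Systems",
  "Human Diseases",
  "Drug Development"
)

# Hypothetical helper: keep only the requested categories, then order rows
# by category (in KEGG order) and by adjusted p-value within each category
filter_by_category <- function(res, keep = kegg_levels) {
  res$category <- factor(res$category, levels = kegg_levels)
  res <- res[res$category %in% keep, , drop = FALSE]
  res[order(res$category, res$p.adjust), , drop = FALSE]
}

# Toy stand-in for an annotated enrichKEGG() result table (p-values are made up)
res <- data.frame(
  ID = c("mmu04380", "mmu00010", "mmu05010"),
  category = c("Organismal Systems", "Metabolism", "Human Diseases"),
  p.adjust = c(0.01, 0.03, 0.02),
  stringsAsFactors = FALSE
)

filter_by_category(res, keep = c("Metabolism", "Organismal Systems"))
```

Storing the category as a factor with these levels also fixes the order in which categories appear as facets or colours in ggplot2.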

Potato-tudou commented 1 year ago

A function wrapping the KEGG REST API (https://rest.kegg.jp/get/) can retrieve the classification of a KEGG pathway from its ID. Its advantage is that the classification is always up to date; it works, but it is not very efficient, because it sends one web request per term. Handling this locally would still be more elegant.

library(httr)      # GET(), content(), stop_for_status()
library(magrittr)  # %>%

# Fetch the KEGG flat-file record of a pathway and return its CLASS field
class_t <- function(term) {
  res <- paste0("https://rest.kegg.jp/get/", term) %>% GET()
  stop_for_status(res)
  res_info <- content(res, as = "text", encoding = "UTF-8")
  # Keep only the text between "CLASS" and "PATHWAY_MAP"
  gsub('^.*CLASS\\s*|\\s*PATHWAY_MAP.*$', '', res_info)
}

class_t("mmu04380")
#> [1] "Organismal Systems; Development and regeneration"

# Then the user can easily get the classification of every KEGG term in an enrichKEGG() result:
vapply(kegg_res@result$ID, class_t, character(1))
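The regex above treats the whole record as one string. A line-based alternative, sketched under the assumption that each pathway record has a single `CLASS` line of the form `Category; Subcategory` (as in the `mmu04380` output above), also splits the value into separate category and sub-category fields. The helper name `parse_kegg_class()` is hypothetical, and the sample text is an abbreviated, hand-written excerpt rather than a real API response.

```r
# Hypothetical helper: pull the CLASS field out of a KEGG flat-file record
# and split it into top-level category and sub-category
parse_kegg_class <- function(txt) {
  lines <- strsplit(txt, "\n", fixed = TRUE)[[1]]
  class_line <- grep("^CLASS[[:space:]]", lines, value = TRUE)
  if (length(class_line) == 0) {
    # No CLASS field in this record
    return(c(category = NA_character_, subcategory = NA_character_))
  }
  # Drop the field name, then split "Category; Subcategory" on the semicolon
  value <- sub("^CLASS[[:space:]]+", "", class_line[1])
  parts <- trimws(strsplit(value, ";", fixed = TRUE)[[1]])
  c(category = parts[1],
    subcategory = if (length(parts) > 1) parts[2] else NA_character_)
}

# Abbreviated, hand-written excerpt of a KEGG pathway record
txt <- paste(
  "ENTRY       mmu04380                    Pathway",
  "CLASS       Organismal Systems; Development and regeneration",
  "PATHWAY_MAP mmu04380  Osteoclast differentiation",
  sep = "\n"
)

parse_kegg_class(txt)
```

Applied to the downloaded record text inside `class_t()`, this yields the category and the sub-category as two named fields, which maps directly onto the two columns this issue asks for.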
Potato-tudou commented 1 year ago

Alternatively, the category of a KEGG pathway can be looked up in the hierarchy table at the referenced URL (https://pathview.uncc.edu/data/khier.tsv), which only needs to be downloaded once.

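library(magrittr)  # %>%
library(tidyr)     # separate()

# Read the tab-separated hierarchy and split the pathway column into ID and Description
k.info <- read.delim("https://pathview.uncc.edu/data/khier.tsv", stringsAsFactors = FALSE) %>%
  separate(pathway, c("ID", "Description"), extra = "merge", fill = "right")

getKEGG_cat <- function(ID, k_info) {
  # Drop the organism prefix: "mmu04380" -> "04380"
  cleanID <- function(id_num) {
    gsub("[a-z]", "", id_num)
  }
  inputID <- cleanID(ID)
  k_info[k_info$ID == inputID, ]$category
}

getKEGG_cat(ID = "mmu04380", k_info = k.info)
#> [1] "Organismal Systems"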

GuangchuangYu commented 1 year ago

@Potato-tudou pls learn how to format your code first.

Refer to point 2 mentioned by Yonghe, https://github.com/YuLab-SMU/ProjectYulab/issues/1#issuecomment-1545083474.

Potato-tudou commented 1 year ago
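library(magrittr)  # %>%
library(tidyr)     # separate()

# Read the tab-separated hierarchy and split the pathway column into ID and Description
k.info <- read.delim("https://pathview.uncc.edu/data/khier.tsv", stringsAsFactors = FALSE) %>%
  separate(pathway, c("ID", "Description"), extra = "merge", fill = "right")

getKEGG_cat <- function(ID, k_info) {
  # Drop the organism prefix: "mmu04380" -> "04380"
  cleanID <- function(id_num) {
    gsub("[a-z]", "", id_num)
  }
  inputID <- cleanID(ID)
  k_info[k_info$ID == inputID, ]$category
}

getKEGG_cat(ID = "mmu04380", k_info = k.info)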
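Subsetting the whole table once per ID works, but a single `match()` call looks up every ID at once and returns `NA` for IDs missing from the table, so the output always has one entry per input. A minimal sketch, assuming the table has `ID` (numeric part only, e.g. "04380") and `category` columns as produced by the `separate()` step; the helper `lookup_kegg_cat()` and the toy table are hypothetical.

```r
# Hypothetical helper: look up categories for many KEGG pathway IDs at once
lookup_kegg_cat <- function(ids, k_info) {
  # Strip the organism prefix ("mmu04380" -> "04380"), like cleanID()
  num_ids <- sub("^[a-z]+", "", ids)
  # match() gives the first matching row, or NA when the ID is absent
  k_info$category[match(num_ids, k_info$ID)]
}

# Toy stand-in for the khier.tsv table after separate()
k_toy <- data.frame(
  ID = c("00010", "04380"),
  category = c("Metabolism", "Organismal Systems"),
  stringsAsFactors = FALSE
)

# "mmu99999" plays the role of an ID that is absent from the table
lookup_kegg_cat(c("mmu04380", "hsa00010", "mmu99999"), k_toy)
```

Because the result is aligned with the input, it can be assigned straight into a new column of the enrichment result table.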
Potato-tudou commented 1 year ago

I think it's better to use the result of enrichKEGG() as an input. So here it is:

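library(magrittr)  # %>%
library(tidyr)     # separate()

# Return the top-level KEGG category for every pathway in an enrichKEGG() result
listKEGG_cat <- function(enrich_res) {
  k.info <- read.delim("https://pathview.uncc.edu/data/khier.tsv", stringsAsFactors = FALSE) %>%
    separate(pathway, c("ID", "Description"), extra = "merge", fill = "right")
  getKEGG_cat <- function(ID, k_info) {
    # Drop the organism prefix: "mmu04380" -> "04380"
    inputID <- gsub("[a-z]", "", ID)
    cat_hit <- k_info$category[k_info$ID == inputID]
    # Always return exactly one value so the output lines up with the input IDs
    if (length(cat_hit) == 0) NA_character_ else cat_hit[1]
  }
  vapply(enrich_res@result$ID, getKEGG_cat, character(1), k_info = k.info)
}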
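`listKEGG_cat()` downloads khier.tsv on every call. One way to avoid that, sketched here as an assumption rather than anything clusterProfiler provides, is to wrap the loader in a closure that keeps the table after the first call. The names `make_cached_loader()` and `load_khier` are hypothetical.

```r
# Hypothetical helper: wrap a loading function so it runs only once;
# later calls reuse the stored result unless refresh = TRUE
make_cached_loader <- function(loader) {
  cache <- NULL
  function(refresh = FALSE) {
    if (is.null(cache) || refresh) {
      cache <<- loader()
    }
    cache
  }
}

# Defining the loader does not download anything; the first call to
# load_khier() does, and subsequent calls return the cached table
load_khier <- make_cached_loader(function() {
  read.delim("https://pathview.uncc.edu/data/khier.tsv", stringsAsFactors = FALSE)
})
```

Inside `listKEGG_cat()`, `load_khier()` would then stand in for the call that reads khier.tsv, and `load_khier(refresh = TRUE)` forces a fresh download when the KEGG hierarchy changes.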
GuangchuangYu commented 1 year ago

see also https://github.com/YuLab-SMU/clusterProfiler/issues/236.