YuLab-SMU / createKEGGdb

Create KEGG.db Package

Something wrong with get_path2name Function #11

Open Zechuan-Chen opened 1 year ago

Zechuan-Chen commented 1 year ago

Something goes wrong when downloading the KEGG dataset. Here is how I corrected that part.

options(clusterProfiler.download.method = "wget")

enrichKEGG(de,pvalueCutoff=0.01,use_internal_data = F) --> No gene can be mapped.... --> Expected input gene ID: --> return NULL...

Sometimes enrichKEGG does not work correctly, so I chose to build KEGG.db instead. But that fails too:

createKEGGdb::create_kegg_db('hsa')
Error in clusterProfiler:::kegg_list("pathway", species) : unused argument (species)

The argument "species" was unused, so I checked the code and found something wrong in the function "get_path2name". Here I add line 3 and replace "species" with "new_species":

get_path2name <- function(species){
  if (length(species) == 1) {
    new_species <- paste0("pathway/", species)
    keggpathid2name.df <- clusterProfiler:::kegg_list(new_species)
  } else {
    keggpathid2name.list <- vector("list", length(species))
    names(keggpathid2name.list) <- species
    for (i in species) {
      # assumed to need the same single-argument form as the branch above
      keggpathid2name.list[[i]] <- clusterProfiler:::kegg_list(paste0("pathway/", i))
    }
    keggpathid2name.df <- do.call(rbind, keggpathid2name.list)
    rownames(keggpathid2name.df) <- NULL
  }
  # strip the trailing " - <organism> (<common name>)" from each pathway name
  keggpathid2name.df[,2] <- sub("\\s-\\s[a-zA-Z ]+\\(\\w+\\)$", "", keggpathid2name.df[,2])
  keggpathid2name.df[,1] %<>% gsub("path:map", "", .)
  colnames(keggpathid2name.df) <- c("path_id","path_name")
  return(keggpathid2name.df)
}
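As a rough offline check of the two string operations in the function above, the sketch below runs them on hand-written values. The input strings are assumptions about what the KEGG listing looks like (an ID such as hsa05202 and a name ending in " - Homo sapiens (human)"); nothing here is fetched from KEGG.

```r
# Hand-written sample values, assumed to resemble one row of the pathway listing.
path_name <- "Spliceosome - Homo sapiens (human)"
path_id   <- "hsa03040"

# Same pattern as in get_path2name(): drops the trailing organism suffix.
clean_name <- sub("\\s-\\s[a-zA-Z ]+\\(\\w+\\)$", "", path_name)

# Same replacement as in get_path2name(): only affects IDs that start with "path:map".
clean_id <- gsub("path:map", "", path_id)

print(clean_name)
print(clean_id)
```

Under these assumptions the name is cleaned, while an ID that already has the form hsa03040 passes through unchanged, species code included.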

createKEGGdb::create_kegg_db('hsa')
install.packages("./KEGG.db_1.0.tar.gz", repos=NULL, type="source")

ego_KEGG=enrichKEGG(gene=list$entrezgene, organism = "hsa", pvalueCutoff = 1, qvalueCutoff=1, minGSSize=1, use_internal_data = T)

Result-----------------------------------

ego_KEGG@result

               ID Description GeneRatio  BgRatio       pvalue     p.adjust       qvalue geneID
hsa05202 hsa05202        <NA>     11/67 193/8292 3.366684e-07 6.093698e-05 4.961429e-05 1051/1649/3398/5966/4616/221037/1026/2120/5914/2521/51274
hsa04141 hsa04141        <NA>      8/67 171/8292 6.426462e-05 5.815948e-03 4.735288e-03 3309/1649/7095/7184/9709/2923/468/5611
hsa03040 hsa03040        <NA>      7/67 156/8292 2.460964e-04 1.233899e-02 1.004628e-02 10772/151903/6434/29896/25949/2521/6628

To fix the NA value-----------------------------------

keggpathid2name.df <- clusterProfiler:::kegg_list("pathway/hsa")
# sub() keeps Description a character vector; strsplit() would return a list
ego_KEGG@result$Description <- sub(" - Homo sapiens (human)", "", keggpathid2name.df$to[match(ego_KEGG@result$ID, keggpathid2name.df$from)], fixed = TRUE)
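The lookup above can be tried without a network call or an enrichResult object. This is only a sketch: `kegg_names` and `result` are hypothetical toy data frames standing in for the kegg_list() output (assumed columns `from` and `to`) and for `ego_KEGG@result`.

```r
# Toy stand-in for the pathway listing; column names from/to are assumed.
kegg_names <- data.frame(
  from = c("hsa03040", "hsa04141", "hsa05202"),
  to   = c("Spliceosome - Homo sapiens (human)",
           "Protein processing in endoplasmic reticulum - Homo sapiens (human)",
           "Transcriptional misregulation in cancer - Homo sapiens (human)"),
  stringsAsFactors = FALSE
)

# Toy stand-in for the enrichment result with missing descriptions.
result <- data.frame(
  ID          = c("hsa05202", "hsa04141", "hsa03040"),
  Description = NA_character_,
  stringsAsFactors = FALSE
)

# match() gives, for each result ID, the row of kegg_names holding its name.
idx <- match(result$ID, kegg_names$from)

# Remove the organism suffix literally (fixed = TRUE) and fill in Description.
result$Description <- sub(" - Homo sapiens (human)", "", kegg_names$to[idx], fixed = TRUE)

print(result)
```

Using sub() here rather than strsplit() keeps Description a plain character column instead of a list column.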

That is the whole problem and its solution.

fork16 commented 9 months ago

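After testing: if you remove the species name from the ID column just before the return inside the function (for example, removing 'hsa' from keggpathid2name.df$path_id), then after installing the package and running the KEGG analysis again, the Description column no longer contains NA.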
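A minimal sketch of that change, assuming path_id holds values such as hsa05202 and a single species code is in use. The data frame below is a hand-written stand-in for what get_path2name() builds, and `species` is set by hand for illustration; in the real function the line would go just before return(keggpathid2name.df).

```r
# Hand-written stand-in for the table built inside get_path2name().
keggpathid2name.df <- data.frame(
  path_id   = c("hsa05202", "hsa04141", "hsa03040"),
  path_name = c("Transcriptional misregulation in cancer",
                "Protein processing in endoplasmic reticulum",
                "Spliceosome"),
  stringsAsFactors = FALSE
)
species <- "hsa"  # assumed single species code

# Drop the leading species code so only the numeric pathway ID remains.
keggpathid2name.df$path_id <- sub(paste0("^", species), "", keggpathid2name.df$path_id)

print(keggpathid2name.df)
```

Anchoring the pattern with "^" limits the replacement to a prefix, so the species code is never removed from the middle of an ID.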