YuLab-SMU / DOSE

:mask: Disease Ontology Semantic and Enrichment analysis
https://yulab-smu.top/biomedical-knowledge-mining-book/

update of NCG environment? #8

Closed: dalloliogm closed this issue 9 years ago.

dalloliogm commented 9 years ago

Hi Guangchuang, I am trying to update the NCG environment to align it with the next release of the database.

I am not sure I updated the environment correctly. The original NCG environment contains three variables: EXTID2PATHID, PATHID2EXTID, and PATHID2NAME. The last one is NULL, while the other two are named lists: EXTID2PATHID is keyed by Entrez gene ID and PATHID2EXTID is keyed by pathway ID.

I used the code below to rebuild the environment, but I am not sure whether it works or how to debug it.

```r
# Read the NCG 5.0 cancer gene table (tab-separated, header row, no comment lines)
cancergenes <- read.delim('data/NCG5_cancergenes.txt', stringsAsFactors=FALSE)

# Entrez gene ID -> character vector of primary sites (used as pathway IDs).
# Base split() returns a plain named list, so no plyr attributes need stripping,
# and each element is a vector rather than one comma-joined string.
EXTID2PATHID.NCG <- lapply(split(cancergenes$primary_site, cancergenes$entrez), unique)

# Primary site -> character vector of Entrez gene IDs
PATHID2EXTID.NCG <- lapply(split(as.character(cancergenes$entrez),
                                 cancergenes$primary_site), unique)

# No ID-to-name mapping for NCG
PATHID2NAME.NCG <- NULL

ncgenv <- new.env()

assign('PATHID2NAME',  PATHID2NAME.NCG,  envir=ncgenv)
assign('PATHID2EXTID', PATHID2EXTID.NCG, envir=ncgenv)
assign('EXTID2PATHID', EXTID2PATHID.NCG, envir=ncgenv)

save(ncgenv, file='data/NCG_DOSE_Env.rda')
```
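One way to debug a rebuilt environment is to check that EXTID2PATHID and PATHID2EXTID are inverses of each other. The sketch below assumes both are plain named lists of character vectors; `invert_mapping` and `normalize` are hypothetical helpers and the gene/site values are made up, not real NCG data.

```r
# Made-up stand-in mapping: pathway ID -> Entrez gene IDs
PATHID2EXTID <- list(breast = c('672', '675'), lung = c('1956', '672'))

# Hypothetical helper: invert a named list so values become keys and vice versa
invert_mapping <- function(m) {
  keys <- rep(names(m), lengths(m))    # repeat each key once per value
  vals <- unlist(m, use.names = FALSE) # flatten all values
  lapply(split(keys, vals), unique)    # group keys by value
}

EXTID2PATHID <- invert_mapping(PATHID2EXTID)

# Hypothetical helper: sort names and values so order differences are ignored
normalize <- function(m) lapply(m[sort(names(m))], sort)

# Round trip: inverting twice should give back the original mapping
stopifnot(identical(normalize(invert_mapping(EXTID2PATHID)),
                    normalize(PATHID2EXTID)))

# Structural checks: named list whose elements are character vectors
stopifnot(is.list(EXTID2PATHID), !is.null(names(EXTID2PATHID)),
          all(vapply(EXTID2PATHID, is.character, logical(1))))
str(EXTID2PATHID)
```

The same checks can be run on the real objects with `get('EXTID2PATHID', envir=ncgenv)` in place of the stand-in data.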
GuangchuangYu commented 9 years ago

I read the file; it's not in exactly the same format as the previous one.

Another issue is this line:

```r
save(ncgenv, file='data/NCG_DOSE_Env.rda')
```

If I load NCG_DOSE_Env.rda, I get an environment named ncgenv, but the enrichNCG function tries to load an environment called NCG_DOSE_Env.

So the last command should be changed to:

```r
NCG_DOSE_Env <- ncgenv
save(NCG_DOSE_Env, file='data/NCG_DOSE_Env.rda')
```
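The name matters because `save()` stores each object under its variable name and `load()` restores it under that same name. A minimal sketch of the round trip, using a throwaway environment with made-up contents and a temporary file instead of data/NCG_DOSE_Env.rda:

```r
# Stand-in environment with made-up contents (not real NCG data)
ncgenv <- new.env()
assign('PATHID2NAME', NULL, envir=ncgenv)
assign('PATHID2EXTID', list(breast=c('672', '675')), envir=ncgenv)
assign('EXTID2PATHID', list('672'='breast', '675'='breast'), envir=ncgenv)

# Bind the environment to the name enrichNCG expects, then save under that name
NCG_DOSE_Env <- ncgenv
rda <- tempfile(fileext='.rda')
save(NCG_DOSE_Env, file=rda)

# Load into a fresh environment; load() returns the names it restored
target <- new.env()
restored <- load(rda, envir=target)
print(restored)  # [1] "NCG_DOSE_Env"
ls(target$NCG_DOSE_Env)
```

Saving `ncgenv` directly would restore an object called `ncgenv`, which code looking for `NCG_DOSE_Env` would not find.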

You can use the script at https://github.com/GuangchuangYu/DOSE/blob/master/inst/extdata/build_NCG_Anno.R to build NCG_DOSE_Env. Please also update it to match the new version.

PS: the build_Anno function is located at https://github.com/GuangchuangYu/clusterProfiler/blob/master/R/utilities.R

dalloliogm commented 9 years ago

Thanks, I've updated the file using the build_NCG_Anno script. A couple of questions on this script:

GuangchuangYu commented 9 years ago

Yes, it's build_Anno. Thanks for the correction.

path2name is used to map each path ID (ID column) to its name (Description column). If it is NULL, the Description column is filled with the ID, so it's fine to leave it NULL.

See TERM2NAME.USER_DEFINED.internal.
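The NULL fallback described above can be illustrated with a small sketch. `term2name_sketch` is a hypothetical helper, not the code of TERM2NAME.USER_DEFINED.internal; it assumes path2name is a named character vector (ID -> name), and falling back to the ID for IDs missing from a non-NULL mapping is an extra assumption.

```r
# Hypothetical helper illustrating the described behaviour (not DOSE's code)
term2name_sketch <- function(ids, path2name = NULL) {
  if (is.null(path2name)) {
    # No mapping supplied: the Description is simply the ID
    return(setNames(ids, ids))
  }
  desc <- unname(path2name[ids])
  # Assumption: IDs absent from the mapping also fall back to the ID
  desc[is.na(desc)] <- ids[is.na(desc)]
  setNames(desc, ids)
}

# With path2name = NULL, Description mirrors the ID column
res <- data.frame(ID = c('breast', 'lung'), stringsAsFactors = FALSE)
res$Description <- unname(term2name_sketch(res$ID, NULL))
print(res)

# With a (made-up) mapping, known IDs get their names
term2name_sketch(c('breast', 'lung'), c(breast = 'Breast cancer'))
```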

GuangchuangYu commented 9 years ago

enrichNCG is built on the same underlying implementation that clusterProfiler uses for user-supplied annotation files.