YuLab-SMU / ChIPseeker

:dart: ChIP peak Annotation, Comparison and Visualization
https://onlinelibrary.wiley.com/share/author/GYJGUBYCTRMYJFN2JFZZ?target=10.1002/cpz1.585

support EnsDb #193

Closed GuangchuangYu closed 1 year ago

GuangchuangYu commented 2 years ago

See https://support.bioconductor.org/p/124900/.

We should switch to using the Ensembl ID by default when an EnsDb is passed to the TxDb parameter.

MingLi-929 commented 2 years ago

I am a little confused by this statement:

"We should change to using Ensembl ID as default when passing an EnsDb to the TxDb parameter."

I looked carefully into the bug report at https://support.bioconductor.org/p/124900/.

The bug comes from the columns argument. https://github.com/YuLab-SMU/ChIPseeker/blob/ff7bffd10514c1428e3553c4dee224967092ac6d/R/addGeneAnno.R#L28-L33 ChIPseeker uses the code linked above to fetch annotation from the database passed as the annoDb parameter. The same code is applied to every database, but different databases expose different columns:

library(EnsDb.Hsapiens.v86)
library(org.Hs.eg.db)

db <- EnsDb.Hsapiens.v86
columns(db)

> columns(db)
 [1] "ENTREZID"            "EXONID"             
 [3] "EXONIDX"             "EXONSEQEND"         
 [5] "EXONSEQSTART"        "GENEBIOTYPE"        
 [7] "GENEID"              "GENENAME"           
 [9] "GENESEQEND"          "GENESEQSTART"       
[11] "INTERPROACCESSION"   "ISCIRCULAR"         
[13] "PROTDOMEND"          "PROTDOMSTART"       
[15] "PROTEINDOMAINID"     "PROTEINDOMAINSOURCE"
[17] "PROTEINID"           "PROTEINSEQUENCE"    
[19] "SEQCOORDSYSTEM"      "SEQLENGTH"          
[21] "SEQNAME"             "SEQSTRAND"          
[23] "SYMBOL"              "TXBIOTYPE"          
[25] "TXCDSSEQEND"         "TXCDSSEQSTART"      
[27] "TXID"                "TXNAME"             
[29] "TXSEQEND"            "TXSEQSTART"         
[31] "UNIPROTDB"           "UNIPROTID"          
[33] "UNIPROTMAPPINGTYPE" 

db <- org.Hs.eg.db
columns(db)

> columns(db)
 [1] "ACCNUM"       "ALIAS"        "ENSEMBL"      "ENSEMBLPROT" 
 [5] "ENSEMBLTRANS" "ENTREZID"     "ENZYME"       "EVIDENCE"    
 [9] "EVIDENCEALL"  "GENENAME"     "GENETYPE"     "GO"          
[13] "GOALL"        "IPI"          "MAP"          "OMIM"        
[17] "ONTOLOGY"     "ONTOLOGYALL"  "PATH"         "PFAM"        
[21] "PMID"         "PROSITE"      "REFSEQ"       "SYMBOL"      
[25] "UCSCKG"       "UNIPROT"     

So when the EnsDb.Hsapiens.v86 database is passed in, ChIPseeker cannot retrieve the Ensembl ID through columns=c("ENTREZID", "ENSEMBL", "SYMBOL", "GENENAME"), because an EnsDb has no ENSEMBL column.

We should change it to columns=c("ENTREZID", "GENEID", "SYMBOL", "GENENAME"), which does retrieve the Ensembl ID from an EnsDb.

Although I am a little confused about the instruction, my plan for fixing this bug is to add a parameter called columns that lets users specify which columns to fetch from the annoDb database.
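A minimal sketch of how such a columns parameter might behave, written in base R so that it runs without any Bioconductor packages. The helper name `resolve_anno_columns` and its arguments are hypothetical and are not part of ChIPseeker. It assumes the default set is the one quoted above and that, when the database has no ENSEMBL column but does have GENEID, the latter holds the Ensembl gene ID (as in an EnsDb).

```r
# Hypothetical helper, not part of ChIPseeker: decide which columns to request
# from an annotation database, given the columns that database exposes.
resolve_anno_columns <- function(available, columns = NULL) {
  if (is.null(columns)) {
    # Default set quoted in this issue.
    columns <- c("ENTREZID", "ENSEMBL", "SYMBOL", "GENENAME")
    # Assumption: a database without ENSEMBL but with GENEID (e.g. an EnsDb)
    # stores the Ensembl gene ID in GENEID, so swap the column name.
    if (!"ENSEMBL" %in% available && "GENEID" %in% available) {
      columns[columns == "ENSEMBL"] <- "GENEID"
    }
  }
  # Fail early with a clear message if a requested column does not exist.
  absent <- setdiff(columns, available)
  if (length(absent) > 0) {
    stop("columns not available in this database: ",
         paste(absent, collapse = ", "))
  }
  columns
}

# Small subsets of the two column listings shown earlier in this comment.
ensdb_cols <- c("ENTREZID", "GENEID", "GENENAME", "SYMBOL", "TXID")
orgdb_cols <- c("ENSEMBL", "ENTREZID", "GENENAME", "SYMBOL")

resolve_anno_columns(ensdb_cols)                      # default, EnsDb-like
resolve_anno_columns(orgdb_cols)                      # default, OrgDb-like
resolve_anno_columns(ensdb_cols, c("GENEID", "TXID")) # user-specified
```

With a real database, `available` would be the result of `columns(annoDb)`, and the returned vector would be passed on as the columns argument of the existing lookup. Leaving `columns = NULL` keeps the current behaviour for OrgDb objects while still working for an EnsDb.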