YuLab-SMU / ChIPseeker

:dart: ChIP peak Annotation, Comparison and Visualization
https://onlinelibrary.wiley.com/share/author/GYJGUBYCTRMYJFN2JFZZ?target=10.1002/cpz1.585

Problems with ChIPseeker annotation in mm10 #158

Open iramai opened 3 years ago

iramai commented 3 years ago

Hi again! I am going to explain my problem again with additional information, because I think we did not understand each other in the previous issue, and you closed it before I could reply. As I told you there, I have been using ChIPseeker to annotate some sequencing experiments, but I have found what look like errors in the annotation of the coordinates. I followed the Bioconductor tutorial (http://www.bioconductor.org/packages/release/bioc/vignettes/ChIPseeker/inst/doc/ChIPseeker.html) and used the TxDb object for the annotation steps. The problem is that the tool sometimes assigns a peak to a gene feature that is far away from that gene's position.

First I added my file (Annotation_pval_f.txt) to the ChIPseeker folder (GEO_sample_data), so that I could use the same commands as in the protocol, and then I followed the whole protocol. It is important to mention that Annotation_pval_f.txt (the file I want to annotate) comes from experiments with mESCs, which is why I use the mm10 annotation (txdb) in the pipeline. These are the commands I ran:

library(GenomicRanges)
library(rtracklayer)
library(dplyr)
library(ChIPpeakAnno)
library(ChIPseeker)
library(org.Mm.eg.db)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
txdb <- TxDb.Mmusculus.UCSC.mm10.knownGene
files <- getSampleFiles()
files
$Annotation_pval_f.txt
[1] "C:/Users/Documents/R/win-library/4.0/ChIPseeker/extdata/GEO_sample_data/Annotation_pval_f.txt"
$ARmo_0M
[1] "C:/Users/Documents/R/win-library/4.0/ChIPseeker/extdata/GEO_sample_data/GSM1174480_ARmo_0M_peaks.bed.gz"
$ARmo_1nM
[1] "C:/Users/Documents/R/win-library/4.0/ChIPseeker/extdata/GEO_sample_data/GSM1174481_ARmo_1nM_peaks.bed.gz"
$ARmo_100nM
[1] "C:/Users/Documents/R/win-library/4.0/ChIPseeker/extdata/GEO_sample_data/GSM1174482_ARmo_100nM_peaks.bed.gz"
$CBX6_BF
[1] "C:/Users/Documents/R/win-library/4.0/ChIPseeker/extdata/GEO_sample_data/GSM1295076_CBX6_BF_ChipSeq_mergedReps_peaks.bed.gz"
$CBX7_BF
[1] "C:/Users/Documents/R/win-library/4.0/ChIPseeker/extdata/GEO_sample_data/GSM1295077_CBX7_BF_ChipSeq_mergedReps_peaks.bed.gz"

peakAnnoBatch <- annotatePeak(files[[1]], tssRegion=c(-3000, 3000), TxDb=txdb, annoDb="org.Mm.eg.db")
loading peak file... 2021-04-09 0:02:31
preparing features information... 2021-04-09 0:02:34
identifying nearest features... 2021-04-09 0:02:35
calculating distance from peak to TSS... 2021-04-09 0:02:41
assigning genomic annotation... 2021-04-09 0:02:41
adding gene annotation... 2021-04-09 0:03:05
'select()' returned 1:many mapping between keys and columns
assigning chromosome lengths 2021-04-09 0:03:06
done... 2021-04-09 0:03:06

But when analyzing the output file I found some inconsistencies: some peaks are assigned to a gene even though they lie outside that gene's region. For example:
chr10 | 13203078 | 13203079 | Ltv1 | Distal intergenic
chr10 | 13210280 | 13210281 | Ltv1 | 3'UTR
chr10 | 13213953 | 13213954 | Ltv1 | 3'UTR
chr10 | 13224394 | 13224395 | Ltv1 | Intron (ENSMUST00000105545.11/215789, intron 12 of 12)
chr10 | 13229471 | 13229472 | Ltv1 | Intron (ENSMUST00000105545.11/215789, intron 11 of 12)
chr10 | 13236038 | 13236039 | Ltv1 | Intron (ENSMUST00000105545.11/215789, intron 9 of 12)

However, the Ltv1 gene coordinates are chr10:13178140-13193168, so all of the positions that ChIPseeker assigned to Ltv1 lie outside the gene. In fact, most of those positions fall within the gene Phactr2 (chr10:13213395-13324289), and the Ensembl transcript IDs shown in the annotation column belong to this second gene, not to Ltv1.
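The mismatch can be checked by hand with a small base-R sketch. It assumes the gene coordinates quoted in this post are correct (they are typed in, not looked up in the TxDb), and `containing_gene` is a hypothetical helper written only for this illustration, not a ChIPseeker function.

```r
# Gene coordinates as quoted in this post (mm10, chr10); typed in by hand,
# not re-derived from TxDb.Mmusculus.UCSC.mm10.knownGene.
genes <- data.frame(
  symbol = c("Ltv1", "Phactr2"),
  start  = c(13178140, 13213395),
  end    = c(13193168, 13324289)
)

# Start positions of the chr10 peaks that were labelled as Ltv1.
peaks <- c(13203078, 13210280, 13213953, 13224394, 13229471, 13236038)

# Hypothetical helper: return the symbol of the gene whose interval
# contains `pos`, or NA when the position lies in neither gene.
containing_gene <- function(pos, genes) {
  hit <- genes$symbol[pos >= genes$start & pos <= genes$end]
  if (length(hit) == 0) NA_character_ else paste(hit, collapse = ";")
}

result <- data.frame(
  peak = peaks,
  gene = vapply(peaks, containing_gene, character(1), genes = genes)
)
print(result)
```

With these coordinates, none of the six positions falls inside Ltv1: the first two lie in neither gene, and the last four lie inside Phactr2.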

Can you help me solve this issue? I don't understand why the tool is not detecting the gene overlaps properly, or whether I am doing something wrong.

Thanks in advance,

Iraia