Open tjparnell opened 1 year ago
@tjparnell Did you ever get to the core of this problem? Do you think it has been fixed? I'm having another issue with ChIPseeker and am now wondering how much I can trust the package overall given open issues like this.
@ChristophH I finally tested release 1.38.0 on our server, and the test code above now appears to give correct annotation. I have yet to test it fully. I have not seen any substantial commits here beyond documentation and version number changes, which makes me wonder if the problem was due to upstream code. Nevertheless, given the general lack of response to, or concern for, most queries here, my enthusiasm is still lacking.
The annotation for peaks that fall in introns (and exons) is frequently reported as being in the adjacent neighboring gene, suggesting an off-by-1 indexing error when reporting the identifiers in the `geneId` and `transcriptId` columns. The identifiers are correct in the `annotation` column.

Reproducible code with version 1.34.1:
Result
If you refer to the peak in line 4, it is listed as being in gene `NNMT`. However, if you load the peak file in a genome browser, you will note it is actually in an intron of `ZBTB16`. Gene `NNMT` is actually immediately to the right ("downstream") of `ZBTB16`. The transcript/gene identifiers in the `annotation` column do not match those in the `transcriptId` and `geneId` columns. The IDs in the `annotation` column are actually correct. Finally, the peak `end` coordinate of `114050234` is less than the `geneStart` coordinate of `114128553`, a logical failure.

The same behavior is also observed for the peak in line 6: it is listed as gene `ANKRD20A12P`, but in actuality is left of the gene.

This behavior is also seen in version 1.28.3 with Bioconductor 3.13 under R 4.1. I have also seen the issue when using my own annotation imported from a GTF file.
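
For anyone wanting to screen an exported annotation table for the two symptoms described above, here is a minimal sketch in plain Python. It is not part of ChIPseeker. It assumes each row is available as a dict with the columns `start`, `end`, `geneStart`, `geneEnd`, `annotation`, `geneId`, and `transcriptId`, and that intron/exon annotations look roughly like `Intron (TRANSCRIPT/GENE, intron 3 of 6)`. Both the `check_row` helper and that string format are assumptions for illustration and may need adjusting for your output.

```python
import re

# Assumed shape of an intron/exon annotation string:
#   "Intron (TRANSCRIPT_ID/GENE_ID, intron 3 of 6)"
# Adjust the pattern if your annotation column is formatted differently.
_ANNOT_IDS = re.compile(r"^(?:Intron|Exon)\s*\(([^/\s]+)/([^,\s]+),")


def check_row(row):
    """Return a list of inconsistencies found in one annotated peak (a dict)."""
    problems = []
    match = _ANNOT_IDS.match(row["annotation"])
    if match is None:
        # Not an intron/exon annotation, so these checks do not apply.
        return problems

    tx_id, gene_id = match.groups()
    if str(row["transcriptId"]) != tx_id:
        problems.append("transcriptId differs from annotation column")
    if str(row["geneId"]) != gene_id:
        problems.append("geneId differs from annotation column")

    # A peak annotated as intronic or exonic should overlap the reported
    # gene span; otherwise the coordinates belong to a different gene.
    if int(row["end"]) < int(row["geneStart"]) or int(row["start"]) > int(row["geneEnd"]):
        problems.append("peak does not overlap reported gene span")
    return problems
```

Running `check_row` over every row and keeping the rows with a non-empty result should surface peaks like the two described above, where the peak `end` is smaller than the reported `geneStart`.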
I suspect this may also be related to #158, #166, #210, and #212.