ben-laufer / DMRichR

A R package and executable for the preprocessing, statistical analysis, and downstream testing and visualization of differentially methylated regions (DMRs) from CpG count matrices (Bismark cytosine reports)
https://www.benlaufer.com/DMRichR/
MIT License
39 stars 22 forks source link

Confused about DMR annotation #80

Closed f6v closed 11 months ago

f6v commented 11 months ago

Hi,

One of the DMRs we got is located at chr17:8243028-8243946. In the excel file, it has Open.Sea as "yes", annotation is equal to "Promoter" and distanceToTSS is "0". But the DMRs plot looks like:

image

It seems like this DMR is within the intron and far from TSS if we look at genome browser, this region seems to be outside of promoter:

image

Am I reading this wrong? How can distanceToTSS be 0 here? If it was 0, shouldn't the first plot contain the first exon?

ben-laufer commented 11 months ago

Hi @f6v,

The following can reproduce your example:

library(DMRichR)

DMRichR::annotationDatabases("mm10")

sigRegions <- data.frame(seqnames = "chr17",
           start = 8243028,
           end = 8243946) %>% 
  plyranges::as_granges()

sigRegions %>% 
  DMRichR::annotateRegions(TxDb = TxDb,
                           annoDb = annoDb)

DMRichR::annotateRegions() is calling to ChIPseeker::annotatePeak(). You can learn more about the annotation by running the following (after the first block of code):

 sigRegions %>% 
    ChIPseeker::annotatePeak(TxDb = TxDb,
                             annoDb = annoDb,
                             overlap = "all",
                             verbose = FALSE) %>%
    dplyr::as_tibble()

This shows that the transcript mapping is based on ENSMUST00000166348.9 , which it says starts at 8243811, so your DMR does overlap with its definition of the promoter. See here for more information.

For the DMR plot, the function calls dmrseq::getAnnot(), which uses annotatr::build_annotations().

annoTrack <- dmrseq::getAnnot(genome)

annoTrack$Exons %>% 
  plyranges::filter(tx_id == "ENSMUST00000166348.9")

This shows that the coding sequence is from: 8255965-8256108. Since the plot only shows the CDS, and not promoters, your result seems correct. You can learn about the annotations here.

If you'd like for the plot to look different, you can customize the tracks at the bottom of DMRichR::plotDMRs2(), which is based on dmrseq::plotDMRs(), through the annoTrack argument.