al2na / methylKit

R package for DNA methylation analysis
https://bioconductor.org/packages/release/bioc/html/methylKit.html
210 stars 96 forks source link

How to merge the DMR within 100bp or other size? #216

Closed yplee614 closed 1 year ago

yplee614 commented 3 years ago

How to merge the DMR within 100bp or other size?

alexg9010 commented 3 years ago

Hi @yplee614,

there is no function to merge dmcs to dmrs within a certain range, but you could look at the function described here: https://github.com/al2na/methylKit/issues/193 , which clusters dmcs into dmrs by the same unsupervised segmentation algorithm we use for methSeg().

Best, Alex

yplee614 commented 3 years ago

Hi Alex,

Could you add this function into the methykit package? Many thanks.

Best, yplee

yplee614 commented 3 years ago

I mean to merge two DMRs into one DMR, when they are with 100bp in genome. No merge dmcs to dmrs. Thank you.

alexg9010 commented 3 years ago

Ah Okay,

there is also no function to merge dmrs within a certain range, but I guess one could do this with findOverlaps() and some checks to only merge dmrs with similar meth.diff .

@katwre do you maybe already have a function to do this?

katwre commented 3 years ago

hi @yplee614 and @alexg9010 , Sorry, I don't have a function for that. What I would do is:

Hope it helps, Kasia

Biomamba commented 1 year ago

hi @yplee614 and @alexg9010 , Sorry, I don't have a function for that. What I would do is:

  • convert your methylDiff object to GRanges your.granges.object=as(yourmethdiffobject, "GRanges")
  • use reduce function with min.gapwidth=100 to merge your DMRs that are in 100 bp proximity (my.red=reduce(your.granges.object, min.gapwidth=100))
  • if you'd like to aggregate average methylation differences (meth.diff column) or other meta columns that you are interested in from your DMRs that got merged in my.red, first use findOverlaps(your.granges.object, my.red) and then stats::aggregate function (or any other function to aggregate a column by a group from e.g. dplyr or data.table R packages)

Hope it helps, Kasia

Thank you for your great contribution. I have tried the method you metioned above. However, I got the my.red without the information of differential methylation which is contained in your.granges.object. Could you please tell me how to solve it. Thank you very much!

alexg9010 commented 1 year ago

Hi @Biomamba,

Kasia's third point outlines how to get this done:

overlaps <- findOverlaps(your.granges.object, my.red)
signal <- your.granges.object[overlaps@queryHits]
my.red$mean.meth.diff <- aggregate(your.granges.object$meth.diff, list(overlaps@subjectHits), mean)

Best, Alex

johanzi commented 1 year ago

If this help, I created a R function that merged overlapping DMRs and average methyl.diff values to return a tibble in bed format. You can also tweak it to get a GRanges object at the end (just add the methyl.diff value in a metacolumn).

merge_DMRs <- function(methylDiff_object){
  require("valr")
  require("tidyr")
  require("GenomicRanges")

  # Convert methylDiff into GRange
  gr <- as(methylDiff_object, "GRanges")

  # Convert GR to dataframe in bed format
  # Rename seqnames to chrom and convert to char otherwise error from valr  
  # Note Grange is 1-based and bed is 0-based so one needs to subtract 1 to start position
  df <- data.frame(chrom=as.character(seqnames(gr)),
                   start=as.integer(start(gr)-1),
                   end=as.integer(end(gr)),
                   name=c(rep(".", length(gr))),
                   score=as.double(gr$meth.diff))

  # Merge first DMRs and keep average methyl.diff value.
  df_merged <- df %>% valr::bed_merge(score=mean(score)) 

  # Now add name to each DMR
  len <- dim(df_merged)[[1]]
  name_DMRs <- paste("DMR", seq(1, len, 1), sep="")

  df_merged$name <- name_DMRs

  # Reorder columns
  df_merged_ordered <-   df_merged %>% 
    dplyr::select(chrom, start, end, name, score)

  return(df_merged_ordered)
}