ShengLi / edmr

Optimized DMR analysis based on bimodal normal distribution model and cost function for regional methylation analysis.
16 stars 21 forks source link

edmr does not display neither the correct number of CpGs inside region nor the correct methylation status (hyper/hypo) #8

Open TYRANISTAR opened 3 years ago

TYRANISTAR commented 3 years ago

Hello, I am using edmr right after MethylKit but the DMRs I get, correspond to a region where there are more CpGs than displayed in "num.CpGs" column. The code I run is the following:

dmp_rrbs <- calculateDiffMeth(methobj) #plus some extra arguments
myDiff <- getData(dmp_rrbs)
dmrs <- edmr(myDiff, mode=2)
dmrs_data <- data.frame(dmrs)

From the first few row of the table I get this:

seqnames     start       end width strand mean.meth.diff num.CpGs num.DMCs
1    chr11  18659110  18659225   116      *       21.96973        4        1
2    chr16  84941109  84941696   588      *       27.48481        7        4
3    chr21  45417327  45417691   365      *       21.00500        3        1

but when I check the original dmp_rrbs object for the region in chromosome 11 I get more CpGs:

chr    start      end strand       pvalue       qvalue   meth.diff
1048864 chr11 18659110 18659110      * 7.946860e-01 9.749628e-01  0.08623922
1048865 chr11 18659111 18659111      * 7.031369e-04 6.705592e-02  6.79611650
1048866 chr11 18659133 18659133      * 5.366955e-01 9.025498e-01 -1.72103487
1048867 chr11 18659134 18659134      * 6.658160e-01 9.433952e-01  0.82227066
1048868 chr11 18659135 18659135      * 3.440149e-01 8.120775e-01  2.93494563
1048869 chr11 18659136 18659136      * 6.894213e-01 9.496794e-01 -1.05508223
1048870 chr11 18659156 18659156      * 7.604073e-01 9.673050e-01 -1.36898355
1048871 chr11 18659157 18659157      * 5.058999e-01 8.909928e-01  0.50603119
1048872 chr11 18659199 18659199      * 9.590265e-01 1.000000e+00  0.42335411
1048873 chr11 18659200 18659200      * 2.459876e-03 1.216149e-01  9.16385972
1048874 chr11 18659211 18659211      * 7.247924e-04 6.804295e-02 56.07736533
1048875 chr11 18659212 18659212      * 6.300830e-01 9.331640e-01 19.87319200
1048876 chr11 18659217 18659217      * 4.286156e-01 8.574166e-01  1.47169104
1048877 chr11 18659218 18659218      * 1.617051e-01 6.547773e-01 13.41207086
1048878 chr11 18659224 18659224      * 2.414317e-01 7.379108e-01  6.03896104
1048879 chr11 18659225 18659225      * 1.377561e-15 5.878009e-11 15.84158416

and they are not the same type of differentially methylated (hyper or hypo) as I had requested with the "type=2" argument. Shouldn't there be only 4 CpGs one of which being differentially methylated in that region (i.e. the meth.diff should be either negative or positive for all)?

Can you help me out please? Am I missing something or is it a bug of the tool? I am using R 3.6.0 and hg38 genome for RRBS samples.

Thank you in advance

ShengLi commented 2 years ago

The mode = 2 means you are using the CpGs with methylation change in the same direction to call DMRs, which take the last column “meth.diff” value into account.

The reason you see fewer CpGs in the DMRs output is because:

  1. eDMR.sub() by default applies fuzzypval = 0.1, which means that only CpG sites with pvalue <0.1 were considered for DMR calling.
    1. Mode = 2 considers CpGs with meth.diff > 0 and meth.diff < 0 separately.

I updated edmr code to pass the fuzzypval value to the eDMR.sub() so that your analysis will not filter CpGs by pvalue < 0.1. Could you give a try again? Thanks!

TYRANISTAR commented 2 years ago

Hello,

I updated the edmr package and confirmed that both hyper.myDMR() and hypo.myDMR() variables have the fuzzypval = fuzzypval extra argument. Nevertheless the problem remains. For example when I run:

dmp_rrbs <- calculateDiffMeth(methobj) #plus some extra arguments
myDiff <- getData(dmp_rrbs)
dmrs <- edmr(myDiff, mode=2)
dmrs_data <- data.frame(dmrs)
head(dmrs_data)

I get the following:

  seqnames   start     end width strand mean.meth.diff num.CpGs num.DMCs
1     chr1 1117516 1117655   140      *       33.60520       14        8
2     chr1 1736319 1736620   302      *       26.74212       11        7
3     chr1 1831467 1831497    31      *       32.87444        5        3
4     chr1 1901833 1902126   294      *       20.58521        8        4
5     chr1 2138975 2139281   307      *       26.95013        8        4
6     chr1 2488928 2488985    58      *       27.96966        6        4

Then if I search for the 3rd dmr:

keep <- which(dmp_rrbs$chr=="chr1" & dmp_rrbs$start>=1831467 & dmp_rrbs$end<=1831497)
data.frame(dmp_rrbs[keep,])

I get the following:


--------------
       chr   start     end strand       pvalue       qvalue  meth.diff
20484 chr1 1831468 1831468      + 3.861869e-03 3.173452e-02   0.502854
20485 chr1 1831485 1831485      + 9.891393e-06 1.303714e-04 -36.076610
20486 chr1 1831486 1831486      - 1.015186e-24 8.679493e-23  57.931034
20487 chr1 1831490 1831490      + 2.930524e-22 2.024816e-20  15.770796
20488 chr1 1831491 1831491      - 1.412065e-04 1.559009e-03  25.378788
20489 chr1 1831496 1831496      + 1.242588e-13 3.871686e-12 -28.672427
20490 chr1 1831497 1831497      - 4.253546e-26 4.080455e-24  64.788732

As you see I still get more CpGs and they are still not the same type of differentially methylated (hyper or hypo) as I had requested with the "mode=2" argument.

What am I missing here?

Thank you in advance.