FelixKrueger / Bismark

A tool to map bisulfite converted sequence reads and determine cytosine methylation states
http://felixkrueger.github.io/Bismark/
GNU General Public License v3.0

Interesting M-bias pattern #200

caalo closed this issue 6 years ago

caalo commented 6 years ago

Hi Felix,

Another question for you -- I'm looking at M-bias for my samples (8-10x WGBS) and I'm noticing that there is definitely a drop-off in methylation at the start of the read, but it recovers gradually over 20-25 bp rather than with a sharp increase. This looks somewhat different from your post illustrating M-bias. Have you seen anything like this before, and do you have any recommendation on read clipping? I'm looking at Read 2 here, but have also included the Read 1 plot.

[Figures: M-bias plots for Read 2 and Read 1 of sample fc19311298-01feb17sr_hsbs-wgbs_high (duplicates marked, sorted)]

Thanks, Chris

FelixKrueger commented 6 years ago

Wow, this looks really weird, and more worryingly: it looks quite different from what R1 is doing. It is almost certain that combining Read 1 and Read 2 in this case will introduce a LOT of variability, almost independently of which threshold you set. Which kit was that? Maybe it would be worth going back to the manufacturer and asking them what they think might be going on?

caalo commented 6 years ago

Hi Felix,

Upon a bit more digging, this phenomenon is specific to cell-free DNA (cfDNA): our gDNA samples do not exhibit this pattern. All of our cfDNA and gDNA samples were treated with the Zymo EZ DNA Methylation-Lightning kit. I have also been looking at cfDNA WGBS samples from other publications that used the Qiagen EpiTect kit, and they exhibit the same phenomenon. Ours is labeled "in-house", whereas the other publication is labeled "external":

[Screenshot: M-bias plots for the "in-house" and "external" cfDNA samples]

And here is our gDNA:

[Figures: M-bias plots for Read 2 and Read 1 of gDNA sample fc19269698-09jan2017sr_wgbs-wgbs_high (duplicates marked, sorted)]

This observation seems to be specific to cfDNA regardless of kit used. However, I don't see any reason why R1 and R2 should have noticeably different methylation patterns, especially if there is 50/50 strand balance. Would be interested to hear what you think.

FelixKrueger commented 6 years ago

I just talked to Simon about this and he had an interesting idea (I am not sure how the kits work in detail but I'll try to explain it anyway). Basically, since R1 doesn't show this phenomenon (or only a slight drop towards the 3' end), could there be a directional degradation from the 3' end in the cell-free extract only? Is there some step in the cfDNA method that reconstitutes partially degraded material at some stage (the red dotted line below)? If this fill-in were performed with unmethylated C, it would explain why you are seeing this selectively towards the 3' ends of R1, or the start of R2. Or maybe something along those lines...

[Figure: schematic of the proposed 3' fill-in, marked by a red dotted line]

In practical terms this might then look like the standard R2 fill-in bias at the start, just spread out over a longer stretch. I guess the options might then be clipping R2 by 25 bp (or ignoring these positions), or ignoring R2 altogether, as it will almost certainly add a lot of noise to your R1 data. I am not so sure about the 3' end methylation dip in R2, but looking at the total number of calls at these positions, they won't make much of a difference (the call counts are much lower because of overlap and adapter removal).
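
One way to turn the "clip R2 by about 25 bp" suggestion into a number is to read it off the M-bias profile itself. The following is only a minimal sketch, not part of Bismark: the function name `suggest_5prime_ignore`, the plateau window and the tolerance are all hypothetical choices, and the input is assumed to be a plain list of per-position CpG methylation percentages for Read 2, taken by hand from the M-bias report. It treats a mid-read window as unbiased and reports the last 5' position that still deviates from that plateau.

```python
from statistics import median


def suggest_5prime_ignore(meth_pct, plateau_window=(30, 80), tolerance=2.0):
    """Suggest how many 5' positions to ignore, based on an M-bias profile.

    meth_pct       -- per-position methylation percentages; index 0 is read position 1
    plateau_window -- 1-based inclusive range ASSUMED to be free of bias (hypothetical default)
    tolerance      -- allowed deviation from the plateau, in percentage points (hypothetical default)
    """
    start, end = plateau_window
    # Median methylation over the window that is assumed to be unbiased.
    plateau = median(meth_pct[start - 1:end])

    ignore = 0
    # Scan positions before the plateau window; remember the last one
    # whose methylation still deviates from the plateau by more than the tolerance.
    for pos in range(1, start):
        if abs(meth_pct[pos - 1] - plateau) > tolerance:
            ignore = pos
    return ignore


if __name__ == "__main__":
    # Synthetic profile for illustration only: a gradual ramp followed by a flat plateau.
    synthetic_r2 = [40 + 1.6 * i for i in range(25)] + [80.0] * 75
    print(suggest_5prime_ignore(synthetic_r2))
```

The resulting value could then be used as the number of Read 2 positions to skip, for example via the `--ignore_r2` option of `bismark_methylation_extractor`. It is worth checking the suggestion against the plot by eye, since the plateau window and tolerance here are arbitrary.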