Closed by franrodalg 3 years ago
Hi Fran,
I don't know which organism you are dealing with, but if this is a differentiated human sample, I don't think the CpG methylation is suspiciously high; it seems to be within the realm of 'normal' average methylation levels.
If you have an average methylation of 82%, then naturally most of the density curve would be expected in the 75-100% part of the plot, and not very much in the low methylation part. I think the picture would look different if you looked specifically at lowly methylated regions such as CpG islands. Also, since your plot says "whole genome >10 reads", I would be worried that this depth filtering could have some impact as well. Does the plot look similar if you don't apply depth filtering? The reason I am asking is that 110 million aligned reads are unlikely to give anywhere near 10-fold coverage (1-4X seems more likely), so applying this kind of depth threshold might unduly select for regions that show different methylation from the rest of the genome.
But again, since your overall CpG methylation is so high, I don't think that coverage depth thresholding is responsible for much of the effect here. Chances are that it is all as expected.
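As a rough back-of-the-envelope check of the 1-4X estimate, here is a minimal sketch. The read lengths and the genome size of about 3.1 Gb are assumptions for illustration (the thread does not state them), and every aligned read is treated as a single read of that length:

```python
def mean_coverage(n_reads, read_length_bp, genome_size_bp):
    """Expected average fold coverage: total sequenced bases / genome size."""
    return n_reads * read_length_bp / genome_size_bp


HUMAN_GENOME_BP = 3.1e9  # assumption: approximate human genome size
N_ALIGNED_READS = 110e6  # from the thread

# Assumption: hypothetical read lengths, since the real one is not given.
for read_length in (50, 75, 100):
    cov = mean_coverage(N_ALIGNED_READS, read_length, HUMAN_GENOME_BP)
    print(f"{read_length} bp reads: ~{cov:.1f}X")
```

Under these assumptions the average lands somewhere between roughly 2X and 4X, so positions with more than 10 reads would be a small and possibly unrepresentative subset of the genome.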
Hi Felix,
Thank you so much for your response! We are working on human blood (buffy coat).
When the read count filter is removed, we get the following density curve:
I have also tried removing the peak at 100% methylation to have a closer look at the other peaks:
We have also generated M-bias plots for this sample for further QC:
CpG R1:
CpG R2:
Do these look to you as though we have a problem with the R2 reads and they need to be trimmed further?
Thanks a lot!
Cheers, Pui
Hi Pui,
That looks reassuring. The other peaks you are seeing seem to correspond to positions with a small, defined number of C calls per position, so 50% (1:1 methylated/unmethylated), 33% or 66% (1:2 or 2:1), 25% and 75% (1:3 or 3:1), again with a tendency towards higher methylation. Such 'peaks' would be filtered out with a 10 read threshold.
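To illustrate why low-coverage positions produce peaks at those exact values, here is a small sketch that enumerates every methylation percentage a position can take when it is covered by at most a given number of calls. The function name `possible_levels` is just for illustration:

```python
from fractions import Fraction


def possible_levels(max_depth):
    """All methylated/total ratios reachable with 1..max_depth calls."""
    levels = set()
    for depth in range(1, max_depth + 1):
        for methylated in range(depth + 1):
            levels.add(Fraction(methylated, depth))
    return sorted(levels)


percentages = [round(100 * float(level), 1) for level in possible_levels(4)]
print(percentages)
# → [0.0, 25.0, 33.3, 50.0, 66.7, 75.0, 100.0]
```

With up to 4 calls per position there are only seven possible values, which is why the unfiltered density curve shows discrete spikes rather than a smooth distribution.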
The M-bias plots look amazingly flat throughout, at the same level in R1 and R2. The very last position of R2 always appears as 100% methylated, simply because the Illumina adapter starts with AGATC..., meaning R2 may never end in A (which means it may not be found unmethylated). This is a consequence of adapter trimming, and there is nothing you can do about it really. If you look at the number of times this happens (just mouse over that position in the Bismark or MultiQC HTML report), the number is usually orders of magnitude lower than at other positions (mostly because of the overlap detection and removal), and thus completely negligible.
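The adapter effect can be reproduced with a deliberately simplified sketch of 3' adapter trimming. This is not the actual trimming algorithm of any tool: it uses exact matching only, with a minimum overlap of 1 bp between the read end and the adapter start, and `trim_3prime` is a hypothetical helper. It checks that no read that keeps its full length after trimming can end in A:

```python
from itertools import product

ADAPTER = "AGATCGGAAGAGC"  # start of the Illumina adapter mentioned above


def trim_3prime(read, adapter=ADAPTER, min_overlap=1):
    """Cut the read at the leftmost position where the rest of the read
    matches the start of the adapter (exact matches only, simplified)."""
    for start in range(len(read) - min_overlap + 1):
        window = read[start:start + len(adapter)]
        if adapter.startswith(window):
            return read[:start]
    return read


def full_length_reads_ending_in_a(read_length=6):
    """Count reads that survive trimming untouched yet still end in A."""
    count = 0
    for bases in product("ACGT", repeat=read_length):
        read = "".join(bases)
        if len(trim_3prime(read)) == read_length and read.endswith("A"):
            count += 1
    return count


print(trim_3prime("ACGTTGCA"))          # trailing A is removed
print(full_length_reads_ending_in_a())  # prints 0
```

Since an unmethylated C is read as A in R2, the final cycle of a full-length R2 read can only ever report a methylated call, and only the few reads that escape trimming contribute to that position.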
I remain firm: your data looks great, go and get some nice results! :)
Thanks so so much Felix. I think they feel much more confident at the moment that the results are believable. Your insights are always invaluable.
Thanks a lot for your help Felix!
I'm glad it helped.
Hi @FelixKrueger !!
A while ago you recommended a trimming method to us that worked wonders for our library prep. Some of our collaborators have followed the same procedure; they get proper alignment rates, and I think the bisulfite conversion also looks good. Nevertheless, they feel the CpG methylation is suspiciously high:
but the most suspicious part is the CpG density plot, since there seems to be virtually no lowly methylated reads.
Does this look normal to you, or do you think there has been an issue with the library or its processing?
Cheers, Fran