etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org

CNVkit on WGBS data #809

Open QLZhouBio opened 1 year ago

QLZhouBio commented 1 year ago

Hi, I have some WGBS data and would like to check for copy number variants in it. However, I am not sure whether CNVkit is suitable for WGBS data, since unmethylated C is converted to T by bisulfite treatment. Will that change the GC content and hence affect the rolling median steps?

kenji-yt commented 4 months ago

@QLZhouBio I am pretty sure bisulfite conversion affects the relationship between GC content and coverage. Unmethylated regions will have very low GC content after conversion (half the original GC content if every C is converted to T on both strands). I believe that reads are therefore amplified by PCR according to their GC content after bisulfite conversion, not the GC content of the reference. Since PCR is the main source of GC bias (Benjamini and Speed 2012), this affects coverage. For example, in some of my own WGBS data there is no relationship between reference GC content and coverage, so GC normalization is ineffective. The package ReadDepth accounts for this: it computes the effective GC content of each bin by considering the methylation status of each cytosine site.
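
To make the "half the original GC content" point concrete, here is a minimal Python sketch of an effective GC calculation for a single bin. It is only an illustration under stated assumptions, not ReadDepth's or CNVkit's actual code: the function names (`bisulfite_convert`, `effective_gc`) and the way methylated positions are passed in (0-based indices, given separately for each strand in that strand's own coordinates) are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def bisulfite_convert(seq, methylated=frozenset()):
    """Convert every unmethylated C to T on one strand.

    `methylated` holds 0-based positions of cytosines that are protected
    from conversion (hypothetical input format, for illustration only).
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def effective_gc(seq, meth_top=frozenset(), meth_bottom=frozenset()):
    """Mean GC fraction of the two bisulfite-converted strands.

    `meth_bottom` positions index into the reverse complement of `seq`.
    """
    top = bisulfite_convert(seq, meth_top)
    bottom = bisulfite_convert(reverse_complement(seq), meth_bottom)
    return (gc_fraction(top) + gc_fraction(bottom)) / 2


if __name__ == "__main__":
    bin_seq = "ACGTCCGGAT"
    print("reference GC:", gc_fraction(bin_seq))
    print("effective GC, fully unmethylated:", effective_gc(bin_seq))
    print("effective GC, top strand fully methylated:",
          effective_gc(bin_seq, meth_top={1, 4, 5}))
```

With no methylation, each converted strand keeps only its G bases, so the effective GC fraction comes out at half the reference value; every protected (methylated) C moves it back toward the reference value. A real pipeline would take per-site methylation calls from the aligner or methylation caller, not a hand-written set of positions.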