Closed yupenghe closed 6 years ago
Hi @yupenghe Yupeng, thanks for your comment. Actrually, the FilterReads.py script is an optional one, which were added on the request of some users of BS-Seeker2. I have added warning message in the script, and README file.
For WGBS, it is still useful to remove reads from PCR effect; and "cgmaptools rmdup" also works in similiar way. For RRBS, it is not suggested to use this script; as more reads from the same CCGG----CCGG fragment would have similar starts and ends.
Thanks for your message. And looking forward for your further feedback!
Weilong Guo
Thanks very much, Weilong. That makes sense. Just want to share some further thoughts with you. My colleague and I found this issue because doing FilterReaD.py always gives very high bisulfite non-conversion rate (~3%) on unmethylated spike-in control (which frequently has >100x coverage without clonal removal). I tried to use picard for removing PCR duplicates and the relatively high non-conversion rate disappears (to 0.3%). I don't think it will affect much on autosomes and sex chromosomes due to much lower coverage compared to the spike-in.
Yupeng
Thanks Yupeng!
Recently, our new software package "CGmapTools : https://cgmaptools.github.io" is published in Bioinformatics ("CGmapTools improves the precision of hetero- zygous SNV calls and supports allele-specific methylation detection and visualization in bisulfite-sequencing data, Bioinformatics (2017), btx595"). This package is designed as direct downstream analysis package following BS-Seeker2. Hope you will find it be useful. Looking forward to feedback for both BS-Seeker2 and CGmapTools.
Best, Weilong
At 2017-12-08 11:08:02, "Yupeng He" notifications@github.com wrote:
Thanks very much, Weilong. That makes sense. Just want to share some further thoughts with you. My colleague and I found this issue because doing FilterReaD.py always gives very high bisulfite non-conversion rate (~3%) on unmethylated spike-in control (which frequently has >100x coverage without clonal removal). I tried to use picard for removing PCR duplicates and the relatively high non-conversion rate disappears (to 0.3%). I don't think it will affect much on autosomes and sex chromosomes due to much lower coverage compared to the spike-in.
Yupeng
— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.
I just found that doing read filter using FilterReads.py may introduce bias that is not easily noticed. This may has been reported previously but I feel it may be useful to put a warning on README such that people realize such potential bias.
The bias will lead to overestimation of methylation level at lowly methylated sites (<50%) and underestimation at highly methylated sites (>50%). Suppose there are 100 reads covering one isolated CpG whose actual methylation level is 10%. We would expect 90 reads contain T basecalls and 10 reads contain C basecalls, which give us 90% methylation level. If we do FilterReads.py, then only one read with C basecall and one read with T basecall (two in each case if we consider both strands) will be kept, resulting in 50% methylation level. Such bias is more prominent on regions with high coverage and drags methylation level towards 50%. On lowly covered region, the effect is moderate.