al2na / methylKit

R package for DNA methylation analysis
https://bioconductor.org/packages/release/bioc/html/methylKit.html
214 stars, 96 forks

【question】about the coverage in processBismarkAln #325

Closed lilwo closed 3 months ago

lilwo commented 3 months ago

Hello, I would like to understand how the processBismarkAln function calculates coverage. Specifically, does mincov filter the bases, while minqual filters the reads? Besides these two filtering parameters, are there any other filtering criteria? I noticed a discrepancy in the number of reads: many reads seem to be missing, as shown below. The last column in the first row shows the coverage without any filtering applied, the last column in the second row shows the coverage with mincov=10 and minqual=20, and the third row shows the number of reads at that position in the input BAM file.

lilwo commented 3 months ago

@al2na, hi, here is the image: 微信图片_20240813161223 (a WeChat image)

In addition, the mean coverage calculated by processBismarkAln is much lower than the average depth obtained after deduplication using the bamdst software. Could you explain why this might be? Thank you very much.

alexg9010 commented 3 months ago

Hi @lilwo,

This is a short explanation of the filtering parameters used by processBismarkAln:

In addition, the output produced by processBismarkAln only retains reads containing a valid methylation call (see https://felixkrueger.github.io/Bismark/bismark/alignment/#methylation-call), so reads containing a SNP at the relevant position are not counted. We also filter by context, so only reads with a methylation call for the given context (by default, only cytosines in CpG context) are counted.

To visualize the methylation coverage and better understand what is retained, you can load the Bismark BAM files into IGV or SeqMonk (which might be better suited, see https://github.com/FelixKrueger/Bismark/issues/193).

Best, Alex
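
To make the filtering described above concrete, here is a minimal Python sketch of how methylation coverage at a position can end up lower than the raw read depth. It is a toy model and not methylKit's actual implementation. It assumes that each read is reduced to a start position, a Bismark-style methylation call string (`Z`/`z` for methylated/unmethylated CpG, `X`/`x` for CHG, `H`/`h` for CHH, `.` for no call) and per-base Phred qualities. It also assumes that minqual is applied per base and mincov per position, and it ignores CIGAR operations, strand and overlapping read pairs. The function name `count_calls` and the example reads are hypothetical.

```python
from collections import defaultdict

# Bismark methylation-call letters per context: (methylated, unmethylated).
CONTEXT_LETTERS = {"CpG": ("Z", "z"), "CHG": ("X", "x"), "CHH": ("H", "h")}


def count_calls(reads, context="CpG", minqual=20, mincov=10):
    """Toy per-position methylation counting (hypothetical helper).

    reads: list of (start, call_string, base_qualities) tuples.
    Returns {position: (num_methylated, num_unmethylated)} for positions
    whose methylation coverage is at least mincov.
    """
    meth, unmeth = CONTEXT_LETTERS[context]
    counts = defaultdict(lambda: [0, 0])
    for start, calls, quals in reads:
        for offset, (call, qual) in enumerate(zip(calls, quals)):
            if qual < minqual:
                continue  # assumed: low-quality bases are skipped one by one
            if call == meth:
                counts[start + offset][0] += 1
            elif call == unmeth:
                counts[start + offset][1] += 1
            # Any other symbol means no call in this context (for example a
            # SNP or a different context), so the read adds nothing here.
    return {
        pos: tuple(c) for pos, c in sorted(counts.items()) if sum(c) >= mincov
    }


# Four hypothetical reads; position 103 has a raw depth of 4.
reads = [
    (100, "Z..z", [30, 30, 30, 30]),
    (100, "Z..Z", [30, 30, 30, 10]),  # last base is below minqual
    (100, "...z", [30, 30, 30, 30]),  # no call at position 100
    (101, "h.z", [30, 30, 30]),       # position 101 is a CHH call
]

unfiltered = count_calls(reads, minqual=0, mincov=1)
quality_filtered = count_calls(reads, minqual=20, mincov=1)
default_filtered = count_calls(reads)  # minqual=20, mincov=10
```

In this example, position 103 is covered by four reads but only three calls pass the quality filter, and position 100 is covered by three reads but only two of them carry a CpG call. With the default mincov of 10, neither position is reported at all. A general-purpose depth tool counts every aligned base regardless of methylation call or context, which may explain part of the gap between the two mean coverage values.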
