Closed (lilwo closed 3 months ago)
@al2na Hi, here is the image: _20240813161223.png https://github.com/user-attachments/assets/88e470c2-2a61-4520-8e72-fa68a86b069c
In addition, the mean coverage calculated by processBismarkAln is much lower than the average depth obtained after deduplication with the bamdst software. Could you explain why this might be? Thank you very much.
Hi @lilwo,
This is a short explanation of the filtering parameters used by processBismarkAln:
minqual: This parameter filters reads based on their base quality scores. It sets a minimum threshold for the quality score of a base in a read for that base to be counted towards coverage.
mincov: This parameter filters bases based on coverage depth. It sets a minimum threshold for the number of reads that must cover a position for it to be included in the final output.

In addition, the output produced by processBismarkAln will only retain reads containing a valid methylation call (https://felixkrueger.github.io/Bismark/bismark/alignment/#methylation-call), so reads containing a SNP at the relevant position will not be considered. Also, we filter by context, so only reads with a methylation call for the given context (by default only cytosines in CpG context) will be considered.
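To make the effect of these filters concrete, here is a minimal toy sketch in Python. It is not methylKit's actual implementation: the names `BaseObservation` and `count_coverage` are hypothetical, and it assumes that the quality threshold is inclusive and that each read contributes one character of Bismark's methylation call string at the position (`Z`/`z` for a methylated/unmethylated CpG, `.` where no call was made). It only illustrates why the reported coverage can be well below the number of reads overlapping a position in the BAM file.

```python
from dataclasses import dataclass

# Bismark methylation call letters for CpG context:
# 'Z' = methylated C in CpG, 'z' = unmethylated C in CpG.
CPG_CALLS = {"Z", "z"}


@dataclass
class BaseObservation:
    """One read's contribution at a single cytosine position (toy model)."""
    call: str      # character from the methylation call string at this position
    quality: int   # Phred base quality at this position


def count_coverage(observations, minqual=20, mincov=10):
    """Return (coverage, percent_methylated), or None if the position is dropped.

    Assumption: a base is counted only if it carries a CpG call and its
    quality is at least minqual; the position is reported only if at
    least mincov bases remain.
    """
    kept = [o for o in observations
            if o.call in CPG_CALLS and o.quality >= minqual]
    coverage = len(kept)
    if coverage < mincov:
        return None
    methylated = sum(1 for o in kept if o.call == "Z")
    return coverage, 100.0 * methylated / coverage


# 30 reads overlap the position, but only 12 pass both read-level filters.
pileup = (
    [BaseObservation("Z", 35)] * 9      # methylated CpG calls, good quality
    + [BaseObservation("z", 30)] * 3    # unmethylated CpG calls, good quality
    + [BaseObservation("Z", 10)] * 8    # CpG calls below minqual
    + [BaseObservation(".", 35)] * 10   # no methylation call (e.g. a SNP)
)
result = count_coverage(pileup)
```

In this made-up pileup, a depth tool that counts every overlapping read would see 30 reads, while the filtered count is 12; raising `mincov` above 12 would drop the position entirely.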
To visualize the methylation coverage and better understand what is retained, you can load the Bismark BAM files into IGV or SeqMonk (which might be better suited, see https://github.com/FelixKrueger/Bismark/issues/193).
Best, Alex
Hello, I would like to understand how the processBismarkAln function calculates coverage. Specifically, does mincov filter the bases, while minqual filters the reads? Besides these two filtering parameters, are there any other filtering criteria? I noticed a discrepancy in the read counts: many reads seem to be missing, as shown below. The last column in the first row shows the coverage without any filtering applied, the last column in the second row shows the coverage with mincov=10 and minqual=20, and the third row shows the number of reads at that position in the input BAM file.