Closed anirudhjay closed 1 year ago
I am not quite sure I understand the issue properly. Is there a chance you meant 'converted' instead of 'conflated'?
To determine the conversion state of a read, reads are converted in both a CT (top strand) and a GA (bottom strand) manner; if one of the alignments is best, that conversion state is chosen and recorded in the XG flag. Does that make it clearer?
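As an illustration of reading that flag, here is a minimal sketch that pulls the XG value out of a single SAM record. It assumes the standard SAM layout (11 mandatory tab-separated fields followed by optional `TAG:TYPE:VALUE` fields); the function name `genome_conversion` and the example record are made up for this sketch and are not part of Bismark.

```python
def genome_conversion(sam_line):
    """Return the value of the XG:Z tag ('CT' or 'GA'), or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional tags start after the 11 mandatory SAM columns.
    for tag in fields[11:]:
        if tag.startswith("XG:Z:"):
            return tag[len("XG:Z:"):]
    return None


# Hypothetical record, for illustration only.
example = "\t".join([
    "read1", "0", "chr1", "100", "42", "5M", "*", "0", "0",
    "TTGAT", "IIIII", "NM:i:1", "XR:Z:CT", "XG:Z:CT",
])
print(genome_conversion(example))
```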
Hi Felix,
I apologize if I wasn't clear. Yes, I do mean converted. The reason I used the term conflated is that when I count allele frequencies at a particular position of a read, say in CT converted reads, I will not be able to distinguish whether the T was an actual nucleotide variant or just a Cytosine converted to a Thymine by the Bisulphite treatment. Hence, I collect them as a combined allele C_T, or in the other case G_A.
So, I just wanted to confirm that if I have a read with an XG:GA status, I will not be able to distinguish between an A at position X being a nucleotide variant and it being a Bisulphite treatment induced conversion (given that the ref seq has a Guanine at the same position).
I hope I have provided a bit more clarity.
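The combined-allele counting described in this comment could be sketched roughly as follows. This is only an illustration under stated assumptions: each observation is a `(base, xg)` pair where `xg` is the read's XG value, and the helper name `collapse_allele` and the sample observations are hypothetical.

```python
from collections import Counter


def collapse_allele(base, xg):
    """Merge the two bases that bisulphite conversion makes indistinguishable."""
    base = base.upper()
    if xg == "CT" and base in ("C", "T"):
        return "C_T"
    if xg == "GA" and base in ("G", "A"):
        return "G_A"
    return base  # this base is not affected by the read's conversion type


# Hypothetical observations at one position: (base in read, XG value of read).
observations = [("T", "CT"), ("C", "CT"), ("A", "GA"), ("T", "GA")]
counts = Counter(collapse_allele(base, xg) for base, xg in observations)
print(counts)
```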
Yes, that's correct. For a single read you cannot say whether a T at a C position is a methylation state or a nucleotide variant. In theory it is possible to identify nucleotide variants by looking at the opposing strand though, as you would find an A if there was an SNV, but would still find a G if there was no mutation and you were looking at a methylation state. For this approach to work you will need sufficient coverage on both strands though. Some tools to look at this are methylcoder or BisSNPer.
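A rough sketch of the opposing-strand idea, for a position where the reference is C. It assumes the bases come from XG:GA reads and are given in forward-reference orientation (as in the SAM SEQ field), so the G/A on the opposing strand described above shows up as C/T here. The function name and both thresholds are arbitrary placeholders for illustration, not values taken from Bismark or from any of the tools mentioned.

```python
def classify_ref_c(ga_read_bases, min_depth=10, min_alt_fraction=0.2):
    """Classify a reference-C site using only bases from XG:GA reads.

    In XG:GA reads C and T are not affected by the conversion, so a T
    there points to a real C>T variant rather than an unmethylated C.
    """
    informative = [b.upper() for b in ga_read_bases if b.upper() in ("C", "T")]
    if len(informative) < min_depth:
        return "low_coverage"
    t_fraction = informative.count("T") / len(informative)
    if t_fraction >= min_alt_fraction:
        return "snv"
    return "conversion"


# Hypothetical pileups, for illustration only.
print(classify_ref_c(["C"] * 12))
print(classify_ref_c(["T"] * 6 + ["C"] * 6))
print(classify_ref_c(["C"] * 3))
```

The same logic applies in mirror image to reference-G sites, using XG:CT reads and looking for A.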
Hi Felix,
Yes, as of now I am doing exactly what you proposed (looking at CT converted strands for G and A variants, and at GA converted strands for C and T variants). Thanks for the suggestions!
Best, Anirudh
Hi! I would like to understand how one decides whether a sequenced read is CT conflated or GA conflated. This is important for me as I would like to count allele frequencies at particular locations in the genome. From looking at the SAM files of Bismark alignments, it seems that reads with XG:GA are GA conflated while reads with XG:CT are CT conflated. Is this always the case? Can you let me know why this might be?
Thanks! Anirudh