Open iranmdl opened 7 years ago
This means that at position 267356 you have one read with a deletion of ATA (hence the minus sign prefix). Deletions are reported (only) at the location of the first deleted base.
Insertions are a little more complicated. They are reported with a plus sign prefix, for example +AT for an insertion of an AT after the reported location. Hope that makes sense.
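To make the prefix convention concrete, here is a minimal Python sketch that separates plain base counts from indel entries in one output row. It assumes the row is tab-separated, that the first four columns are chromosome, position, reference base and depth, and that every later column starts with `label:count:`. The function names and the sample row are made up for illustration; the row is not the one from this thread.

```python
def classify_allele(label):
    """Classify an allele label by its prefix: '+' insertion, '-' deletion."""
    if label.startswith("+"):
        return ("insertion", label[1:])
    if label.startswith("-"):
        return ("deletion", label[1:])
    return ("base", label)


def allele_counts(line):
    """Return {allele_label: read_count} for one tab-separated output row.

    Assumption: columns 1-4 are chromosome, position, reference base and
    depth; each remaining column begins with 'label:count:'.
    """
    columns = line.rstrip("\n").split("\t")
    counts = {}
    for field in columns[4:]:
        label, count = field.split(":")[:2]
        counts[label] = int(count)
    return counts


# Hypothetical, shortened row (real rows carry more colon-separated metrics).
row = "chr1\t267356\tA\t10\tA:9:60\tC:0:0\t-ATA:1:60"

for label, count in allele_counts(row).items():
    kind, sequence = classify_allele(label)
    print(kind, sequence, count)
```

Only the first two colon-separated values of each column are read, so the sketch does not depend on the exact list of metrics that follow.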
So, if I were looking for a single base pair insertion at a site (say chr17:41209079), would I look for the insertion at chr17:41209079 or chr17:41209078? If my 'region' file had only chr17:41209079 listed, would I be able to recover the insertion, or would I need to adjust my 'region' file to include chr17:41209078?
Thank you for the clarification, @ernfrid. So, if I have understood correctly, at position 267356 in the reference there is an A, and there is one read mapping at that specific position saying that the A has been deleted? Is this correct?
@sheenams - I realize I'm being a bit pedantic, but the devil is in the details here. The answer would depend on what you mean when you say a "single base pair insertion at a site (say chr17:41209079)".
Insertions are events that happen between coordinates and you need to choose a convention to report them relative to reference sequence coordinates.
I'll try an example below (let's continue to pretend it's on chr17):
Pos  1 2 3   4 5 6
Ref  A C G - T G A
     | | |   | | |
Var  A C G A T G A
Here there is an insertion of an A in the variant sequence between bases 3 and 4 of the reference sequence (1-based coordinates). To return counts from bam-readcount for this variant you would need to include the region chr17 3 3 in your region file, or specify chr17:3-3 on the command line. Larger regions encompassing that base should also return the insertion.
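The convention in the example above (query the reference base immediately to the left of the insertion) can be sketched as a small Python helper. The function name is hypothetical, and the tab separator for the region file line is an assumption; the coordinates are 1-based as in the example.

```python
def insertion_query(chrom, left_base):
    """Build the region for an insertion between left_base and left_base + 1.

    left_base is the 1-based reference coordinate immediately before the
    inserted sequence. Returns (region_file_line, command_line_region).
    """
    region_file_line = f"{chrom}\t{left_base}\t{left_base}"  # assumed tab-separated
    command_line_region = f"{chrom}:{left_base}-{left_base}"
    return region_file_line, command_line_region


# The worked example: an A inserted between reference bases 3 and 4.
file_line, cli_region = insertion_query("chr17", 3)
print(file_line)
print(cli_region)
```

Under this convention, if the inserted base sits between chr17:41209078 and chr17:41209079, the position to query would be 41209078, so a region file listing only 41209079 would need to be widened to include it.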
@iranmdl - You understand correctly.
Hello again @ernfrid, I have another question. Does bam-readcount output "complex" variants?
Example:
chr pos ref alt
10 94005 GC AA
Should I look at positions 94005 and 94006 separately, one for each change, or are both changes reported together at position 94005? I hope it is clear...
Or here is an even more complicated case: SNP+SNP+deletion. Let's say this is a true variant... how would bam-readcount show it?
1 236275238 AACATTGAAAA GT
Thank you again, :)
bam-readcount is pretty simplistic in its operation. It's only aware of the read alignments and has no concept of linkage between positions other than gaps in the alignments.
This means that you'll have to look at both positions for your first example. For the second example, it would depend on how the alignments were reported in the BAM file.
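For the first example, "looking at both positions" amounts to splitting the equal-length substitution into one single-base change per position. A minimal Python sketch of that step follows; the function name is hypothetical and this is not part of bam-readcount itself.

```python
def split_substitution(chrom, pos, ref, alt):
    """Split an equal-length substitution into per-position single-base changes.

    Returns a list of (chrom, position, ref_base, alt_base) tuples, one for
    each position where ref and alt differ.
    """
    if len(ref) != len(alt):
        # Length-changing variants depend on how the aligner placed the gap,
        # so they cannot be split position by position here.
        raise ValueError("ref and alt must have the same length")
    return [
        (chrom, pos + offset, ref_base, alt_base)
        for offset, (ref_base, alt_base) in enumerate(zip(ref, alt))
        if ref_base != alt_base
    ]


for change in split_substitution("10", 94005, "GC", "AA"):
    print(change)
```

The per-position counts you then look up are independent of each other, so they cannot show whether both changes occur on the same reads. The second example (AACATTGAAAA to GT) changes length, so the sketch rejects it rather than guess at the alignment.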
Ideally, there would be a haplotype-aware counter that would realign the reads to candidate haplotypes and report counts. I've wondered if this could be done by leveraging vg, but I've not investigated.
Hi there, I'm trying to extract relevant coverage information using the bam-readcount tool, and I have noticed that sometimes there are rows like this:
What does it mean? At position 267356, do I have one read with a TA insertion? In that case, why isn't the A counter simply increased by 1? I hope the question is more or less clear. Thank you in advance,