EvanTheB closed this issue 3 years ago.
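The problem the validator is raising is that VCF requires a context base for indels and symbolic alleles where REF or ALT would otherwise be an empty string. Without a context base (one allele is empty, which is invalid):
MT 100 . (empty) T
MT 100 . T (empty)
MT 100 . (empty) <INS>
With a context base (valid):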
MT 99 . A AT
MT 99 . AT A
MT 99 . A A<INS>
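The conversion above can be sketched as a small helper. This is a minimal illustration, not code from the validator: the name `add_context_base` is hypothetical, it assumes the caller has already looked up `prev_base` (the reference base at `pos - 1`) from the reference genome, and it only handles plain sequence alleles, not symbolic ones such as <INS>.

```python
def add_context_base(pos: int, ref: str, alt: str, prev_base: str) -> tuple[int, str, str]:
    """Rewrite an indel with an empty REF or ALT into padded VCF form.

    Hypothetical helper: `prev_base` must be the reference base at pos - 1,
    supplied by the caller. Symbolic alleles (e.g. <INS>) are not handled.
    """
    if len(prev_base) != 1:
        raise ValueError("prev_base must be a single reference base")
    if ref and alt:
        # Neither allele is empty, so no context base is required.
        return pos, ref, alt
    # Shift one position left and prepend the preceding base to both alleles.
    return pos - 1, prev_base + ref, prev_base + alt


if __name__ == "__main__":
    # Insertion of a T, where MT:99 is A
    print(add_context_base(100, "", "T", "A"))
    # Deletion of the T at MT:100, padded with the same base
    print(add_context_base(100, "T", "", "A"))
```

Prepending the preceding base and decrementing POS is what turns the `MT 100` records into the `MT 99` ones; a pair where both alleles are already non-empty is returned unchanged.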
But reading the spec, my understanding is that this doesn't apply to the * allele (overlapping deletion), which would mean this is a bug in the validator and your line is correct. This is a quick answer; I'll confirm it and update this ticket.
OK, there's some unresolved ambiguity in the spec (https://github.com/samtools/hts-specs/issues/151), but it seems the overlapping deletion allele indeed doesn't need a context base. I'll leave this issue open until the bug is fixed in the validator.
Until then, please ignore this kind of warning when overlapping deletions are involved.
I'm getting similar errors in my VCF files. It seems to happen specifically with indels. Here is an example of a line that generates such an error:
chr22 2009 48 GT ATTC . PASS . GT:AD:DP 0/1:11,4:15 0/0:2,0:3
Is this line improperly formatted, or is this an error in the validator?
Thanks very much!
-Jason
Hi Jason. First of all, let me reassure you: this message should be labelled as a warning, so if you only get warnings, your VCF is valid.
Also, looking at your line, I can confirm that it's correct. Specifically, VCF requires a context base for indels or symbolic alleles where REF or ALT would otherwise be an empty string. In your case neither REF (GT) nor ALT (ATTC) would be empty, so no context base is needed: the alleles don't 'share the first nucleotide', but that is not a problem.
It seems to me that this check is an oversight on our side in the validator. It's also slightly different from the original issue in this thread (about overlapping deletions), but we can fix this second issue directly because there is no pending discussion on the spec side.
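The distinction drawn in this thread can be made concrete with a small classifier. This is a minimal sketch: the function `classify_allele` and its category labels are hypothetical and are not taken from the validator or the VCF spec. Per the replies here, only the `empty_allele` case is one where a context base is required; a `*` allele is treated as exempt (the reading of the spec discussed in hts-specs#151), and a pair like GT/ATTC that doesn't share its first base is simply a complex indel.

```python
def classify_allele(ref: str, alt: str) -> str:
    """Roughly classify a REF/ALT pair for the context-base question.

    Hypothetical categories for illustration only.
    """
    if alt == "*":
        # Overlapping deletion: no context base needed (per this thread's reading).
        return "overlapping_deletion"
    if not ref or not alt:
        # The case where VCF requires a context base.
        return "empty_allele"
    if alt.startswith("<") and alt.endswith(">"):
        # Symbolic allele: the padding base is carried by REF.
        return "symbolic"
    if len(ref) == len(alt):
        return "snv_or_mnp"
    if ref[0] == alt[0]:
        # Indel whose alleles share the first base, e.g. AT -> A.
        return "padded_indel"
    # e.g. GT -> ATTC: neither allele is empty, so no padding is required.
    return "complex_indel"


if __name__ == "__main__":
    for ref, alt in [("T", "*"), ("AT", "A"), ("GT", "ATTC"), ("T", "")]:
        print(ref, alt, classify_allele(ref, alt))
```

The point of the sketch is that "alleles don't share the first base" is only a hint about padding; a check built on it alone would wrongly flag `complex_indel` pairs.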
Let me know if anything remains unclear, thanks.
Thanks for the quick reply and the explanation!
I've run into a separate issue for which I will open another ticket. Thank you!
Tracked in EVA-2050
For this variant (+header https://gist.github.com/EvanTheB/98ea93b53d3952697df1e8fcb72efb3b):
From my brief reading of the VCF spec, I cannot see what is wrong with that line. bcftools norm doesn't change it, and GATK accepts it...
Any clues?