EBIvariation / vcf-validator

Validation suite for Variant Call Format (VCF) files, implemented using C++11
Apache License 2.0

EVA-1016 Flag symbolic SV alleles that are reported as duplicates as Warnings #122

jmmut closed this issue 6 years ago

jmmut commented 6 years ago

Symbolic alleles (used for structural variants) are those where the ALT column contains a string enclosed in angle brackets, such as "<INS>", as defined in section 1.4.5 of the spec: https://samtools.github.io/hts-specs/VCFv4.3.pdf

Currently we raise an error if the following two lines appear:

1   100 .   A   <INS>   .   .   .
1   100 .   A   <INS>   .   .   .

This is not strictly a duplicate, as we don't know whether the inserted sequence was the same in both records. Therefore we have to add a special case to the code that checks for duplicates in record_cache.hpp: if the alternate allele is enclosed in "<>", don't include it in the cache.

EDIT: I misunderstood the plan. I said that we should not raise those cases as errors and should just ignore them. Instead, we should raise them as warnings. The record_cache is used here: https://github.com/EBIvariation/vcf-validator/blob/master/src/vcf/vcf.ragel#L224.
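
To make the intended behaviour concrete, here is a minimal C++11 sketch of the idea. It is not the code in record_cache.hpp: `Record`, `DuplicateStatus`, `DuplicateChecker` and `is_symbolic_allele` are hypothetical names made up for illustration, and it assumes one ALT allele per record. A repeated record yields a warning when its ALT is symbolic and an error otherwise.

```cpp
#include <set>
#include <string>
#include <tuple>

// Hypothetical types for illustration only; not the validator's real classes.
enum class DuplicateStatus { None, Warning, Error };

struct Record {
    std::string chromosome;
    long position;
    std::string reference;
    std::string alternate;  // assumption: a single ALT allele per record
};

// A symbolic allele is an ALT value wrapped in angle brackets, e.g. "<INS>".
inline bool is_symbolic_allele(const std::string & alt) {
    return alt.size() >= 3 && alt.front() == '<' && alt.back() == '>';
}

class DuplicateChecker {
public:
    // Returns None the first time a record is seen. On a repeat, returns
    // Warning for symbolic ALT alleles (the real alleles may differ) and
    // Error for everything else.
    DuplicateStatus check(const Record & record) {
        auto key = std::make_tuple(record.chromosome, record.position,
                                   record.reference, record.alternate);
        bool seen_before = !seen.insert(key).second;
        if (!seen_before) {
            return DuplicateStatus::None;
        }
        return is_symbolic_allele(record.alternate) ? DuplicateStatus::Warning
                                                    : DuplicateStatus::Error;
    }

private:
    std::set<std::tuple<std::string, long, std::string, std::string>> seen;
};
```

With the two example lines above, the first call to `check` would return `None` and the second `Warning`, whereas a repeated `A` to `T` record at the same position would return `Error`.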