Symbolic alleles (used for structural variants) are those where the ALT column holds an angle-bracket-enclosed string such as "<INS>", as defined in section 1.4.5 of the spec: https://samtools.github.io/hts-specs/VCFv4.3.pdf
Currently we raise an error if the following two lines appear:
1 100 . A <INS> . . .
1 100 . A <INS> . . .
This is not strictly a duplicate, as we don't know if the inserted allele was the same. Therefore we have to put a special case in the code that checks for duplicates in record_cache.hpp: if the alternate allele is enclosed in "<>", don't include it in the cache.
EDIT: I misunderstood the plan. I said above that we should not report these cases as errors and should just ignore them. Instead, we should report them as warnings. The record_cache is used here: https://github.com/EBIvariation/vcf-validator/blob/master/src/vcf/vcf.ragel#L224.
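To make the intended behaviour concrete, here is a minimal sketch of how a duplicate check could downgrade repeated symbolic alleles to warnings. It is not the actual record_cache.hpp code: `DuplicateKind`, `is_symbolic_allele` and `DuplicateChecker` are hypothetical names, and it assumes a single ALT per record and an unbounded cache, which the real validator may not.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <tuple>

// Hypothetical names throughout; this is not the real record_cache.hpp API.
enum class DuplicateKind { None, Error, Warning };

// A symbolic allele is an ID wrapped in angle brackets, e.g. "<INS>" or "<DEL>".
// An empty "<>" is not treated as symbolic here.
inline bool is_symbolic_allele(const std::string& alt) {
    return alt.size() >= 3 && alt.front() == '<' && alt.back() == '>';
}

class DuplicateChecker {
public:
    // Records the variant and reports whether it repeats an earlier one.
    // Repeated symbolic alleles yield a warning, because the underlying
    // sequences may differ; any other repeat is an error.
    DuplicateKind check(const std::string& chrom, std::size_t pos,
                        const std::string& ref, const std::string& alt) {
        const bool inserted = seen_.emplace(chrom, pos, ref, alt).second;
        if (inserted) {
            return DuplicateKind::None;
        }
        return is_symbolic_allele(alt) ? DuplicateKind::Warning
                                       : DuplicateKind::Error;
    }

private:
    // Assumption: an unbounded set keyed on (chrom, pos, ref, alt).
    std::set<std::tuple<std::string, std::size_t, std::string, std::string>> seen_;
};
```

With this, the two `1 100 . A <INS>` lines above would give `None` for the first and `Warning` for the second, while a repeated `A` to `T` record at the same position would still give `Error`.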