EBIvariation / vcf-validator

Validation suite for Variant Call Format (VCF) files, implemented using C++11
Apache License 2.0

Info CIGAR value is not an alphanumeric string compliant with the SAM specification #56

Closed: sambrightman closed this issue 7 years ago

sambrightman commented 7 years ago

INFO CIGAR is marked as Number=A, so its value contains commas whenever there is more than one alternate allele. The current regex is ([0-9]+[MIDNSHPX])+, which rejects those commas. Perhaps this should be [0-9]+[MIDNSHPX](,[0-9]+[MIDNSHPX])+?

cyenyxe commented 7 years ago

Yes, it should accept multiple values, since the spec describes it as:

CIGAR : cigar string describing how to align an alternate allele to the reference allele

The existing regex still applies to single values, because it needs to match patterns like 100M and 10M3I2D. If I understand this correctly and we split it into two rules, it would look like the following:

cigar_value = ([0-9]+[MIDNSHPX])+
cigar = cigar_value ( , cigar_value )*
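
For illustration, the two rules can be sketched as a pair of composed regular expressions. This is a minimal Python sketch, not the validator's actual C++ grammar; the names CIGAR_VALUE, CIGAR and is_valid_info_cigar are hypothetical and chosen only for this example.

```python
import re

# Rule 1 (cigar_value): one or more <length><operation> pairs,
# e.g. 100M or 10M3I2D. Operation letters are taken from the regex in this issue.
CIGAR_VALUE = r"(?:[0-9]+[MIDNSHPX])+"

# Rule 2 (cigar): one cigar_value, optionally followed by more
# comma-separated cigar_values (one per alternate allele for Number=A).
CIGAR = re.compile(r"{v}(?:,{v})*".format(v=CIGAR_VALUE))


def is_valid_info_cigar(field):
    """Return True if the whole INFO CIGAR field matches rule 2."""
    return CIGAR.fullmatch(field) is not None


if __name__ == "__main__":
    for sample in ["100M", "10M3I2D", "100M,10M3I2D", "100M,", ""]:
        print(repr(sample), is_valid_info_cigar(sample))
```

Under these assumptions, single values such as 100M and 10M3I2D still pass, a comma-separated list such as 100M,10M3I2D is accepted, and a trailing comma or an empty field is rejected.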

sambrightman commented 7 years ago

Quite right, yes. Had a stab at it in the linked PR.