EBIvariation / vcf-validator

Validation suite for Variant Call Format (VCF) files, implemented using C++11
Apache License 2.0

SVLEN is expected to be negative for DELs #234

Closed damianosmel closed 5 months ago

damianosmel commented 5 months ago

Dear developing team,

First, thanks for developing this suite to validate VCF files.

My question comes from running the vcf_validator_linux (v0.9.5) on a VCF with deletions. The validator fails and it states in the log output:

Error: INFO SVLEN must be a negative integer for shorter ALT allelesDEL. This occurs 3 time(s), first time in line 130.

However, where the VCFv4.2 specification discusses SVLEN, it states:

When a key reflects a property of a single alt allele (e.g. SVLEN), then when there are multiple alt alleles there will be multiple values for the key corresponding to each allele (e.g. SVLEN=-100,-110 for a deletion with two distinct alt alleles).

It is not stated, though, whether the negative sign is mandatory.

The relevant lines are:

##fileformat=VCFv4.2
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Length of insertion">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  HG00108
NC_000001.11    934192  HGSV_74 T   <DEL>   .   .   AC=3446;AF=0.538101;SVTYPE=DEL;SVLEN=745;CHR2=chr1;ALGORITHMS=manta;SOURCE=gatksv;EVIDENCE=PE,RD    GT  1|1

The variant shown is the first one with this SVLEN 'problem'; it is on line 130 of the VCF.

Looking forward to your ideas on how to solve this error for the vcf_validator_linux.

Many thanks! Damianos

tcezard commented 5 months ago

The validator's behaviour is correct for VCF 4.2. The relevant text in Section 3 says:

##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles">

One value for each ALT allele. Longer ALT alleles (e.g. insertions) have positive values, shorter ALT alleles (e.g. deletions) have negative values.
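To make the quoted rule concrete, here is a minimal Python sketch that applies it to the ALT and INFO columns of a single record. It is not the validator's actual implementation: the function names `parse_info` and `svlen_sign_errors` are made up for illustration, and the sketch only looks at symbolic `<DEL>` alleles.

```python
def parse_info(info):
    """Split a VCF INFO column into a dict; flag-only keys map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def svlen_sign_errors(alt, info):
    """List <DEL> alleles whose SVLEN is not negative (the VCF 4.2 rule quoted above)."""
    svlen_field = parse_info(info).get("SVLEN")
    if not isinstance(svlen_field, str):
        return []  # SVLEN absent, or present without a value
    errors = []
    # One SVLEN value per ALT allele, in the same order.
    for allele, svlen in zip(alt.split(","), svlen_field.split(",")):
        if allele.startswith("<DEL") and svlen != "." and int(svlen) >= 0:
            errors.append(f"SVLEN={svlen} should be negative for {allele}")
    return errors


# INFO column of the record reported in this issue.
info = ("AC=3446;AF=0.538101;SVTYPE=DEL;SVLEN=745;CHR2=chr1;"
        "ALGORITHMS=manta;SOURCE=gatksv;EVIDENCE=PE,RD")
print(svlen_sign_errors("<DEL>", info))
```

For the record from the issue this reports one problem, because `SVLEN=745` is positive while the allele is a deletion.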

Note that the definition of SVLEN is part of the Specification and, as such, cannot be altered by the tools outputting the VCF, which is why the validator is rather strict here.

Note also that the definition of SVLEN changes in VCF 4.4, where SVLEN for <DEL> has to be positive.
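If regenerating the file with the original caller is not an option, one possible workaround for a VCF 4.2 file is to rewrite the offending records before validating. The sketch below is only an illustration under stated assumptions: `negate_del_svlen` is a hypothetical helper, not part of vcf-validator, it assumes tab-separated columns, and it only touches positive SVLEN values that belong to symbolic `<DEL>` alleles. For a file declared as VCF 4.4 the values should stay positive, so the rewrite should not be applied there.

```python
def negate_del_svlen(line):
    """Return a VCF line with positive SVLEN values made negative for <DEL> alleles."""
    if line.startswith("#"):
        return line  # header lines are passed through unchanged
    newline = "\n" if line.endswith("\n") else ""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        return line  # not a complete data line; leave it alone
    alts = cols[4].split(",")
    info_items = cols[7].split(";")
    for i, item in enumerate(info_items):
        if not item.startswith("SVLEN="):
            continue
        values = item[len("SVLEN="):].split(",")
        # One SVLEN value per ALT allele, in the same order.
        for j, (allele, value) in enumerate(zip(alts, values)):
            digits = value.lstrip("+")
            if allele.startswith("<DEL") and digits.isdigit() and int(digits) > 0:
                values[j] = str(-int(digits))
        info_items[i] = "SVLEN=" + ",".join(values)
    cols[7] = ";".join(info_items)
    return "\t".join(cols) + newline


record = "NC_000001.11\t934192\tHGSV_74\tT\t<DEL>\t.\t.\tSVTYPE=DEL;SVLEN=745\tGT\t1|1\n"
print(negate_del_svlen(record), end="")
```

The helper works line by line, so it can be applied while streaming a file from input to output. It leaves the `##INFO` header definition untouched, so a header such as the one in the original report would still need to be reviewed separately.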