EBIvariation / vcf-validator

Validation suite for Variant Call Format (VCF) files, implemented using C++11
Apache License 2.0

Added single-line duplicate fields check for FORMAT #64

Anishka0107 closed 7 years ago

Anishka0107 commented 7 years ago

According to the VCF 4.3 specification, the FORMAT column cannot contain duplicate fields, so I added a new check for them. If any field appears more than once in the FORMAT column of a single line (a colon-separated list), an error is thrown and the file is declared invalid.
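For readers following along, the idea can be sketched roughly as below. This is a minimal illustration and not the code from this pull request: `first_duplicate` and `check_format_no_duplicates` are hypothetical names, and the real validator reports errors through its own classes rather than `std::invalid_argument`.

```cpp
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of vcf-validator): returns the first value that
// appears twice in a delimiter-separated list, or an empty string if none does.
std::string first_duplicate(const std::string& list, char delimiter)
{
    std::set<std::string> seen;
    std::istringstream stream(list);
    std::string field;
    while (std::getline(stream, field, delimiter)) {
        // insert() reports false in .second when the value was already present
        if (!seen.insert(field).second) {
            return field;
        }
    }
    return "";
}

// Hypothetical check: rejects a FORMAT column such as "GT:DP:GT".
void check_format_no_duplicates(const std::string& format_column)
{
    const std::string duplicate = first_duplicate(format_column, ':');
    if (!duplicate.empty()) {
        throw std::invalid_argument("FORMAT must not have duplicate fields: " + duplicate);
    }
}
```

With this sketch, `GT:DP:GQ` passes and `GT:DP:GT` is rejected because `GT` appears twice.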

Anishka0107 commented 7 years ago

@cyenyxe Sorry, I forgot the test. I have added one in a new commit. Could you please take a look?

cyenyxe commented 7 years ago

We have read through the specification documents again and it seems that the no-duplicates checks for FORMAT, INFO and FILTER apply only to version 4.3.

Could you please add that condition to the validation you just implemented? The version can be obtained from the source attribute in the Record object. Test cases and associated files will also need to be reorganized.
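A rough sketch of what such a version condition could look like follows. The `Version`, `Source` and `Record` types here are simplified, hypothetical stand-ins written for this illustration; the validator's real classes have different members, and only the idea of reading the version through the record's source comes from the comment above.

```cpp
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical, simplified stand-ins for the validator's types.
enum class Version { v41, v42, v43 };

struct Source {
    Version version;
};

struct Record {
    std::string format;    // raw colon-separated FORMAT column
    const Source* source;  // the source the record belongs to
};

// Returns true if any value appears more than once in the list.
bool has_duplicate(const std::string& list, char delimiter)
{
    std::set<std::string> seen;
    std::istringstream stream(list);
    std::string field;
    while (std::getline(stream, field, delimiter)) {
        if (!seen.insert(field).second) {
            return true;
        }
    }
    return false;
}

// The duplicate check runs only when the source declares VCF 4.3.
void check_format(const Record& record)
{
    if (record.source->version != Version::v43) {
        return;  // v4.1 and v4.2 records are not subject to this check
    }
    if (has_duplicate(record.format, ':')) {
        throw std::invalid_argument("FORMAT must not have duplicate fields");
    }
}
```

Under these assumptions, the same FORMAT value `GT:GT` is accepted for a v4.1 or v4.2 source and rejected for a v4.3 source, which is also the pair of cases the reorganized tests would need to cover.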

Anishka0107 commented 7 years ago

This is exactly what I was wondering about. I'll work on it. Isn't it also needed for ID, ma'am?

cyenyxe commented 7 years ago

You are right, it is also needed for the ID. The paragraph describing that field is more ambiguous than the others...

Anishka0107 commented 7 years ago

@cyenyxe @jmmut I have a doubt. The https://github.com/EBIvariation/vcf-validator/blob/develop/test/vcf/record_test.cpp file only has a test for version 4.1. So to implement this new test for 4.3, I guess I will have to add another source. Am I guessing right?

Anishka0107 commented 7 years ago

Added another commit that removes the checks from v4.1 and v4.2. Now they are only tested for v4.3.