Hi,

I've been testing vcf_validator on Mac and Linux, and the behavior is the same on both.
I have a terribly formatted VCF. The errors mostly come from INFO fields in the data that do not match their definitions in the header. 8-10 fields are affected, but vcf_validator only reports 2-3 of them at a time, and not all offending lines are flagged. For example, when I run the validator for the first time, I get the following:
Error: INFO dbSNPBuildID does not match the meta specification Number=1 (expected 1 value(s)). This occurs 814 time(s), first time in line 737.
Error: Info field value is not a comma-separated list of valid strings (maybe it contains whitespaces?). This occurs 20 time(s), first time in line 3487.
Error: INFO p3_1000G_AN does not match the meta specification Number=1 (expected 1 value(s)). This occurs 8 time(s), first time in line 89454.
Then I fix them with debugulator, run vcf_validator again, and get the following:
Error: Info field value is not a comma-separated list of valid strings (maybe it contains whitespaces?). This occurs 20 time(s), first time in line 3487.
Error: INFO p3_1000G_AN does not match the meta specification Number=1 (expected 1 value(s)). This occurs 1 time(s), first time in line 38801.
Error: INFO p3_1000G_DP does not match the meta specification Number=1 (expected 1 value(s)). This occurs 8 time(s), first time in line 89454.
There are two issues here:
1. p3_1000G_AN was not completely fixed after the first run.
2. p3_1000G_DP was not detected in the first run.
A second run is not enough either: it takes 5 runs for a small VCF. So vcf_validator cannot find p3_1000G_DP until I fix p3_1000G_AN or dbSNPBuildID.
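For comparison, this is roughly the one-pass behavior I expected. It is only a minimal Python sketch of my own, not how vcf_validator works internally. The function name `count_info_number_mismatches` is made up, and it assumes the header lines follow the `##INFO=<ID=...,Number=...,` key order and that only fixed integer `Number` values (not `A`, `R`, `G` or `.`) need checking:

```python
import re
from collections import Counter

# Captures the ID and Number keys of a "##INFO=<ID=...,Number=...," meta line.
INFO_META = re.compile(r"^##INFO=<ID=([^,]+),Number=([^,]+),")


def count_info_number_mismatches(lines):
    """Count, per INFO ID, the records whose number of comma-separated
    values differs from the fixed integer Number declared in the header."""
    expected = {}           # INFO ID -> declared integer Number
    mismatches = Counter()  # INFO ID -> number of offending records
    first_line = {}         # INFO ID -> 1-based line number of the first offence

    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line.startswith("##"):
            meta = INFO_META.match(line)
            # Only fixed integer counts are checked; A, R, G and "." are skipped.
            if meta and meta.group(2).isdigit():
                expected[meta.group(1)] = int(meta.group(2))
            continue
        if not line or line.startswith("#"):
            continue

        columns = line.split("\t")
        if len(columns) < 8 or columns[7] == ".":
            continue  # no INFO column, or an empty INFO column

        for entry in columns[7].split(";"):
            key, sep, value = entry.partition("=")
            if key not in expected:
                continue
            # A key without "=" is a flag and carries zero values.
            n_values = len(value.split(",")) if sep else 0
            if n_values != expected[key]:
                mismatches[key] += 1
                first_line.setdefault(key, line_no)

    return mismatches, first_line
```

Passing an open file handle of the VCF to this function returns the counts for every affected INFO field in a single pass, plus the line where each one first occurs, which is what I was hoping to get from one validator run.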
Is this intentional?
Thank you!