EBIvariation / vcf-validator

Validation suite for Variant Call Format (VCF) files, implemented using C++11
Apache License 2.0

No newline error at the end of the VCF #148

Closed by elowy01 6 years ago

elowy01 commented 6 years ago

Hi,

I am running the vcf-validator and it reports a VCF as invalid, with the following error:

Error: There is no newline at the end of the file.

I had a look at the VCF and it seems to be correct. I think this could be caused by the Boost libraries that I believe the validator uses, but I could be wrong.

Could you please help with this?

Thanks,

ernesto

jmmut commented 6 years ago

OK, what OS are you using? Linux, OSX or Windows?

How you check that the newline is there matters, because some editors are misleading. Can you show the last bytes of the file? If you are on Linux, something like: tail your_file.vcf | xxd

If the newline character is there, you should see the 0a byte like this:

last line in a vcf:
DS562895.1  5128614 ss5009159425    T   C   .   .   AC=1

last line in xxd output:
0000060: 2e09 2e09 4143 3d31 0a                   ....AC=1.
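
The same check can be scripted. The following is a minimal Python sketch, not part of vcf-validator: the function name ends_with_newline is hypothetical, and it assumes that compressed input is plain gzip, recognised by its two magic bytes. It reports whether the last byte of the (decompressed) content is 0a.

```python
import gzip


def ends_with_newline(path):
    # Return True if the content of `path` ends with a newline byte (0x0a).
    # Assumption: gzip input is detected via its magic bytes 0x1f 0x8b.
    with open(path, "rb") as raw:
        is_gzip = raw.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzip else open
    last = b""
    with opener(path, "rb") as handle:
        # Read in 1 MiB chunks so large files are never fully held in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            last = chunk[-1:]
    # An empty file has no last byte, so it counts as "no newline".
    return last == b"\n"
```

For example, ends_with_newline("input.vcf.gz") inspects the decompressed stream, which is the same data that zcat input.vcf.gz | tail -c 1 | xxd would show.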

elowy01 commented 6 years ago

Hi, I am using Linux

When I do: zcat input.vcf.gz | tail -n 5 | xxd

I get (showing only the last lines):

00d4e0: 0930 7c30 0930 7c30 0930 7c30 0930 7c30  .0|0.0|0.0|0.0|0
000d4f0: 0930 7c30 0930 7c30 0930 7c30 0930 7c30  .0|0.0|0.0|0.0|0
000d500: 0930 7c30 0930 7c30 0930 7c30 0930 7c30  .0|0.0|0.0|0.0|0
000d510: 0930 7c30 0930 7c30 0930 7c30 0a         .0|0.0|0.0|0.

So it seems that the newline is there.

Thanks,

e

jmmut commented 6 years ago

Well, that's strange. We will need to debug more thoroughly. Why do you say it might be because of the Boost library?

Also, how big is the file? We took care to make the validator work with huge files and lots of samples, but there could still be a bug. Can you tell us the approximate number of samples and lines? Or the number of samples and the file size, if counting the lines would take too long.
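
The sample and line counts asked for here can be gathered with a short script. This is a rough Python sketch under stated assumptions: vcf_dimensions is a hypothetical helper, not something the validator provides, and it relies on the VCF header layout in which the #CHROM line has 8 fixed columns, then FORMAT, then one column per sample.

```python
def vcf_dimensions(stream):
    # Return (n_samples, n_records) for a VCF read from a text stream.
    n_samples = 0
    n_records = 0
    for line in stream:
        if line.startswith("##"):
            continue  # meta-information line, not counted
        if line.startswith("#CHROM"):
            columns = line.rstrip("\r\n").split("\t")
            # 8 fixed columns + FORMAT, followed by one column per sample.
            n_samples = max(len(columns) - 9, 0)
        elif line.strip():
            n_records += 1  # any other non-blank line is a data record
    return n_samples, n_records
```

For a compressed file, the stream could be opened with gzip.open("input.vcf.gz", "rt") from the Python standard library and passed to vcf_dimensions.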

srbcheema1 commented 6 years ago

@elowy01 could you please tell us the command you are using to validate the file, and which version of vcf-validator you are using? Please also try decompressing the file and piping it to the validator:

zcat input.vcf.gz | ./vcf_validator

It would be nice if you could also provide the results of using the uncompressed file with v0.7 of the validator. That information will be helpful for us.
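
If the decompress-and-pipe approach needs to be scripted rather than typed in a shell, it could look roughly like this Python sketch. The helper name validate_gzipped is hypothetical, and the only assumption about the validator is the one this thread relies on: that it reads the VCF from standard input when given no input file.

```python
import gzip
import shutil
import subprocess


def validate_gzipped(path, command):
    # Decompress `path` and feed it to `command` on stdin; return the exit code.
    process = subprocess.Popen(command, stdin=subprocess.PIPE)
    try:
        with gzip.open(path, "rb") as source:
            # Stream the decompressed bytes without loading the whole file.
            shutil.copyfileobj(source, process.stdin)
        process.stdin.close()  # signal end of input to the command
    except BrokenPipeError:
        pass  # the command exited before reading all of its input
    return process.wait()
```

A call such as validate_gzipped("input.vcf.gz", ["./vcf_validator"]) would then be equivalent to the zcat pipeline, with the validator's own messages still going to the terminal.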

elowy01 commented 6 years ago

Hi, I've had problems with other programs using Boost that did not read the last line of a compressed file. The VCF is not very big: 283M. Number of samples: 2698

elowy01 commented 6 years ago

On the version used: vcf_validator version 0.8

The command I used is (running the validator directly on the compressed file): vcf_validator_linux -i input.vcf.gz

elowy01 commented 6 years ago

Hi, I've tried using zcat input.vcf.gz | ./vcf_validator and it works now.

So I guess that the issue I had is fixed by doing this,

Thanks for your help,

e.