exomiser / Exomiser

A Tool to Annotate and Prioritize Exome Variants
https://exomiser.readthedocs.io
GNU Affero General Public License v3.0

Whitespace in INFO field #548

Open AlistairNWard opened 3 months ago

AlistairNWard commented 3 months ago

When running Exomiser, I get an error saying the VCF file is malformed because the INFO field contains whitespace. The VCF spec (4.4) explicitly states that whitespace IS allowed in the INFO field.

htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 5680: The VCF specification does not allow for whitespace in the INFO field. Offending field value was "XXX"

julesjacobsen commented 3 months ago

Thanks for the report - Exomiser is using the HTSJDK for VCF file parsing and this only has experimental support for VCF v4.4 at the moment. Exomiser can currently read VCF v4.2 files, so you'll need to replace the whitespace with underscores or hyphens for it to be able to read the file.
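The replacement suggested above could be done with a short script along these lines. This is only a sketch, not part of Exomiser: the function names `sanitize_info` and `sanitize_vcf` are made up for illustration, and it assumes an uncompressed, tab-delimited VCF where INFO is the 8th column and the only problematic whitespace is the space character.

```python
def sanitize_info(line: str, replacement: str = "_") -> str:
    """Replace spaces in the INFO column of one VCF line (hypothetical helper)."""
    # Header and meta lines are passed through untouched.
    if line.startswith("#"):
        return line
    body = line.rstrip("\r\n")
    ending = line[len(body):]          # keep the original line ending
    fields = body.split("\t")
    # A data line needs at least 8 columns for INFO to exist.
    if len(fields) < 8:
        return line
    fields[7] = fields[7].replace(" ", replacement)
    return "\t".join(fields) + ending


def sanitize_vcf(src_path: str, dst_path: str, replacement: str = "_") -> None:
    """Write a copy of src_path to dst_path with spaces removed from INFO."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(sanitize_info(line, replacement))
```

Calling `sanitize_vcf("input.vcf", "input.clean.vcf")` (both file names are placeholders) would write a copy that differs from the original only in the INFO column of data lines.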

I'll open a ticket to update the HTSJDK once support has been added.

AlistairNWard commented 3 months ago

That makes sense. Thank you. Is it possible to stream a VCF into Exomiser? I have thousands of VCF files that I need to process and I don't want to duplicate them all with the whitespace removed. I tried using "-vcf -" and "--vcf stdin" on the command line, but that failed. I'm not too familiar with Java, so was wondering if this was possible?

Thanks again

julesjacobsen commented 3 months ago

Sorry, it's not possible to stream the VCF file into Exomiser and unfortunately even the latest HTSJDK doesn't support whitespace in VCF files.
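Since streaming is not an option, one way to avoid keeping thousands of duplicated files is to write each cleaned VCF to a temporary file that is deleted as soon as the run finishes. The following is a rough sketch under the same assumptions as the suggested replacement (uncompressed VCF, INFO in the 8th tab-separated column, spaces replaced with underscores); `sanitized_copy` is a hypothetical name, not an Exomiser or HTSJDK feature.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def sanitized_copy(vcf_path: str, replacement: str = "_"):
    """Yield the path of a temporary cleaned copy of vcf_path, then delete it."""
    fd, tmp_path = tempfile.mkstemp(suffix=".vcf")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as dst, \
             open(vcf_path, encoding="utf-8") as src:
            for line in src:
                if not line.startswith("#"):
                    fields = line.rstrip("\n").split("\t")
                    if len(fields) >= 8:
                        # Column 8 (index 7) is INFO.
                        fields[7] = fields[7].replace(" ", replacement)
                        line = "\t".join(fields) + "\n"
                dst.write(line)
        yield tmp_path
    finally:
        # Runs whether or not the caller's block raised.
        os.remove(tmp_path)
```

Inside a `with sanitized_copy(path) as clean_path:` block, Exomiser would be launched on `clean_path` (for example via `subprocess.run`) using whatever command line the local installation needs; only one extra file exists on disk at any time.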

AlistairNWard commented 3 months ago

Thanks for the response. That's what I expected, so we'll work around these requirements.