Illumina / GTCtoVCF

Script to convert GTC/BPM files to VCF
Apache License 2.0

Discrepancies with the VCF Specification #27

Closed by brmyers 5 years ago

brmyers commented 5 years ago

Thanks for the software. While using it, I found two places where the output does not follow the VCF specification (https://samtools.github.io/hts-specs/VCFv4.2.pdf):

  1. Commas are used to delimit multiple entries in the ID field, but the specification calls for semicolons (page 4)
  2. "GQ" is already a reserved keyword in the FORMAT field (page 6)

I'm doing some post-processing as a workaround, but figured I'd mention my findings.

Best, Ben
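For anyone hitting the same ID delimiter issue, the post-processing workaround mentioned above could look roughly like the sketch below. It is not the code Ben actually used; the function name `fix_id_delimiter` is hypothetical, and it assumes tab-delimited VCF text lines with ID in the third column.

```python
def fix_id_delimiter(vcf_line: str) -> str:
    """Replace commas with semicolons in the ID column of one VCF data line.

    Hypothetical helper for illustration; not part of GTCtoVCF.
    """
    # Meta-information and header lines start with '#'; pass them through.
    if vcf_line.startswith("#"):
        return vcf_line

    fields = vcf_line.rstrip("\n").split("\t")
    # A VCF data line has at least 8 fixed columns; leave anything else alone.
    if len(fields) < 8:
        return vcf_line

    # Column 3 (index 2) is ID. Only this column is touched, so commas in
    # ALT or INFO keep their meaning.
    fields[2] = fields[2].replace(",", ";")

    newline = "\n" if vcf_line.endswith("\n") else ""
    return "\t".join(fields) + newline
```

It can be applied line by line while copying the file, for example `out.write(fix_id_delimiter(line))` inside a loop over the input VCF.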

KelleyRyanM commented 5 years ago

@brmyers, good point about the ID delimiter.

The use of the GQ field for genotype information was intentional. Admittedly, the specification indicates this should be a phred-scaled quality score, which is not true of the array quality scores; however, there is some benefit to re-using the "standard" field for quality information.
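To make the phred-scale point concrete, here is a small sketch of what the specification's definition implies. The helper name `phred` is made up for this example, and the remark about array scores assumes they are values between 0 and 1, where higher means more confident.

```python
import math


def phred(p_error: float) -> float:
    """Phred-scaled quality: Q = -10 * log10(probability the call is wrong)."""
    if not 0.0 < p_error <= 1.0:
        raise ValueError("p_error must be in (0, 1]")
    return -10.0 * math.log10(p_error)


# On the phred scale, smaller error probabilities give larger qualities:
# an error probability of 0.1 is about Q10, and 0.001 is about Q30.
for p in (0.1, 0.01, 0.001):
    print(f"P(error)={p} -> Q={phred(p):.1f}")

# An array quality score (assumed here to lie in [0, 1]) is not an error
# probability, so writing it into GQ unchanged does not give a phred value.
```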

KelleyRyanM commented 5 years ago

Created pull request at https://github.com/Illumina/GTCtoVCF/pull/29

brmyers commented 5 years ago

Fair point about reusing GQ, @KelleyRyanM, but it's worth mentioning that htsjdk (not sure about other VCF libraries) changes the type in the GQ FORMAT header to Integer and rounds the values in the records to zero or one.

Appreciate the quick reply.
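A rough illustration of why that rounding matters, plus one possible downstream workaround. This is only a sketch under assumptions: the scores are made-up values in [0, 1], Python's `round` stands in for whatever the library does internally, and both the helper `rename_format_key` and the replacement key `XGQ` are hypothetical, not something GTCtoVCF or htsjdk provides.

```python
# Made-up quality scores in [0, 1], for illustration only.
scores = [0.12, 0.49, 0.51, 0.87, 0.99]

# Forcing the values to integers leaves only 0 or 1, so almost all of the
# resolution in the original scores is lost.
collapsed = [round(s) for s in scores]
print(collapsed)  # [0, 0, 1, 1, 1]


def rename_format_key(vcf_line: str, old: str = "GQ", new: str = "XGQ") -> str:
    """Rename a FORMAT key in its header definition and in data lines.

    Hypothetical post-processing helper; 'XGQ' is an arbitrary example name.
    """
    # Header definition, e.g. ##FORMAT=<ID=GQ,Number=1,Type=Float,...>
    prefix = f"##FORMAT=<ID={old},"
    if vcf_line.startswith(prefix):
        return f"##FORMAT=<ID={new}," + vcf_line[len(prefix):]

    # All other meta-information and header lines are left unchanged.
    if vcf_line.startswith("#"):
        return vcf_line

    fields = vcf_line.rstrip("\n").split("\t")
    # The FORMAT column is the 9th column; skip lines without genotype data.
    if len(fields) < 9:
        return vcf_line

    # FORMAT keys are colon-separated; rename exact matches only.
    keys = fields[8].split(":")
    fields[8] = ":".join(new if key == old else key for key in keys)

    newline = "\n" if vcf_line.endswith("\n") else ""
    return "\t".join(fields) + newline
```

Renaming the key to a non-reserved one keeps the float values out of reach of any standard-field handling, at the cost of tools no longer recognizing the field as a genotype quality.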