Illumina / GTCtoVCF

Script to convert GTC/BPM files to VCF
Apache License 2.0

GQ score clarification #68

Status: Open. Derrup opened this issue 3 years ago.

Derrup commented 3 years ago

Hi, I converted some GTC files to VCF and noticed that the GQ score is always between 0 and 17. I did the conversion with both Beeline and IAAP, but the result does not change. I was under the impression that this value should be equal to the GenCall score. Could you please clarify? Is there a way to get the actual GenCall score from the GTC files?

jjzieve commented 3 years ago

It is the GenCall score, just Phred-scaled: https://en.wikipedia.org/wiki/Phred_quality_score
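A minimal sketch of what Phred-scaling a GenCall score means, assuming the error probability is taken as 1 - GenCall and the result is rounded to the nearest integer. The function name `gencall_to_gq`, the rounding, and the cap of 99 are illustrative assumptions, not the actual code in GTCtoVCF.

```python
import math


def gencall_to_gq(gencall_score: float, max_gq: int = 99) -> int:
    """Phred-scale a GenCall score, treating (1 - score) as the error probability.

    Assumption: rounding to the nearest integer and the cap value are
    illustrative choices, not necessarily what GencallFormat.py does.
    """
    error_prob = 1.0 - gencall_score
    if error_prob <= 0.0:
        # A perfect score would give an infinite Phred value; cap it instead.
        return max_gq
    if error_prob >= 1.0:
        # Scores of 0 (or below) map to the lowest possible quality.
        return 0
    # Phred scale: Q = -10 * log10(P(error)), rounded to an integer.
    return min(max_gq, round(-10.0 * math.log10(error_prob)))


for score in (0.15, 0.5, 0.9, 0.96, 0.98):
    print(score, gencall_to_gq(score))
```

Under these assumptions a GenCall score of 0.96 gives GQ 14 and 0.98 gives GQ 17, which would fit the 0 to 17 range reported above if GenCall scores on the array top out near 0.98.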

XubCherif commented 3 years ago

Given that the score ranges from 0 to 17, I would like to ask about the recommended threshold. Could you please advise?

jjzieve commented 3 years ago

This is where the calculation occurs: https://github.com/Illumina/GTCtoVCF/blob/develop/GencallFormat.py#L62

A GQ of 14 corresponds to a GenCall score of about 0.96. The default GenCall cutoff for labeling a genotype as a "NoCall" is 0.15 (https://www.illumina.com/documents/products/technotes/technote_infinium_genotyping_data_analysis.pdf), which is about 0.7 on the Phred scale. So I'd say anything greater than or equal to 1 is an OK call, but it depends on how stringent you want your results to be.

We moved away from reporting the GenCall score directly in order to stick as closely as possible to the VCF spec, which helps avoid bugs in secondary analysis pipelines that consume VCFs. The spec we reference (https://samtools.github.io/hts-specs/VCFv4.1.pdf) lists GQ as a Phred-scaled integer rather than a float.

If you'd like a separate field for the raw GenCall score, it should be a pretty straightforward PR, and I'd be happy to review it.
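A rough sketch of how a GQ threshold could be translated back into an approximate GenCall score and applied to a VCF sample column. Both helpers (`gq_to_gencall`, `sample_passes`) and the example FORMAT/sample strings are hypothetical illustrations, not part of GTCtoVCF; the inversion assumes GQ = -10 * log10(1 - GenCall).

```python
def gq_to_gencall(gq: int) -> float:
    """Invert the assumed Phred scaling: GenCall ~= 1 - 10^(-GQ/10).

    Because GQ is stored as an integer, this only recovers an
    approximation of the original GenCall score.
    """
    return 1.0 - 10.0 ** (-gq / 10.0)


def sample_passes(format_keys: str, sample_value: str, min_gq: int = 1) -> bool:
    """Return True if a VCF sample column has GQ >= min_gq.

    format_keys is the FORMAT column (e.g. "GT:GQ") and sample_value is the
    matching sample column (e.g. "0/1:14"). A missing GQ ('.') fails.
    """
    # Pair each FORMAT key with its value in the sample column.
    fields = dict(zip(format_keys.split(":"), sample_value.split(":")))
    gq = fields.get("GQ", ".")
    if gq == ".":
        return False
    return int(gq) >= min_gq


print(round(gq_to_gencall(14), 2))  # 0.96
print(sample_passes("GT:GQ", "0/1:14"))  # True
print(sample_passes("GT:GQ", "./.:0"))  # False
```

Note that exactly inverting GQ 1 gives a GenCall score of about 0.21, while the 0.15 NoCall cutoff sits at about 0.7 before any rounding, so how closely "GQ >= 1" lines up with the 0.15 cutoff depends on whether the converter rounds or truncates.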