EichlerLab / smrtsv2

Structural variant caller
MIT License

QUAL and GQ in vcf #38

Closed (mesnger closed 5 years ago)

mesnger commented 5 years ago

Hello. I couldn't find any information explaining the values of QUAL and GQ in the output of SMRTSV_Genotyper_2.0, especially in the VCF file [EEE_SV-Pop_1.ALL.genotypes.20181204.vcf.gz].

I suspect that QUAL is capped at 100, but some variants with no calls (AC=0, AF=0, AN=0) have a positive QUAL value. Also, is there a preset filter for sample-level GQ in the SMRTSV2 genotyper? Or can you suggest a threshold for filtering samples with low GQ?

Thank you for the great tool you make, and all the help that you give. Cheers.

paudano commented 5 years ago

The QUAL field is copied from the original VCF, so it is not created by the genotyper. The SMRT-SV variant caller sets the QUAL score to a Phred-scaled likelihood based on the number of local-assembly contigs supporting the variant call. See Filtering SVs in README.md for more information.

The GQ field is based on the relative density of genotype calls estimated by the machine learning model, and those relative density values are encoded in the GL field. For example, consider the genotype field 0/1:9:0.0441,0.8809,0.0750:14.0:5.5 (format GT:GQ:GL:DPR:DPA). The call is heterozygous, and the relative density for heterozygous is 0.8809. The GQ is np.int(10 * -np.log10(1 - 0.8809)), which evaluates to about 9.24 and truncates to 9, matching the GQ field. You will only see 255 if the relative density for the variant call is 1.0.
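To make the arithmetic above concrete, here is a minimal sketch that recomputes GQ from the GL values in the example field. It is not the genotyper's own code: the function names `gq_from_density` and `called_density` are hypothetical, the GL order (hom-ref, het, hom-alt) is inferred from the example, and returning 255 when the density is 1.0 is an assumption based only on the statement above. The built-in `int` is used in place of `np.int`, which was removed in NumPy 1.24.

```python
import math


def gq_from_density(density, cap=255):
    """Phred-scale the relative density of the called genotype.

    Assumption: a density of 1.0 (where the log is undefined) is
    reported as the cap value, 255.
    """
    if density >= 1.0:
        return cap
    # int() truncates toward zero, mirroring np.int in the comment above.
    return min(cap, int(10 * -math.log10(1.0 - density)))


def called_density(sample_field):
    """Return the GL entry matching GT for a GT:GQ:GL:DPR:DPA field.

    Assumption: the site is biallelic and GL is ordered
    hom-ref, het, hom-alt, so the alt-allele count indexes GL.
    """
    gt, _gq, gl, _dpr, _dpa = sample_field.split(":")
    alt_count = sum(int(a) for a in gt.replace("|", "/").split("/"))
    densities = [float(x) for x in gl.split(",")]
    return densities[alt_count]


field = "0/1:9:0.0441,0.8809,0.0750:14.0:5.5"
density = called_density(field)
print(density, gq_from_density(density))
```

For the example field this selects 0.8809 and yields a GQ of 9, in agreement with the value stored in the VCF.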

I'll update the documentation and close this out when I push that update. If you have any further questions, feel free to re-open.