Illumina / hap.py

Haplotype VCF comparison tools

QQ field for ROC curves #63

Open rohandavidg opened 6 years ago

rohandavidg commented 6 years ago

Hi,

Thank you for this great tool. I'm curious how the QQ field is calculated when the --roc DP flag is used.

Here is an example output VCF. In the example below, what do 299 and 2845 imply?

chr1 955597 . G T . . BS=955597 GT:BD:BK:QQ:BI:BVT:BLT 1/1:TP:gm:299:tv:SNP:homalt 1/1:TP:gm:299:tv:SNP:homalt
chr1 957640 . C T . . BS=957640 GT:BD:BK:QQ:BI:BVT:BLT 0/1:TP:gm:2845:ti:SNP:het 0/1:TP:gm:2845:ti:SNP:het

Thanks, Rohan

pkrusche commented 6 years ago

Hi Rohan, the QQ field should (most of the time) be a direct translation of the DP field in the original (query) VCF file, so 299 and 2845 in your example would be the DP values of those two query records. I think there is also some logic in there to work out whether it is an INFO or a FORMAT field. The only non-trivial operation that might be performed is that we infer QQ values for truth variants with no direct query match (i.e. when variants match at the haplotype level but not at the VCF level, and the truth TP variant is on a different VCF line) from the QQ values of surrounding query variants. I think it uses the minimum value across the superlocus, see here: https://github.com/Illumina/hap.py/blob/master/src/c%2B%2B/lib/quantify/BlockQuantify.cpp#L342.
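
To make the description above concrete, here is a minimal Python sketch of the two steps as they are described in this thread. It is not hap.py's actual code: the function names `qq_from_record` and `fill_missing_qq` are hypothetical, the FORMAT-before-INFO lookup order is an assumption, and the "minimum across the superlocus" rule is only what the comment above suggests may happen.

```python
from typing import Dict, List, Optional, Sequence


def qq_from_record(info: Dict[str, str], sample: Dict[str, str],
                   field: str = "DP") -> Optional[float]:
    """Return the ROC field value for one query record, or None if absent.

    Assumption: the FORMAT (per-sample) value is preferred over INFO.
    """
    for source in (sample, info):
        value = source.get(field)
        if value not in (None, "."):  # "." is the VCF missing-value marker
            return float(value)
    return None


def fill_missing_qq(qq_values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Fill variants that have no QQ of their own within one superlocus.

    Assumption: missing entries take the minimum QQ seen in the superlocus.
    """
    known = [q for q in qq_values if q is not None]
    if not known:
        return list(qq_values)  # nothing to infer from
    lowest = min(known)
    return [lowest if q is None else q for q in qq_values]


# A query record with DP only in INFO yields QQ = 2845.
print(qq_from_record({"DP": "2845"}, {}))          # 2845.0
# A truth variant without a direct query match inherits the lowest QQ nearby.
print(fill_missing_qq([299.0, None, 2845.0]))      # [299.0, 299.0, 2845.0]
```

Under these assumptions, a QQ of 299 simply means the matching query record had DP=299, and only variants without a same-line query match would receive an inferred value.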

rohandavidg commented 6 years ago

Thank you so much for the prompt response.