Illumina / hap.py

Haplotype VCF comparison tools

Inconsistent output when reversing truth and query #71

Open bjtrost opened 5 years ago

bjtrost commented 5 years ago

I have two VCF files, A and B. A contains this variant, whereas B does not:

1 13133936 . G A 286.80 VQSRTrancheSNP99.90to100.00 AC=2;AF=1.00;AN=2;DP=7;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=25.00;QD=26.00;SOR=1.609;VQSLOD=-3.650e+01;culprit=MQ GT:AD:DP:GQ:PGT:PID:PL 1/1:0,7:7:21:1|1:13133936_G_A:315,21,0

When I run hap.py with B as truth and A as query, I get the following line in the annotated VCF file:

1 13133936 . G A 286.8 VQSRTrancheSNP99.90to100.00 BS=13133936;Regions=GC_55-60,RLCRs GT:BD:BK:BI:BVT:BLT:QQ ./.:.:.:.:NOCALL:nocall:. 1/1:FP:.:ti:SNP:homalt:286.8

So far, so good. However, when I run hap.py with A as truth and B as query, there is no entry at all for this variant in the annotated VCF file. Why is this? Am I missing something?

pkrusche commented 5 years ago

I think the problem is that the variant is filtered in the VCF file: its FILTER column is VQSRTrancheSNP99.90to100.00 rather than PASS. Filtered variants on the truth side of the comparison are dropped. There is a command-line switch to let them through: --usefiltered-truth.
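
To check quickly whether a given record would count as filtered, one option is to look at its FILTER column (the seventh field) directly. The following is a minimal sketch and not part of hap.py: the helper name `is_filtered` is hypothetical, and treating `.` as unfiltered alongside `PASS` is an assumption that may not match hap.py's own handling.

```python
def is_filtered(vcf_line: str) -> bool:
    """Return True if the record's FILTER column is neither PASS nor '.'.

    Hypothetical helper for illustration; hap.py's internal logic may differ.
    """
    # Splitting on any whitespace also works for records pasted without tabs,
    # because the first seven VCF columns cannot contain whitespace.
    fields = vcf_line.split()
    if len(fields) < 7:
        raise ValueError("expected at least 7 VCF columns")
    # Assumption: '.' (no filters applied) is treated like PASS here.
    return fields[6] not in ("PASS", ".")


# Shortened version of the record from this thread.
record = "1 13133936 . G A 286.80 VQSRTrancheSNP99.90to100.00 AC=2;AF=1.00 GT 1/1"
print(is_filtered(record))
```

For the record above, the FILTER value is VQSRTrancheSNP99.90to100.00, so the helper reports it as filtered, which would explain why it disappears when A is used as truth.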

bjtrost commented 5 years ago

Ah, that is probably it! Thanks so much. Just to clarify, if I want truth and query to be treated exactly the same, would I use both --preprocess-truth and --usefiltered-truth? (For my dataset, I am comparing two VCFs to one another, but it is not the case that one of them is considered the "truth".)

Thanks again!