Open arangrhie opened 3 years ago
Hi Arang,
Good question! I think hap.py is behaving as expected here, but it's a little complicated. It appears your first VCF was probably merged from multiple callers, so it is essentially calling the same heterozygous SNV twice. hap.py tries to find a way to combine these as separate variant calls, and the only way to do that is to place the two "heterozygous" SNVs on opposite haplotypes, which essentially makes a homozygous SNV. That is counted as an FP because it doesn't match the correct heterozygous genotype for the SNV. When you remove the extra SNV from the VCF, everything matches the benchmark VCF, even though it is represented as a single line instead of two.
Does that make sense? Thanks! Justin
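To make the haplotype argument above concrete, here is a minimal toy sketch in Python. It is not hap.py's actual algorithm; the reference window `AAT`, the record layout, and the helper names `apply_variants` and `consistent_diplotypes` are all assumptions made up for illustration. It only shows that if two identical het SNV records must both be used, and no haplotype may carry two overlapping variants, the only consistent assignments put the alternate allele on both haplotypes.

```python
from itertools import product

# Toy reference window (assumed); index 0 stands in for chr20:55234351.
REF = "AAT"

# Two records describing the same het SNV (A>T), as might appear in a
# call set merged from multiple callers. Hypothetical layout.
calls = [
    {"pos": 0, "ref": "A", "alt": "T"},
    {"pos": 0, "ref": "A", "alt": "T"},
]


def apply_variants(ref, variants):
    """Apply variants to ref; return None if any two of them overlap."""
    out, cursor = [], 0
    for v in sorted(variants, key=lambda v: v["pos"]):
        if v["pos"] < cursor:
            return None  # overlaps the previously applied variant
        out.append(ref[cursor:v["pos"]])
        out.append(v["alt"])
        cursor = v["pos"] + len(v["ref"])
    out.append(ref[cursor:])
    return "".join(out)


def consistent_diplotypes(ref, het_calls):
    """Try every assignment of het calls to haplotype 0 or 1.

    Keep only assignments where both haplotypes can be built
    without overlapping variants.
    """
    results = []
    for assignment in product((0, 1), repeat=len(het_calls)):
        haps = tuple(
            apply_variants(ref, [c for c, a in zip(het_calls, assignment) if a == h])
            for h in (0, 1)
        )
        if None not in haps:
            results.append(haps)
    return results


# With both records, every surviving assignment has T on both haplotypes.
print(consistent_diplotypes(REF, calls))
# With the duplicate removed, one haplotype stays reference.
print(consistent_diplotypes(REF, calls[:1]))
```

In this toy model, dropping the duplicate record restores a genuinely heterozygous pair of haplotypes, which mirrors what happens with the filtered VCF.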
It's true there is redundancy in the unfiltered example, but the het variant marked as phased is essentially contained in the second (also het) unphased one, so why would hap.py assume it lies on both alleles? If one were to assume it was the same variant, it would make room for the two base deletion contained in the second variant, and everything would be consistent.
Does this conversion to a homozygous SNP happen during the preprocessing stages, or is it part of the comparison engine? What happens if you use the vcfeval engine? When I ran this example directly with rtg vcfeval (i.e. outside of hap.py), the ga4gh intermediate output file contains:
chr20 55234351 . A T . PASS BS=55234351 GT:QQ:BD:BK . 1|0:50.0:FP:lm
chr20 55234351 . AAT A,TAT . PASS BS=55234351 GT:BD:BK:QQ 1/2:TP:gm 1/2:TP:gm:50.0
Which is exactly what I expect it should do -- the spurious het SNP is classified as FP.
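For anyone who wants to read those two records programmatically, here is a small Python sketch that splits each line into its FORMAT keys and per-sample values. It assumes the last two columns are the truth and query samples, in that order, and that trailing FORMAT fields may be omitted from a sample; `parse_record` is a hypothetical helper, not part of hap.py or RTG Tools.

```python
# The two records quoted above, whitespace-separated.
lines = [
    "chr20 55234351 . A T . PASS BS=55234351 GT:QQ:BD:BK . 1|0:50.0:FP:lm",
    "chr20 55234351 . AAT A,TAT . PASS BS=55234351 GT:BD:BK:QQ 1/2:TP:gm 1/2:TP:gm:50.0",
]


def parse_record(line):
    """Split one record into fixed columns plus truth/query sample dicts.

    Assumes 11 columns: 8 fixed VCF fields, FORMAT, then truth and query.
    """
    chrom, pos, _id, ref, alt, _qual, _filt, _info, fmt, truth, query = line.split()
    keys = fmt.split(":")

    def sample(column):
        # zip stops at the shorter side, so dropped trailing fields are simply absent
        return dict(zip(keys, column.split(":")))

    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt.split(","),
        "truth": sample(truth),
        "query": sample(query),
    }


records = [parse_record(line) for line in lines]
for r in records:
    # BD is the decision field (TP/FP/...) in these records
    print(r["ref"], r["alt"], "truth:", r["truth"].get("BD"), "query:", r["query"].get("BD"))
```

The first record has no truth call and an FP decision on the query side, while the second is TP for both samples, which matches the reading given above.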
Hello,
It seems like hap.py does not properly handle phased variants that overlap with non-phased variants.
Here is an example:
and here is what hap.py produced for the same chr20:55234351:
It looks like the 1|0 turned into 1/1 during evaluation, and both variants were marked as FPs.
On the other hand, when a filtered set was provided:
Now the non-phased variant becomes TP:
Is this expected behavior in hap.py?
I am using
and benchmarked against
Thanks, Arang