Closed nh13 closed 3 years ago
Expected this:
chrA 10 ... GT:PS 0|1:1
chrA 20 ... GT:PS 1|0:1
chrA 30 ... GT:PS 0|1:1
chrA 40 ... GT:PS 1|0:1
Version: 4.1.8.1
Command:
gatk \
--java-options "-XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -Xmx4g" \
--spark-runner LOCAL HaplotypeCaller \
-ERC GVCF \
-I input.bam \
-R ref.fasta \
-O out.g.vcf \
--bam-output out.bam \
--assembly-region-padding 1000 \
--max-assembly-region-size 1000 \
--smith-waterman JAVA \
--linked-de-bruijn-graph \
-L <100bp region>
@cwhelan Since you're looking at the phasing code in HC at the moment, would you have any insight into this issue?
Shared with @cwhelan via email. Thanks for looking at this!
Debugging update: the issue in this case seems to be that HaplotypeCaller constructs an additional haplotype containing a fifth variant a little bit downstream of the four called variants. The fifth variant has low read support and is not called, but the haplotype carrying it matches the reference base at all four called positions, so it is out of phase with both of the possible phase sets suggested by the four called variants. This causes the current phasing algorithm to give up, as it requires every haplotype containing an alternate allele at a locus to agree with one of the phase sets it creates.
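To make the failure mode concrete, here is a toy model of the consistency requirement described above. It is only a sketch and not GATK's actual code: haplotypes are represented as sets of positions carrying an alternate allele, the function name is hypothetical, and position 50 is a made-up stand-in for the uncalled fifth variant.

```python
# Toy model only: a haplotype is the set of positions where it carries an alt allele.
CALLED_SITES = {10, 20, 30, 40}

# The two phase groups implied by the expected genotypes (0|1, 1|0, 0|1, 1|0).
GROUP_A = {10, 30}
GROUP_B = {20, 40}


def consistent_with_phase_sets(haplotypes, called_sites, group_a, group_b):
    """Return True if every alt-carrying haplotype matches one group at the called sites."""
    for alt_positions in haplotypes:
        if not alt_positions:
            continue  # pure reference haplotype: nothing to check
        at_called = alt_positions & called_sites
        if at_called != group_a and at_called != group_b:
            return False  # this haplotype agrees with neither phase group
    return True


two_haps = [{10, 30}, {20, 40}]
with_extra = two_haps + [{50}]  # hypothetical position for the uncalled fifth variant

print(consistent_with_phase_sets(two_haps, CALLED_SITES, GROUP_A, GROUP_B))    # True
print(consistent_with_phase_sets(with_extra, CALLED_SITES, GROUP_A, GROUP_B))  # False
```

The third haplotype carries an alternate allele but is reference at every called site, so it matches neither group and the check fails for the whole region.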
A possible solution I am exploring is to exclude haplotypes that don't have any called non-ref alleles from the set of calledHaplotypes passed to AssemblyBasedCallerUtils.phaseCalls by HaplotypeCallerGenotypingEngine.
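As a rough illustration of that idea (not the code in GATK or in any pull request), the filter could look like the sketch below. It assumes a haplotype is modeled as a set of alt-allele positions; the function name and position 50 for the uncalled variant are hypothetical.

```python
def drop_haplotypes_without_called_alts(haplotypes, called_sites):
    """Keep only haplotypes that carry an alt allele at one or more called sites."""
    return [h for h in haplotypes if h & called_sites]


called_sites = {10, 20, 30, 40}
# Two well-supported haplotypes plus one carrying only an uncalled variant.
haplotypes = [{10, 30}, {20, 40}, {50}]

kept = drop_haplotypes_without_called_alts(haplotypes, called_sites)
print([sorted(h) for h in kept])  # [[10, 30], [20, 40]]
```

With the extra haplotype removed before phasing, only the two haplotypes that actually support the called variants remain, so they can define a single phase set.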
This should be fixed by https://github.com/broadinstitute/gatk/pull/7019
Thanks @cwhelan !!!
I have four phased variants in close proximity that have the following pattern:
These four variants are wholly contained in a single set of reads. There are of course other reads that partially span them.
The first variant is a deletion, while the remaining three are SNVs. Examining the reads, there are two haplotypes since:
I would have expected them all to have the same phase set (PS) value. I have a test case I can share privately (let me know a good email to send it to confidentially).