Open tfenne opened 4 years ago
When I ran this with 4.5.0.0
, I get:
chr1 68896800 . G <NON_REF> . . END=68896831 GT:DP:GQ:MIN_DP:PL 0/0:118:99:107:0,120,1800
chr1 68896832 . CTTTAGTTTT C,<NON_REF> 1597.60 . BaseQRankSum=0.000;DP=122;ExcessHet=0.0000;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=0.000;RAW_MQandDP=439200,122;ReadPosRankSum=1.341 GT:AD:DP:GQ:PGT:PID:PL:PS:SB 0|1:67,43,0:110:99:0|1:68896832_CTTTAGTTTT_C:1605,0,2683,1807,2813,4620:68896832:67,0,43,0
chr1 68896841 . T TCC,<NON_REF> 1596.60 . DP=110;ExcessHet=0.0000;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=0.000;RAW_MQandDP=396000,110;ReadPosRankSum=1.566 GT:AD:DP:GQ:PGT:PID:PL:PS:SB 0|1:67,43,0:110:99:0|1:68896832_CTTTAGTTTT_C:1604,0,2684,1806,2813,4619:68896832:67,0,43,0
chr1 68896842 . G <NON_REF> . . END=68896899 GT:DP:GQ:MIN_DP:PL 0/0:71:99:41:0,99,1485
chr1 68896900 . T <NON_REF> . . END=68896900 GT:DP:GQ:MIN_DP:PL 0/0:41:78:41:0,78,1170
And the 2bp insertion has genotype 0|1
(phased) as we'd like.
Bug Report
Affected tool(s) or class(es)
HaplotypeCaller
Affected version(s)
Description
I have a sample that has a slightly complicated event in it that is getting miscalled. The easiest way to see it is probably with an IGV screenshot:
BWA aligns the reads with a 7bp deletion followed by 2 mismatches, though am inclined to think of it as a 9bp deletion coupled with a 2bp insertion (or a swap of 9bp of reference for 2bp of novel sequence). The original alignments are in the top half of the IGV view. The bottom is the assembly BAM from running the HaplotypeCaller. From what I see the assembly is getting it right.
But the problem is that the event extraction/genotyping goes wrong. I've run it two ways. If I run to generate a called VCF directly using:
Then I get only a single variant reported in the region (the 9bp deletion):
If i run to generate a gvcf then things get more interesting:
yields:
I.e. a record is emitted for the insertion but the genotype is
./.
with a quality of 99, 0s for all the PLs and the other per-sample annotations we'd expect on a variant record missing.I suspect the problem has something to do with the fact that the deletion and insertion are in cis and the insertion's anchor base is within the deletion. I'm not even sure how one would represent this as a pair of variants. I think ideally this would be emitted as a single variant with
REF=CTTTAGTTTT
andALT=CCC
.Steps to reproduce
Use the command lines above with the files attached in deletion-then-insertion-files.zip
Expected behavior
The resulting VCF (and gVCF) should emit one or more variants with called genotypes that indicate a 9bp deletion followed by a 2bp insertion in cis.
Actual behavior
The 9bp deletion is emitted and a) in the gVCF the 2bp insertion is emitted without a genotype or other information that would allow genotyping, b) if a VCF is called directly the 2bp insertion is absent entirely.