Open DrMcStrange opened 3 years ago
Okay, I've now checked with colleagues who have run GATK germline small variant calling runs, and their final VCFs are missing calls for the same region on chr1. These are runs with hg38 as reference - I've also checked some hg19 runs and they're unaffected.
We still have all the intermediate files for some runs. Where would be the best place to begin tracking this down?
In case anyone's interested, I've now done some testing on 1.2.9 and this issue appears to be resolved.
Still, I'm curious to know what caused this, and whether anyone else ran into the same issue with GATK joint calling. From my earlier testing, it affected joint calling (but not population calling) on at least versions 1.2.4 and 1.2.8.
Cancel that last comment - turns out I was looking at batch calling results, not joint calling. My joint calling tests are currently crashing, so the issue may still exist.
I've come across some strange results while filtering variants from a GATK germline small variant calling run. The full run included 57 WGS and 11 WES samples. In one region on chromosome 1, the final VCF only gives genotypes for the WES - all WGS samples show as missing. Here's an example of a variant in the region:
Looking at the gVCF for one of the missing WGS samples, the variant is called as 0/0 with good coverage and quality:
The gVCFs for the WES samples show the variant called as 0/1, but with quite uneven allele depth:
Oddly, a nearby variant is called in the same WES gVCF with similar stats, but doesn't show up in the main VCF:
Looking at bcbio-nextgen-commands.log, the region in the main VCF for which the WES samples are missing corresponds to one of the calling regions used by the GATK commands:
So far I've only spotted one region where this has happened. There may be more, but most of the genome appears to be fine.
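In case it helps anyone check their own runs for other affected regions, here is a minimal sketch of how one might scan a multi-sample VCF for stretches where every sample in a given group (for example, all the WGS samples) has a missing genotype. The function names `is_missing` and `missing_runs` and the sample names in the usage note are hypothetical, made up for illustration. It assumes a standard tab-separated VCF with a `#CHROM` header line and treats `./.`, `.|.` and `.` as missing. This is plain Python, not part of bcbio or GATK.

```python
import re
from typing import Iterable, Iterator, List, Optional, Set, Tuple


def is_missing(sample_field: str, gt_index: int) -> bool:
    """Return True if the GT sub-field is absent or all of its alleles are '.'."""
    parts = sample_field.split(":")
    if gt_index >= len(parts):
        return True
    return all(allele == "." for allele in re.split(r"[/|]", parts[gt_index]))


def missing_runs(
    vcf_lines: Iterable[str], group: Set[str]
) -> Iterator[Tuple[str, int, int, int]]:
    """Yield (chrom, first_pos, last_pos, n_sites) for each run of consecutive
    VCF records in which every sample in `group` has a missing genotype."""
    cols: List[int] = []
    run: Optional[list] = None  # [chrom, first_pos, last_pos, n_sites]
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Sample columns start at index 9 in a VCF header line.
            cols = [i for i, name in enumerate(fields) if i >= 9 and name in group]
            continue
        if not cols:
            raise ValueError("no samples from the group found in the VCF header")
        chrom, pos = fields[0], int(fields[1])
        fmt = fields[8].split(":")
        all_missing = "GT" not in fmt or all(
            is_missing(fields[c], fmt.index("GT")) for c in cols
        )
        if all_missing and run is not None and run[0] == chrom:
            run[2] = pos      # extend the current run
            run[3] += 1
        else:
            if run is not None:
                yield (run[0], run[1], run[2], run[3])
            run = [chrom, pos, pos, 1] if all_missing else None
    if run is not None:
        yield (run[0], run[1], run[2], run[3])
```

For a real file this could be called as `missing_runs(gzip.open(path, "rt"), {"wgs_sample_1", "wgs_sample_2"})` with your own path and sample names. Long runs could then be compared against the calling regions listed in bcbio-nextgen-commands.log. Isolated missing sites are expected in any joint-called VCF, so only runs with many sites are likely to be of interest.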
So, any ideas as to what might have happened here? There seem to be two anomalies: all the WGS calls failing to make it from the gVCFs to the main VCF, and inconsistent calling of variants in the WES samples. I'm at a loss to explain either of them.
Attached are the bcbio-nextgen.log and bcbio-nextgen-commands.log. Unfortunately I'd already deleted the work directory before spotting this, so I don't have bcbio-nextgen-debug.log. The run had a couple of false starts due to silly mistakes (wrong BED files for the exomes), but I don't see any way those could have led to these results.
bcbio-nextgen.log bcbio-nextgen-commands.log