A utility for merging and genotyping Strelka2 GVCFs.
This source code is provided under the Apache License 2.0. Copyright (c) 2018, Illumina, Inc. All rights reserved.
This tool provides basic genome VCF (GVCF) merging and genotyping, producing a multisample BCF/VCF suitable for cohort analysis. Variants are normalised and decomposed on the fly before merging. For samples that do not carry a particular variant, homozygous reference confidence is estimated from the GVCF depth blocks using some simple heuristics.
This software is in early development. It is largely functional but may contain bugs.
There are various flavours of GVCF in the wild; this tool only works with the format produced by Illumina pipelines.
The only requirement is a C++11-compatible compiler.
git clone https://github.com/Illumina/gvcfgenotyper.git
cd gvcfgenotyper/
make
bin/gvcfgenotyper
find directory/ -name '*genome.vcf.gz' > gvcfs.txt
time ./gvcfgenotyper -f genome.fa -l gvcfs.txt -Ob -o output.bcf
Or, with some trivial parallelism across chromosomes:
for i in {1..22} X;
do
echo -r $i -f genome.fa -l gvcfs.txt -Ob -o output.chr${i}.bcf;
done | xargs -L 1 -P 23 ./gvcfgenotyper
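The loop above leaves one BCF per chromosome. A minimal sketch of stitching them back together follows, assuming `bcftools` is installed separately (it is not shipped with this tool). The function names and the `outputs.txt` file list are hypothetical, chosen only for illustration.

```shell
# List the per-chromosome outputs in the same order as the loop above.
chrom_outputs() {
    for i in $(seq 1 22) X; do
        echo "output.chr${i}.bcf"
    done
}

# Concatenate the per-chromosome files into a single BCF.
# Assumption: bcftools is on the PATH and every job has finished.
concat_chroms() {
    chrom_outputs > outputs.txt
    bcftools concat -f outputs.txt -Ob -o "$1"
}
```

Calling `concat_chroms output.bcf` once all jobs have completed should write the combined file. Writing the list explicitly rather than using a shell glob keeps the chromosomes in numeric order (a glob would sort chr10 before chr2).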
If you are looking for a sequencing cohort to try this out, have a look at Polaris.
Homozygous reference confidence (GQ and DP) works well for SNPs but is less reliable for indels. Our homozygous reference likelihoods are currently just dummy values, e.g. PL=0,255,255, and should not be used for any sophisticated analysis such as de novo mutation calling (Strelka has good joint-calling-from-BAM functionality for small pedigrees).
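To illustrate why the dummy values carry no real evidence: PL is a phred-scaled genotype likelihood, so a relative likelihood can be recovered as 10^(-PL/10). The sketch below converts a PL string back to relative likelihoods. The helper name `pl_to_likelihoods` is hypothetical and not part of this tool.

```shell
# Convert a comma-separated PL string to relative genotype likelihoods.
# Each value is 10^(-PL/10), printed to three significant digits.
pl_to_likelihoods() {
    echo "$1" | awk -F, '{
        for (i = 1; i <= NF; i++)
            printf "%s%.3g", (i > 1 ? " " : ""), 10^(-$i / 10)
        printf "\n"
    }'
}
```

For example, `pl_to_likelihoods 0,255,255` gives a relative likelihood of 1 for homozygous reference and about 3.16e-26 for the other two genotypes, whatever the read evidence at the site. A downstream caller that trusts these numbers would treat every such genotype as near-certain.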
Complex variants can occasionally contain primitive alleles called in other samples. We are investigating decomposition approaches for this problem.
We are working on multi-threading to improve performance.
Please open an issue on GitHub to provide feedback or ask questions.
This tool depends on htslib, googletest and spdlog. We also borrowed some variant normalisation code from BCFtools.