Implements two new tools and updates some methods for a revamp of the CombineBatches cross-batch integration module in gatk-sv.
SVStratify - tool for splitting a VCF by variant class. Users pass in a configuration table (see tool documentation for an example) specifying one or more stratification groups defined by SVTYPE, SVLEN range, and reference context(s). The contexts are specified as a set of interval lists using the --context-name and --context-intervals arguments. All variants are matched with their respective group, which is annotated in the STRAT INFO field. Optionally, the output can be split into multiple VCFs by group, which currently can't be done efficiently with common commands/toolkits.
GroupedSVCluster - a hybrid tool combining functionality from SVStratify with SVCluster to perform intra-stratum clustering. This tool is critical for fine-tuned clustering of specific variant types within certain reference contexts. For example, small variants in simple repeats tend to have lower breakpoint accuracy and are typically "reclustered" with looser clustering criteria during call set refinement.
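For reviewers unfamiliar with the idea, here is a minimal sketch of the matching step described above. It is not the SVStratify implementation: the `Stratum` fields, the half-open length range, and the assumption that context overlap has already been reduced to a set of context names are all hypothetical, and the tool's actual overlap criteria and handling of variants that match several groups or none are not shown.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Stratum:
    """Hypothetical stand-in for one row of the configuration table."""
    name: str
    svtype: str
    min_len: Optional[int]   # inclusive lower bound; None means unbounded
    max_len: Optional[int]   # exclusive upper bound; None means unbounded
    context: Optional[str]   # required reference context name; None means any


def matching_strata(svtype: str, svlen: int, contexts: Set[str],
                    strata: Iterable[Stratum]) -> List[str]:
    """Return the names of all strata whose criteria the variant satisfies.

    `contexts` is the set of context names the variant is assumed to overlap;
    computing that overlap from interval lists is outside this sketch.
    """
    names = []
    for s in strata:
        if s.svtype != svtype:
            continue
        if s.min_len is not None and svlen < s.min_len:
            continue
        if s.max_len is not None and svlen >= s.max_len:
            continue
        if s.context is not None and s.context not in contexts:
            continue
        names.append(s.name)
    return names


# Illustrative groups only; the names and thresholds are made up.
EXAMPLE_STRATA = [
    Stratum("DEL_small_SR", "DEL", None, 5000, "SR"),
    Stratum("DEL_large", "DEL", 5000, None, None),
]
```

With these example groups, a 300 bp deletion overlapping the hypothetical "SR" context matches only `DEL_small_SR`, while the same deletion outside that context matches nothing.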
SVStratificationEngine - new class for performing stratification.
Updates to breakpoint refinement in CanonicalSVCollapser that should improve breakpoint accuracy, particularly in larger call sets. Raw evidence support and variant quality are now considered when choosing a representative breakpoint for a group of clustered SVs.
Added FlagFieldLogic type for customizing how BOTHSIDE_PASS and HIGH_SR_BACKGROUND INFO flags are collapsed during clustering.
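As a rough illustration of what a configurable flag-collapsing rule can look like, the sketch below combines the per-record flag values of a cluster under a chosen mode. The mode names (`AND`, `OR`, `ALWAYS_FALSE`) and the treatment of an empty cluster are assumptions made for this example and may not match the options FlagFieldLogic actually exposes.

```python
from enum import Enum
from typing import Sequence


class FlagLogic(Enum):
    """Hypothetical collapsing modes, named for illustration only."""
    AND = "and"                    # set only if every clustered record has the flag
    OR = "or"                      # set if any clustered record has the flag
    ALWAYS_FALSE = "always_false"  # never set on the collapsed record


def collapse_flag(member_flags: Sequence[bool], logic: FlagLogic) -> bool:
    """Collapse one INFO flag across the records of a cluster."""
    if logic is FlagLogic.AND:
        # Guard against all([]) being True for an empty cluster.
        return len(member_flags) > 0 and all(member_flags)
    if logic is FlagLogic.OR:
        return any(member_flags)
    return False
```

For example, a cluster whose records carry the flag as `[True, False]` collapses to unset under `AND` and to set under `OR`.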
RD_CN is now used as a backup if CN is not available when determining carrier status for sample overlap.
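The fallback can be pictured roughly as follows. This is a sketch under stated assumptions, not the code in this PR: genotype attributes are modeled as a plain dict, and "carrier" is taken to mean a copy number different from an expected value, which is a simplification of whatever rule the sample-overlap logic actually applies.

```python
from typing import Mapping, Optional


def copy_number(genotype_attrs: Mapping[str, int]) -> Optional[int]:
    """Return CN if present, otherwise fall back to RD_CN, otherwise None."""
    cn = genotype_attrs.get("CN")
    if cn is None:
        cn = genotype_attrs.get("RD_CN")
    return cn


def is_cnv_carrier(genotype_attrs: Mapping[str, int], expected_cn: int = 2) -> bool:
    """Assumed rule: a sample is a carrier if its copy number differs from expected_cn.

    Samples with neither CN nor RD_CN are treated as non-carriers in this sketch.
    """
    cn = copy_number(genotype_attrs)
    return cn is not None and cn != expected_cn
```

A genotype with only `RD_CN=1` is therefore treated the same as one with `CN=1`, and `CN` takes precedence when both fields are present.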
Removed no-sort option in favor of spooled sorting.
Bug fix: added support for empty EVIDENCE INFO fields.
Bug fix in one of the JointGermlineCnvDefragmenter tests.