broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

Switch to the updated type & location inference tool in SV pipeline #4111

Open SHuang-Broad opened 6 years ago

SHuang-Broad commented 6 years ago

We have finished implementing the updated logic for interpreting variants and inferring their locations from the alignment signatures of local assembly contigs. It is time to clean up the corresponding package in the pipeline and switch to the updated implementation. It now outputs not only insertions, deletions, small tandem duplications, and inversions, but also novel adjacencies (BND records whose meaning cannot be fully resolved from assembly alignment signatures alone) and complex variants (<CPX>), which in theory can be arbitrarily complex as long as we have assembled across the full event.
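To make the output categories concrete, here is a minimal sketch of the six kinds of records listed above, with the two newly reported kinds flagged. The class, enum, and method names are hypothetical and are not the actual SvType hierarchy; the short labels are assumed SVTYPE-style strings (CPX is taken from the <CPX> notation above).

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical illustration only; not the GATK SvType classes. */
final class SvOutputTypesSketch {

    /** Kinds of records the updated tool is described as emitting. */
    enum Kind {
        INSERTION("INS"),
        DELETION("DEL"),
        SMALL_TANDEM_DUPLICATION("DUP"),
        INVERSION("INV"),
        NOVEL_ADJACENCY("BND"),  // meaning not fully resolved from alignments alone
        COMPLEX("CPX");          // requires assembly across the full event

        final String label;      // assumed SVTYPE-style label

        Kind(final String label) {
            this.label = label;
        }
    }

    /** The kinds described as additions over the previous implementation. */
    static final Set<Kind> NEWLY_REPORTED =
            EnumSet.of(Kind.NOVEL_ADJACENCY, Kind.COMPLEX);

    static boolean isNewlyReported(final Kind kind) {
        return NEWLY_REPORTED.contains(kind);
    }

    public static void main(final String[] args) {
        for (final Kind kind : Kind.values()) {
            System.out.println(kind.label + " newly reported: " + isNewlyReported(kind));
        }
    }
}
```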

Planned organization

The discovery package could now be divided roughly into:

interface

SvDiscoveryDataBundle, SvDiscoverFromLocalAssemblyContigAlignmentsSpark, SvType, AnnotatedVariantProducer

alignment prep (sub package)

AlignmentInterval, AlignedContig (refactor AssemblyContigWithFineTunedAlignments into AlignedContig), AlignedContigGenerator, AlignedAssembly, ContigAlignmentsModifier (refactor AlnModType into it), GappedAlignmentSplitter, StrandSwitch, FilterLongReadAlignmentsSAMSpark (factor out the major methods in the new alignment filter by score into a 1st level class)

type & location inference (sub package)

deprecated

DiscoverVariantsFromContigAlignmentsSAMSpark

It currently provides 3 groups of functionalities:



Planned steps

  1. Repackage and refactor (no logic change, see #3934)
  2. Bring in some valuable changes made in PR #3668
  3. Add more test coverage (ticket #3431)
  4. Make StructuralVariationDiscoveryPipelineSpark call SvDiscoverFromLocalAssemblyContigAlignmentsSpark by default and DiscoverVariantsFromContigAlignmentsSAMSpark only optionally, i.e. the opposite of what we currently do.
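
The default switch in step 4 could look roughly like the sketch below: the updated tool runs unless the caller explicitly opts into the old one. This is only an illustration under assumptions; the class, the enum, and the `useLegacyDiscovery` flag are hypothetical, and the issue does not name the real pipeline argument.

```java
/** Hypothetical sketch of the default switch; not the pipeline's actual code. */
final class DiscoveryDispatchSketch {

    /** Which interpretation path the pipeline should run. */
    enum Mode { UPDATED, LEGACY }

    /**
     * Picks the path to run. The boolean stands in for an assumed opt-in
     * argument: the updated path is the default, the legacy one is optional.
     */
    static Mode chooseMode(final boolean useLegacyDiscovery) {
        return useLegacyDiscovery ? Mode.LEGACY : Mode.UPDATED;
    }

    /** Maps the chosen path to the tool named in the plan above. */
    static String toolFor(final Mode mode) {
        switch (mode) {
            case LEGACY:
                return "DiscoverVariantsFromContigAlignmentsSAMSpark";
            default:
                return "SvDiscoverFromLocalAssemblyContigAlignmentsSpark";
        }
    }

    public static void main(final String[] args) {
        System.out.println(toolFor(chooseMode(false)));
        System.out.println(toolFor(chooseMode(true)));
    }
}
```

Keeping the legacy path reachable behind an explicit opt-in lets the two implementations be compared on the same input while the old one is being phased out.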
SHuang-Broad commented 6 years ago

Updated plan


Small improvements in new interpretation tool


Consolidate logic, bump test coverage and update how variants are represented

consolidate logic

The initial prototype left redundant logic for simple variants; it is now time to consolidate it.

bump test coverage

Once the code above is consolidated, bump test coverage, particularly for the classes above and the following poorly covered classes:

update how variants are represented

Implement the following representation changes, which should make type-based evaluation easier:


CPX variant re-interpretation

Send CPX variants for re-interpretation into simple basic types, and check for consistency (this might be the difficult part).

SHuang-Broad commented 6 years ago

Updated TODO list:

(optional):