Open SHuang-Broad opened 6 years ago
Updated plan
- `INSLEN` annotation when there's `INSSEQ`
- `AlignedContig` and `AssemblyContigWithFineTunedAlignments`
- `AssemblyContigAlignmentsConfigPicker`
The initial prototype left redundant logic for simple variants; now it is time to consolidate.
- [x] `AssemblyContigWithFineTunedAlignments`
  - `hasIncompletePicture()`
- [x] `AssemblyContigAlignmentSignatureClassifier`
  - `RawTypes` into fewer cases
- [x] `ChimericAlignment`
  - `getCoordinateSortedRefSpans()`, and use in `BreakpointsInference`
  - `isNeitherSimpleTranslocationNorIncompletePicture()`
  - `extractSimpleChimera()`
Once the code above is consolidated, bump test coverage, particularly for the classes above and the following poorly covered classes:
- [x] `ChimericAlignment`
  - `isForwardStrandRepresentation()`
  - `splitPairStrongEnoughEvidenceForCA()`
  - `parseOneContig()` (needs testing because we need it for simple re-interpretation of CPX variants). Note that `nextAlignmentMayBeInsertion()` is currently broken: when it is used to filter out alignments whose ref span is contained by another, it needs to check whether the two alignments involved are head/tail.
- [x] `BreakpointsInference` & `BreakpointComplications`
- [x] `NovelAdjacencyAndAltHaplotype`
  - `toSimpleOrBNDTypes()`
- [x] `SimpleNovelAdjacencyAndChimericAlignmentEvidence`
- [x] `AnnotatedVariantProducer`
  - `produceAnnotatedBNDmatesVcFromNovelAdjacency()`
- [x] `BreakEndVariantType`
- [ ] `SvDiscoverFromLocalAssemblyContigAlignmentsSpark` integration test
Implement the following representation changes that should make type-based evaluation easier:
- `INSDUP` to `INS` when the duplicated ref region, denoted with annotation `DUP_REPEAT_UNIT_REF_SPAN`, is shorter than 50 bp.
- `DEL` with `INSSEQ` annotation, to one of these:
  - `INS`/`DEL`, when deleted/inserted bases are < 50 bp, and annotate accordingly; when the type is determined as `INS`, the `POS` will be 1 base before the micro-deleted range and `END` will be the end of the micro-deleted range, where the `REF` allele will be the corresponding reference bases.
  - `INS` and `DEL` when both are >= 50 bp, share the same `POS`, and link by `EVENT`
- `DUP` as a separate 1st class type, we need to
  - `downstreamBreakpointRefPos = complication.getDupSeqRepeatUnitRefSpan().getEnd();`
Send cpx variant for re-interpretation of simple basic types, and check for consistency (this might be the difficult part)
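The representation rules listed above can be sketched as a small decision function. This is only an illustration under stated assumptions: the class `SvRepresentationSketch`, its method names, and the string type labels are hypothetical and do not exist in GATK; coordinates are assumed to be 1-based and inclusive; and the case where both the deleted and inserted lengths are under 50 bp is not covered by the list, so the sketch simply leaves such a record as `DEL`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the re-typing rules above; none of these names exist in GATK. */
final class SvRepresentationSketch {

    /** The 50 bp size cutoff used throughout the list above. */
    static final int MIN_SV_SIZE = 50;

    /** INSDUP is reported as INS when the duplicated ref span is shorter than 50 bp. */
    static String retypeInsDup(int dupRepeatUnitRefSpanLength) {
        return dupRepeatUnitRefSpanLength < MIN_SV_SIZE ? "INS" : "INSDUP";
    }

    /** Types to emit for a DEL record that carries an INSSEQ annotation. */
    static List<String> retypeDelWithInsSeq(int deletedLength, int insertedLength) {
        boolean largeDeletion = deletedLength >= MIN_SV_SIZE;
        boolean largeInsertion = insertedLength >= MIN_SV_SIZE;
        List<String> types = new ArrayList<>();
        if (largeDeletion && largeInsertion) {
            // Two records that share the same POS and are linked through EVENT.
            types.add("INS");
            types.add("DEL");
        } else if (largeInsertion) {
            // Fewer than 50 deleted bases: report an INS with a micro-deletion.
            types.add("INS");
        } else {
            // Fewer than 50 inserted bases: the record stays a DEL.
            // Assumption: the both-small case, not covered by the list, also lands here.
            types.add("DEL");
        }
        return types;
    }

    /** {POS, END} for an INS with a micro-deletion, assuming 1-based inclusive coordinates. */
    static int[] posAndEndForIns(int microDelStart, int microDelEnd) {
        // POS is the base just before the micro-deleted range; END is the last deleted base.
        return new int[] {microDelStart - 1, microDelEnd};
    }

    public static void main(String[] args) {
        System.out.println(retypeInsDup(30));
        System.out.println(retypeDelWithInsSeq(10, 200));
        System.out.println(retypeDelWithInsSeq(120, 80));
        System.out.println(Arrays.toString(posAndEndForIns(101, 110)));
    }
}
```

Keeping the cutoff in one constant makes it easy to check that all three rules apply the same 50 bp boundary.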
Updated TODO list:
(optional):
We have finished implementing the updated logic that interprets variants and infers their locations from local assembly contig alignment signatures. It is time to clean up the corresponding package in the pipeline and switch to the updated implementation, which now outputs not only insertions, deletions, small tandem duplications, and inversions, but also novel adjacencies (BND records whose meaning cannot be fully resolved from assembly alignment signatures alone) and complex variants (`<CPX>`) that could in theory be arbitrarily complex, as long as we have assembled across the full event.

Planned organization
The `discovery` package could now be divided roughly into:
- interface: `SvDiscoveryDataBundle`, `SvDiscoverFromLocalAssemblyContigAlignmentsSpark`, `SvType`, `AnnotatedVariantProducer`
- alignment prep (sub package): `AlignmentInterval`, `AlignedContig` (refactor `AssemblyContigWithFineTunedAlignments` into `AlignedContig`), `AlignedContigGenerator`, `AlignedAssembly`, `ContigAlignmentsModifier` (refactor `AlnModType` into it), `GappedAlignmentSplitter`, `StrandSwitch`, `FilterLongReadAlignmentsSAMSpark` (factor out the major methods in the new alignment filter by score into a 1st level class)
- type & location inference (sub package)
  - imprecise: refactor out methods from to-be-deprecated `DiscoverVariantsFromContigAlignmentsSAMSpark`
  - alignment classification: `ChimericAlignment` and `NovelAdjacencyReferenceLocations` (very tricky to decouple the functionalities because both have over 50 uses), `AssemblyContigAlignmentSignatureClassifier`, `VariantDetectorFromLocalAssemblyContigAlignments`
  - simple: `SimpleSVType`, `SvTypeInference`, `InsDelVariantDetector`, `BreakpointComplications` (rename to `BreakpointComplicationsForSimpleTypes`)
  - complex: `BreakEndVariantType`, `SuspectedTransLocDetector`, `SimpleStrandSwitchVariantDetector`
- deprecated: `DiscoverVariantsFromContigAlignmentsSAMSpark`
It currently provides 3 groups of functionalities:
- `ChimericAlignment.parseOneContig` and `NovelAdjacencyReferenceLocations(ChimericAlignment chimericAlignment, byte[] contigSequence, SAMSequenceDictionary)`; this should be deprecated
- `SvTypeInference.inferFromNovelAdjacency()`) and annotation (delegated to `AnnotatedVariantProducer.produceAnnotatedVcFromInferredTypeAndRefLocations()`); this should be deprecated

Planned steps
- `StructuralVariationDiscoveryPipelineSpark` should call into `SvDiscoverFromLocalAssemblyContigAlignmentsSpark` by default and optionally into `DiscoverVariantsFromContigAlignmentsSAMSpark`, i.e. the opposite of what we currently do.
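
The default switch described above could look roughly like the following. This is a minimal sketch, not the pipeline's actual code: the class `DiscoveryDispatchSketch`, the `legacyRequested` flag, and the idea of returning the tool name as a string are all hypothetical, chosen only to show which path is the default and which is opt-in.

```java
/** Hypothetical sketch of the planned default; not part of GATK. */
final class DiscoveryDispatchSketch {

    /** Illustrative labels for the two interpretation paths. */
    enum Path { CONTIG_ALIGNMENT_SIGNATURE, LEGACY }

    /** The new implementation is the default; the legacy one must be requested explicitly. */
    static Path choosePath(boolean legacyRequested) {
        return legacyRequested ? Path.LEGACY : Path.CONTIG_ALIGNMENT_SIGNATURE;
    }

    /** Name of the tool the pipeline would call into for the chosen path. */
    static String toolFor(boolean legacyRequested) {
        switch (choosePath(legacyRequested)) {
            case LEGACY:
                return "DiscoverVariantsFromContigAlignmentsSAMSpark";
            default:
                return "SvDiscoverFromLocalAssemblyContigAlignmentsSpark";
        }
    }

    public static void main(String[] args) {
        System.out.println(toolFor(false));
        System.out.println(toolFor(true));
    }
}
```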