bcgsc / mavis

Merging, Annotation, Validation, and Illustration of Structural variants
http://mavis.bcgsc.ca
GNU General Public License v3.0

Long reads support for Sniffles files #294

Closed zhemingfan closed 2 years ago

zhemingfan commented 2 years ago

Overview

Sniffles, a long-read SV caller, currently emits records with SVTYPE=INVDUP, which MAVIS does not handle. As a temporary workaround, INVDUP is treated as a combination of inversion, duplication, and insertion.
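To make the workaround concrete, here is a minimal sketch of expanding one caller-reported SVTYPE into several candidate event types. The table and the `expand_svtype` helper are hypothetical names used for illustration only; they are not taken from the MAVIS code base, and the non-INVDUP entries are assumed for context.

```python
# Hypothetical lookup table; names are illustrative, not MAVIS's own.
SVTYPE_TO_EVENT_TYPES = {
    "DEL": ["deletion"],
    "INS": ["insertion"],
    "DUP": ["duplication"],
    "INV": ["inversion"],
    # Workaround: INVDUP has no single equivalent, so it is expanded
    # into every event type it might represent.
    "INVDUP": ["inversion", "duplication", "insertion"],
}


def expand_svtype(svtype):
    """Return the candidate event types for a caller-reported SVTYPE."""
    try:
        # Copy the list so callers cannot mutate the shared table.
        return list(SVTYPE_TO_EVENT_TYPES[svtype])
    except KeyError:
        raise ValueError(f"unsupported SVTYPE: {svtype!r}") from None
```

Each INVDUP record would then yield one candidate call per returned event type, leaving later validation to decide which interpretation the reads support.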

A series of changes must be made to accommodate Sniffles:

creisle commented 2 years ago

It looks like the start > end error is caused by the 0 position in the BND ALT syntax. The example test below reproduces it.

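# Import paths are assumed from the MAVIS package layout; adjust to the installed version.
from mavis.tools import SUPPORTED_TOOL, _convert_tool_row
from mavis.tools.vcf import VcfInfoType, VcfRecordType, convert_record


def test_convert_record():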
    variant = VcfRecordType(
        9000,
        12000,
        'chr14_KI270722v1_random',
        alts=['N[chr17_GL000205v2_random:0['],
        ref='N',
        info=VcfInfoType(
            IMPRECISE=True,
            SVMETHOD="Snifflesv1.0.11",
            SVTYPE="BND",
            SUPTYPE="SR",
            SVLEN="0",
            STRANDS="+-",
            RE="5",
            REF_strand="0,0",
            AF="1",
        ),
    )
    records = convert_record(variant)
    records = [_convert_tool_row(r, SUPPORTED_TOOL.VCF, False) for r in records]

According to the VCF 4.2 spec (https://samtools.github.io/hts-specs/VCFv4.2.pdf), a breakend position of 0 indicates a connection to a telomere. I am not totally sure how to deal with these, but one solution might be to change the 0 to a 1, since coordinates are 1-based and nothing can start before the start of a sequence.
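A minimal sketch of that suggestion, operating on the ALT string alone: it parses the mate position out of a breakend ALT and raises anything below 1 up to 1. The `clamp_bnd_mate_position` helper and the `BND_ALT` pattern are hypothetical and standalone; this is not how MAVIS itself parses records, and it does not handle the other virtual telomeric position (one past the end of the chromosome).

```python
import re

# Matches the four VCF breakend ALT forms: t[p[, t]p], ]p]t and [p[t,
# where p is "chrom:pos". The position is the digits after the last colon.
BND_ALT = re.compile(
    r"^(?P<prefix>[^\[\]]*)(?P<open>[\[\]])"
    r"(?P<chrom>[^\[\]]+):(?P<pos>\d+)"
    r"(?P<close>[\[\]])(?P<suffix>[^\[\]]*)$"
)


def clamp_bnd_mate_position(alt, minimum=1):
    """Return the ALT string with the mate position raised to at least `minimum`."""
    match = BND_ALT.match(alt)
    if match is None:
        raise ValueError(f"not a breakend ALT: {alt!r}")
    # Position 0 is the virtual telomeric breakend; move it onto the first base.
    pos = max(int(match["pos"]), minimum)
    return (
        f"{match['prefix']}{match['open']}{match['chrom']}:{pos}"
        f"{match['close']}{match['suffix']}"
    )


print(clamp_bnd_mate_position("N[chr17_GL000205v2_random:0["))
```

Clamping loses the information that the original call pointed at a telomere, so a real fix might also want to record that fact somewhere before rewriting the coordinate.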

zhemingfan commented 2 years ago

Ensure that future versions use Sniffles2.0