DecodeGenetics / svimmer

Structural variant merging tool

svimmer and graphtyper for forced genotyping of UNION of Manta and SVIM-ASM discovered SVs #7

Open WimSpee opened 2 years ago

WimSpee commented 2 years ago

Dear @hannespetur

Thank you and your colleagues for the very nice svimmer and graphtyper software.

I would like to use svimmer and graphtyper for forced genotyping, in many WGS samples, of the UNION of SVs discovered by Manta (many WGS samples) and SVIM-ASM (a few assemblies).

SVIM-ASM GitHub: https://github.com/eldariont/svim-asm

The versions that I am using are svimmer/20211209 and graphtyper/2.7.3.

When I try to get the (merged) UNION of SVs via svimmer, I get this error:

Traceback (most recent call last):
  File "/tools/eb/software/Python/3.8.6-GCCcore-10.2.0/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/tools/eb/software/Python/3.8.6-GCCcore-10.2.0/lib/python3.8/multiprocessing/pool.py", line 51, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "/tools/eb/software/svimmer/20211209-GCC-10.2.0/svimmer", line 82, in append_svs_from_vcf
    svs.append(SV(record, check_type=not args.ignore_types, join_mode=args.join_mode, output_ids=args.ids))
  File "/tools/eb/software/svimmer/20211209-GCC-10.2.0/sv.py", line 75, in __init__
    assert False
AssertionError

https://github.com/DecodeGenetics/svimmer/blob/f2d78b2f0e45100f507343a05cf7a65008b2ed9b/sv.py#L75

This is caused by svimmer not recognizing the DUP:TANDEM and DUP:INT types that SVIM-ASM outputs. https://github.com/DecodeGenetics/svimmer/blob/f2d78b2f0e45100f507343a05cf7a65008b2ed9b/sv.py#L41
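To see in advance which records would trip this check, the SVTYPE values in an input VCF can be tallied. The following is a minimal sketch, assuming an uncompressed VCF read as text lines with the type stored under the INFO key SVTYPE. The helper `count_svtypes` is hypothetical and not part of svimmer.

```python
from collections import Counter
from typing import Iterable


def count_svtypes(lines: Iterable[str]) -> Counter:
    """Tally SVTYPE values found in the INFO column of VCF records."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        for entry in fields[7].split(";"):
            if entry.startswith("SVTYPE="):
                counts[entry[len("SVTYPE="):]] += 1
                break  # at most one SVTYPE per record
    return counts
```

Passing an open file handle for the SVIM-ASM VCF to `count_svtypes` should list any DUP:TANDEM and DUP:INT records alongside the plain types.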

I can get svimmer to run with the --ignore-types argument, but then graphtyper warns about an unknown SV type, and I guess it also drops the SVs of unknown type?

<warning> constructor.cpp:106 Unknown SV type DUP:TANDEM
<warning> constructor.cpp:106 Unknown SV type DUP:TANDEM
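One possible workaround is to collapse the subtypes to plain DUP in the SVIM-ASM VCF before it reaches svimmer. The sketch below assumes the subtype appears in the INFO key SVTYPE and possibly as a symbolic ALT allele such as `<DUP:TANDEM>`. The function `collapse_dup_subtypes` and the mapping `SUBTYPE_TO_BASE` are hypothetical, and this has not been verified against svimmer or graphtyper.

```python
# Assumed mapping from SVIM-ASM duplication subtypes to the plain DUP type.
SUBTYPE_TO_BASE = {"DUP:TANDEM": "DUP", "DUP:INT": "DUP"}


def collapse_dup_subtypes(line: str) -> str:
    """Rewrite DUP subtypes to DUP in the ALT and INFO columns of one VCF line."""
    if line.startswith("#"):
        return line  # header lines are passed through unchanged
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return line  # not a complete VCF record

    # ALT column: symbolic alleles look like <DUP:TANDEM>
    alts = []
    for alt in fields[4].split(","):
        name = alt[1:-1] if alt.startswith("<") and alt.endswith(">") else None
        alts.append(f"<{SUBTYPE_TO_BASE[name]}>" if name in SUBTYPE_TO_BASE else alt)
    fields[4] = ",".join(alts)

    # INFO column: only the SVTYPE key is rewritten
    info = []
    for entry in fields[7].split(";"):
        key, _, value = entry.partition("=")
        if key == "SVTYPE" and value in SUBTYPE_TO_BASE:
            entry = f"SVTYPE={SUBTYPE_TO_BASE[value]}"
        info.append(entry)
    fields[7] = ";".join(info)

    return "\t".join(fields) + newline
```

This discards the tandem versus interspersed distinction, and it leaves the `##ALT` header definitions untouched, so it is a stopgap and not a substitute for proper support in the tools.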

Would it be possible to add a mapping for DUP:TANDEM and DUP:INT in the main branch of the svimmer code here? https://github.com/DecodeGenetics/svimmer/blob/f2d78b2f0e45100f507343a05cf7a65008b2ed9b/sv.py#L41
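The requested mapping could look roughly like the sketch below, which reduces a subtype such as DUP:TANDEM to its base type by dropping everything after the first colon. The names `normalize_svtype` and `KNOWN_BASE_TYPES` are hypothetical and do not reflect svimmer's actual code in sv.py; the set of base types is also an assumption.

```python
# Hypothetical set of base types; the types svimmer really accepts may differ.
KNOWN_BASE_TYPES = {"DEL", "INS", "DUP", "INV", "CNV", "BND"}


def normalize_svtype(svtype: str) -> str:
    """Map a (sub)type such as 'DUP:TANDEM' to its base type, e.g. 'DUP'."""
    base = svtype.split(":", 1)[0]
    if base not in KNOWN_BASE_TYPES:
        # An explicit error is easier to diagnose than a bare `assert False`.
        raise ValueError(f"Unrecognized SV type: {svtype!r}")
    return base
```

With a helper like this, DUP:TANDEM and DUP:INT would both be handled as DUP, and a truly unknown type would produce an error message naming the offending value.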

Then the combination of SVIM-ASM and svimmer/graphtyper would work for me and for others with the same combination of tools.

I also don't understand why SVs of type DUP, CNV and INV are mapped to type INS here: https://github.com/DecodeGenetics/svimmer/blob/f2d78b2f0e45100f507343a05cf7a65008b2ed9b/sv.py#L45

That does not make sense to me. INS is a novel sequence, whereas DUP, CNV and INV involve sequence already found in the reference genome, so shouldn't they be genotyped differently in graphtyper?

What I also find strange is that both svimmer and graphtyper do output SVs of type DUP, which I can't square with the mapping of DUP, CNV and INV to INS. Or is the SV type perhaps recalculated somewhere else in svimmer/graphtyper?

Thank you for your thoughts and help on this.