fritzsedlazeck / SURVIVOR

Toolset for SV simulation, comparison and filtering
MIT License

Issue merging SVs within the same caller #143

Open wbooker opened 3 years ago

wbooker commented 3 years ago

I've been trying to get SURVIVOR output to work with Paragraph, but something in the merge step is causing an error.

I'm merging output from 4 callers, and I noticed SURVIVOR also merges SVs from the same caller (which makes sense). However, in the merged record the POS no longer matches the REF and ALT, which is what causes the error.

To be more specific:

These two called SVs:

AaegL5_1 96903879 MantaDEL:467180:1:1:0:0:0 GAATTCCATTATGGAAATTTAACCTGGAACCCCTCGCAGCTGGAACCTATAC G 999 MaxDepth END=96903930;SVTYPE=DEL;SVLEN=-51;CIGAR=1M51D;CIPOS=0,1;HOMLEN=1;HOMSEQ=A GT:FT:GQ:PL:PR:SR 1/1:PASS:77:999,80,0:0,0:0,29

and

AaegL5_1 96904565 MantaDEL:467180:1:1:8:0:0 TTCATATATGATTCTACCCCATTACCCCGAATGCCACTTCCCCGATTGCCAACATCCCGAATTCTATTACCCCGAACGTACCATTACCCCGAGTTCCATCACCCCGAATGCTATTTCCCCGAATTTACTATTATCTTGACATGATGATATTGATGACATTTCCCCACGATATTGGAATTCGGGGTAATGTCATTCGGGGTGATGGGTCGTTCGGGGATTTGGAATTCGGGGTAATGGTGTTTGGATTAATGGCATTCGGGGTAATGGGGTAGAATC T 48 MaxMQ0Frac;SampleFT END=96904840;SVTYPE=DEL;SVLEN=-275;CIGAR=1M275D;CIPOS=0,8;HOMLEN=8;HOMSEQ=TCATATAT GT:FT:GQ:PL:PR:SR 1/1:MinGQ:5:100,6,0:0,0:0,3

are being merged because they are within 1000 bp of each other, resulting in

AaegL5_1 96903879 MantaDEL:467180:1:1:8:0:0 TTCATATATGATTCTACCCCATTACCCCGAATGCCACTTCCCCGATTGCCAACATCCCGAATTCTATTACCCCGAACGTACCATTACCCCGAGTTCCATCACCCCGAATGCTATTTCCCCGAATTTACTATTATCTTGACATGATGATATTGATGACATTTCCCCACGATATTGGAATTCGGGGTAATGTCATTCGGGGTGATGGGTCGTTCGGGGATTTGGAATTCGGGGTAATGGTGTTTGGATTAATGGCATTCGGGGTAATGGGGTAGAATC T 1127 PASS SUPP=2;SUPP_VEC=0011;SVLEN=-275;SVTYPE=DEL;SVMETHOD=SURVIVOR1.0.7;CHR2=AaegL5_1;END=96903930;CIPOS=0,686;CIEND=0,910;STRANDS=+- GT:PSV:LN:DR:ST:QV:TY:ID:RAL:AAL:CO 1/1:NA:275:0,0:+-:999,48:DEL,DEL:MantaDEL_467180_1_1_8_0_0:TTCATATATGATTCTACCCCATTACCCCGAATGCCACTTCCCCGATTGCCAACATCCCGAATTCTATTACCCCGAACGTACCATTACCCCGAGTTCCATCACCCCGAATGCTATTTCCCCGAATTTACTATTATCTTGACATGATGATATTGATGACATTTCCCCACGATATTGGAATTCGGGGTAATGTCATTCGGGGTGATGGGTCGTTCGGGGATTTGGAATTCGGGGTAATGGTGTTTGGATTAATGGCATTCGGGGTAATGGGGTAGAATC:T:AaegL5_1_96903879-AaegL5_1_96903930,AaegL5_1_96904565-AaegL5_1_96904840

which appears to keep the POS and END of the first call, but the ID, SVLEN, REF and ALT of the second call.
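For illustration, the mismatch can be detected mechanically. The following is a minimal sketch, not part of SURVIVOR or Paragraph: the function name `deletion_span_mismatch` is hypothetical, and it assumes a sequence-resolved deletion whose REF holds the anchor base plus the deleted bases, so that END should equal POS + len(REF) - 1. The placeholder sequences only reproduce the REF lengths implied by the CIGAR strings above (1M51D and 1M275D).

```python
def deletion_span_mismatch(pos: int, ref: str, end: int) -> bool:
    """Return True when REF does not span POS..END (inclusive).

    Assumption: the record is a sequence-resolved DEL, so REF covers
    the anchor base plus every deleted base.
    """
    return end != pos + len(ref) - 1


# Placeholder sequences: only their lengths matter for this check.
first_ref = "N" * 52    # 1M51D  -> 52 reference bases
second_ref = "N" * 276  # 1M275D -> 276 reference bases

# The two original Manta calls are internally consistent.
print(deletion_span_mismatch(96903879, first_ref, 96903930))
print(deletion_span_mismatch(96904565, second_ref, 96904840))

# The merged record pairs the first call's POS/END with the second call's REF.
print(deletion_span_mismatch(96903879, second_ref, 96903930))
```

Only the last check should flag a mismatch, since the merged record's END - POS is 51 while its REF is 276 bases long.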

Is this expected behavior?

Thanks, Will

wbooker commented 3 years ago

Hey Fritz,

Just wondering if you have any insight into this. Using some toy examples, if I fix up the VCF so that the position lines up with the original REF/ALT, I don't have any downstream issues. This gets harder when there are multiple merges, though: when several SV calls sit at the exact same position, it's unclear which one is the correct match. I tried going through the code to see if I could fix it myself, but I'm not well versed enough in C to do so.
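A rough sketch of that kind of realignment, for anyone hitting the same problem. It assumes the CO sample field lists the source intervals as `chrom_start-chrom_end`, comma-separated, which is inferred only from the one merged record shown earlier in this thread. The helper names `parse_co` and `matching_sources` are hypothetical and not part of SURVIVOR. It picks the source interval whose span equals len(REF) - 1.

```python
def parse_co(co: str) -> list[tuple[str, int, int]]:
    """Split a CO value into (chrom, start, end) tuples.

    Assumption: each entry looks like 'chrom_start-chrom_end' and the
    chromosome name itself contains no '-' character.
    """
    intervals = []
    for entry in co.split(","):
        left, right = entry.split("-")
        chrom, start = left.rsplit("_", 1)  # chrom may itself contain '_'
        _, end = right.rsplit("_", 1)
        intervals.append((chrom, int(start), int(end)))
    return intervals


def matching_sources(co: str, ref: str) -> list[tuple[str, int, int]]:
    """Return the source intervals whose span equals len(REF) - 1."""
    span = len(ref) - 1
    return [iv for iv in parse_co(co) if iv[2] - iv[1] == span]


co = ("AaegL5_1_96903879-AaegL5_1_96903930,"
      "AaegL5_1_96904565-AaegL5_1_96904840")
merged_ref = "N" * 276  # placeholder with the merged record's REF length

print(matching_sources(co, merged_ref))
```

If more than one interval comes back (for example, two source calls with the same position and length), the ambiguity described above remains and this approach cannot resolve it.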

Thanks, Will