fritzsedlazeck / SURVIVOR

Toolset for SV simulation, comparison and filtering
MIT License

SURVIVOR merge vcf files #27

Closed by mroosmalen 6 years ago

mroosmalen commented 6 years ago

I tried to merge a couple of VCF files with the following command: `SURVIVOR merge sample_files 100 1 0 0 0 30 survivor_merge.vcf`

The output VCF file contains lines such as these:


```
22  18502316    INV00936SUR N   <INV>   .   PASS    SUPP=8;SUPP_VEC=00001111111100;AVGLEN=690.1;SVTYPE=INV;SVMETHOD=SURVIVORv2;CHR2=22;END=18503011;CIPOS=0,6;CIEND=-5,0;STRANDS=++ GT:PSV:LN:DR:ST:TY:CO   ./.:NaN:0:0,0:--:NaN:NaN    ./.:NaN:0:0,0:--:NaN:NaN    0/0:NA:688:0,7:++:INV:22_18502319-22_18503007   0/0:NA:689:0,3:++:INV:22_18502319-22_18503008   0/1:NA:694:0,0:++:INV,INV:22_18502317-22_18503011,22_18502316-22_18503009   0/1:NA:694:0,0:++:INV,INV:22_18502317-22_18503011,22_18502316-22_18503009   0/1:NA:695:0,0:++:INV:22_18502316-22_18503011   0/1:NA:695:0,0:++:INV:22_18502316-22_18503011   0/1:NA:686:0,0:++:INV,INV:22_18502319-22_18503005,22_18502322-22_18503005   0/1:NA:686:0,0:++:INV,INV:22_18502322-22_18503004,22_18502319-22_18503005   0/1:NA:687:0,20:++:INV:22_18502319-22_18503006  0/1:NA:687:0,14:++:INV:22_18502319-22_18503006  ./.:NaN:0:0,0:--:NaN:NaN    ./.:NaN:0:0,0:--:NaN:NaN
22  18502316    INV00938SUR N   <INV>   .   PASS    SUPP=4;SUPP_VEC=00000000100111;AVGLEN=641.25;SVTYPE=INV;SVMETHOD=SURVIVORv2;CHR2=22;END=18503009;CIPOS=0,75;CIEND=0,0;STRANDS=++    GT:PSV:LN:DR:ST:TY:CO   ./.:NaN:0:0,0:--:NaN:NaN    ./.:NaN:0:0,0:--:NaN:NaN    ./.:NaN:0:0,0:--:NaN:NaN    ./.:NaN:0:0,0:--:NaN:NaN    ./.:NaN:0:0,0:--:NaN:NaN    ./.:NaN:0:0,0:--:NaN:NaN    ./.:NaN:0:0,0:--:NaN:NaN    ./.:NaN:0:0,0:--:NaN:NaN    0/1:NA:559:0,0:++:INV:22_18502391-22_18502950   ./.:NaN:0:0,0:--:NaN:NaN    ./.:NaN:0:0,0:--:NaN:NaN    1/1:NA:620:0,2:--:INV:22_18502387-22_18503007   1/1:NA:693:0,0:++:INV:22_18502316-22_18503009   1/1:NA:693:0,0:++:INV:22_18502316-22_18503009
```

Can you explain why these SVs are not merged into one?

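
To make the question concrete, here is a minimal Python sketch that compares the two records above using only their POS and INFO columns. It assumes that the `100` in the command is the maximum allowed breakpoint distance in bp. The helper names (`parse_info`, `make_record`, `within_distance`, `supporting_samples`) are hypothetical and are not part of SURVIVOR, and the distance check is a simplification, not SURVIVOR's actual merging logic.

```python
MAX_DIST = 100  # assumption: the "100" argument is the max breakpoint distance in bp


def parse_info(info: str) -> dict:
    """Split a VCF INFO string into a key -> value dict (flags map to True)."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def make_record(pos: int, info: str) -> dict:
    """Keep only the fields needed for the comparison."""
    fields = parse_info(info)
    return {"pos": pos, "end": int(fields["END"]), "supp_vec": fields["SUPP_VEC"]}


def within_distance(rec_a: dict, rec_b: dict, max_dist: int) -> bool:
    """True if start and end breakpoints each differ by at most max_dist bp."""
    return (abs(rec_a["pos"] - rec_b["pos"]) <= max_dist
            and abs(rec_a["end"] - rec_b["end"]) <= max_dist)


def supporting_samples(supp_vec: str) -> list:
    """0-based indices of the input files flagged '1' in SUPP_VEC."""
    return [i for i, flag in enumerate(supp_vec) if flag == "1"]


# POS and INFO values copied from the two output lines above
rec1 = make_record(18502316, "SUPP=8;SUPP_VEC=00001111111100;AVGLEN=690.1;SVTYPE=INV;"
                             "SVMETHOD=SURVIVORv2;CHR2=22;END=18503011;CIPOS=0,6;"
                             "CIEND=-5,0;STRANDS=++")
rec2 = make_record(18502316, "SUPP=4;SUPP_VEC=00000000100111;AVGLEN=641.25;SVTYPE=INV;"
                             "SVMETHOD=SURVIVORv2;CHR2=22;END=18503009;CIPOS=0,75;"
                             "CIEND=0,0;STRANDS=++")

close_enough = within_distance(rec1, rec2, MAX_DIST)
shared = sorted(set(supporting_samples(rec1["supp_vec"]))
                & set(supporting_samples(rec2["supp_vec"])))

print(close_enough)  # True: same POS, END differs by 2 bp
print(shared)        # [8, 11]: input files that support both records
```

By the merged coordinates alone, the two records fall within 100 bp of each other, and two input files contribute a call to both, which is why they look like candidates for a single merged entry.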
fritzsedlazeck commented 6 years ago

Thanks for reporting this. I am currently evaluating whether the input needs to be sorted for larger cases. Could you share the variants from these input VCF files around those coordinates? I see that a variant is reported between these two entries, which probably caused the problem.

Thanks and sorry for the inconvenience. Fritz

mroosmalen commented 6 years ago

Thanks, sorting the VCF files solved the problem.
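
For readers who hit the same issue, the following is a minimal Python sketch of sorting one input VCF by chromosome and position before merging. The function names (`chrom_key`, `sort_vcf_lines`) are hypothetical, and the chromosome order used here (numeric contigs first, then the rest alphabetically) is an assumption; the thread does not say which order SURVIVOR expects. In practice a dedicated tool such as `bcftools sort` is the usual choice.

```python
def chrom_key(chrom: str):
    """Order numeric contigs by value, then all other contigs alphabetically."""
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    return (0, int(name), "") if name.isdigit() else (1, 0, name)


def sort_vcf_lines(lines):
    """Return header lines unchanged, followed by records sorted by (CHROM, POS)."""
    header = [ln for ln in lines if ln.startswith("#")]
    records = [ln for ln in lines if ln.strip() and not ln.startswith("#")]

    def record_key(ln):
        chrom, pos = ln.split("\t", 2)[:2]
        return (chrom_key(chrom), int(pos))

    return header + sorted(records, key=record_key)


# Toy input: unsorted records, truncated to three tab-separated columns
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID",
    "22\t18502391\tb",
    "X\t5\tc",
    "22\t18502316\ta",
    "3\t100\td",
]
sorted_lines = sort_vcf_lines(example)
for line in sorted_lines:
    print(line)
```

To apply it to a file, read the lines with `open(path).read().splitlines()`, pass them to `sort_vcf_lines`, and write the result back with newline separators.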

fritzsedlazeck commented 6 years ago

Thanks for reporting back. I will investigate at what point sorting becomes required and whether I can work around it. Thanks, Fritz