fritzsedlazeck / SURVIVOR

Toolset for SV simulation, comparison and filtering
MIT License

SURVIVOR simSV - number of SVs doesn't correspond to param file #206

Open ethering opened 6 months ago

ethering commented 6 months ago

Hi, I'm running SURVIVOR v1.0.7 and I've noticed that the number of SV events generated by SURVIVOR simSV (in both the .bed and .vcf files) does not correspond to the .param file, and that it differs depending on the value of option 3 (0 or 1).

Here's what I see:

$ SURVIVOR simSV test.param

Output:

PARAMETER FILE: DO JUST MODIFY THE VALUES AND KEEP THE SPACES!
DUPLICATION_minimum_length: 100
DUPLICATION_maximum_length: 10000
DUPLICATION_number: 3
INDEL_minimum_length: 20
INDEL_maximum_length: 500
INDEL_number: 1
TRANSLOCATION_minimum_length: 1000
TRANSLOCATION_maximum_length: 3000
TRANSLOCATION_number: 2
INVERSION_minimum_length: 600
INVERSION_maximum_length: 800
INVERSION_number: 4
INV_del_minimum_length: 600
INV_del_maximum_length: 800
INV_del_number: 2
INV_dup_minimum_length: 600
INV_dup_maximum_length: 800
INV_dup_number: 2
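
To make the requested counts easy to compare against the output, here is a minimal Python sketch that pulls the `*_number` values out of a parameter file like the one above. It is not part of SURVIVOR; the function name `requested_counts` is hypothetical, and it assumes every line has the `KEY: value` layout shown here.

```python
def requested_counts(param_text):
    """Return {SV_TYPE: requested count} from the *_number lines."""
    counts = {}
    for line in param_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        # Skip the header line and the *_length lines; keep only *_number.
        if sep and key.endswith("_number"):
            counts[key[: -len("_number")]] = int(value)
    return counts


# A shortened copy of the parameter file above, used only as sample input.
example = """\
PARAMETER FILE: DO JUST MODIFY THE VALUES AND KEEP THE SPACES!
DUPLICATION_minimum_length: 100
DUPLICATION_maximum_length: 10000
DUPLICATION_number: 3
INDEL_number: 1
TRANSLOCATION_number: 2
INVERSION_number: 4
"""
print(requested_counts(example))
```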

I then used SURVIVOR simSV to generate the SVs:

Using option 3 = 1, I see the correct number of every SV type except duplications: no DUP events are generated at all (for inversions, I presume INV_del_number + INV_dup_number = INVERSION_number). The DUP count is also always zero in the true-positive and false-negative sections of SURVIVOR eval.

$ SURVIVOR simSV reference.fasta test.param 0.1 1 test1_sv
$ cat test1_sv.bed

Mt  1098    Mt  1819    INV
Mt  17423   Mt  18216   INV
Chr2    800538  Chr2    800924  INS
Mt  51828   Chr3    1034461 TRA
Mt  54161   Chr3    1036794 TRA
Chr1    1406312 Chr1    1407023 INV
Chr1    2541684 Chr1    2542421 INV
Chr1    1740043 Chr3    3282514 TRA
Chr1    1741044 Chr3    3283515 TRA

Using option 3 = 0, I see the following (ordered by SV type for ease of reading):

5 duplication events, not 3
5 INDELs (1 INS and 4 DEL), not 1
8 inversions, not 4

$ SURVIVOR simSV reference.fasta test.param 0.1 0 test0_sv
$ cat test0_sv.bed
Chr3    1671702 Chr3    1679825 DUP
Chr3    3600129 Chr3    3604236 DUP
Chr3    725731  Chr3    727808  DUP
Chr2    281472  Chr2    282151  DUP
Mt  55970   Mt  56657   DUP
Mt  43737   Mt  43991   INS
Chr2    2719697 Chr2    2719765 DEL
Chr2    2720309 Chr2    2720377 DEL
Chr2    1496557 Chr2    1496622 DEL
Chr2    1497150 Chr2    1497215 DEL
Chr2    721379  Chr3    1055120 TRA
Chr2    722729  Chr3    1056470 TRA
Chr2    4982397 Mt  21418   TRA
Chr2    4985041 Mt  24062   TRA
Mt  36164   Mt  36770   INV
Chr1    3880402 Chr1    3881102 INV
Chr3    3485167 Chr3    3485931 INV
Chr2    353814  Chr2    354459  INV
Chr2    2719765 Chr2    2720309 INV
Chr2    1496622 Chr2    1497150 INV
Mt  55970   Mt  56657   INV
Chr2    281472  Chr2    282151  INV
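
For anyone who wants to reproduce the tally, here is a minimal Python sketch that counts events per SV type from a .bed file like the ones above. It is not part of SURVIVOR, and the function name `tally_events` is hypothetical. It reads the type from the fifth whitespace-separated column and assumes that each translocation is written as two TRA rows, which is only inferred from the listings here (2 requested translocations, 4 TRA rows).

```python
from collections import Counter


def tally_events(bed_text, paired_types=("TRA",)):
    """Count events per SV type from the 5-column .bed text."""
    rows = Counter()
    for line in bed_text.splitlines():
        fields = line.split()
        if len(fields) >= 5:  # skip blank or malformed lines
            rows[fields[4]] += 1
    # ASSUMPTION: types in paired_types occupy two rows per event,
    # so their row count is halved to get an event count.
    return {sv: n // 2 if sv in paired_types else n for sv, n in rows.items()}


# A few rows copied from the option 3 = 0 output, used only as sample input.
sample = """\
Chr3 1671702 Chr3 1679825 DUP
Mt 43737 Mt 43991 INS
Chr2 721379 Chr3 1055120 TRA
Chr2 722729 Chr3 1056470 TRA
Mt 36164 Mt 36770 INV
"""
print(tally_events(sample))
```

Passing `paired_types=()` returns raw row counts instead, which is the safer choice if the two-rows-per-translocation assumption does not hold.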

Can you comment on this? I've never really understood why SURVIVOR generates different data depending on what the downstream use of it will be (SVs in reference, or SVs in reads). Either way, it appears to be generating a different number of SVs than requested in the param file.

Cheers, Graham