fritzsedlazeck / SURVIVOR

Toolset for SV simulation, comparison and filtering
MIT License

Incorrect SVLEN for translocations in VCF file #207

Open ethering opened 5 months ago

ethering commented 5 months ago

Hi, using SURVIVOR v1.0.7 I've noticed that the SVLEN of translocations in the VCF file generated by SURVIVOR simSV is not calculated correctly. For example, here is my .params file entry for TRA:

TRANSLOCATION_minimum_length: 9900
TRANSLOCATION_maximum_length: 10000
TRANSLOCATION_number: 10

and here are the BED and VCF file entries generated for two TRA events:

BED:

Chr1    1366162 Chr3    1731342 TRA
Chr1    1376151 Chr3    1741331 TRA
Chr2    1896233 Chr3    1742907 TRA
Chr2    1906139 Chr3    1752813 TRA

VCF:

Chr1    1366162 TRA12SURVIVOR   N   <TRA>   .   LowQual PRECISE;SVTYPE=TRA;SVMETHOD=SURVIVOR_sim;CHR2=Chr3;END=1731342;SVLEN=365180 GT:GL:GQ:FT:RC:DR:DV:RR:RV  1/1
Chr1    1376151 TRA12SURVIVOR   N   <TRA>   .   LowQual PRECISE;SVTYPE=TRA;SVMETHOD=SURVIVOR_sim;CHR2=Chr3;END=1741331;SVLEN=365180 GT:GL:GQ:FT:RC:DR:DV:RR:RV  1/1
Chr2    1896233 TRA13SURVIVOR   N   <TRA>   .   LowQual PRECISE;SVTYPE=TRA;SVMETHOD=SURVIVOR_sim;CHR2=Chr3;END=1742907;SVLEN=-153326    GT:GL:GQ:FT:RC:DR:DV:RR:RV  1/1
Chr2    1906139 TRA13SURVIVOR   N   <TRA>   .   LowQual PRECISE;SVTYPE=TRA;SVMETHOD=SURVIVOR_sim;CHR2=Chr3;END=1752813;SVLEN=-153326    GT:GL:GQ:FT:RC:DR:DV:RR:RV  1/1

The TRA lengths should be 9989 for the first TRA (1376151 - 1366162) and 9906 for the second TRA (1906139 - 1896233). What's happening is that the SVLEN is being calculated by subtracting the VCF 'POS' coordinate from the start position on 'CHR2' (the receiving chromosome), which is written as END. For example, in the first TRA between Chr1 and Chr3, it's subtracting 1366162 from 1731342 and reporting an SVLEN of 365180. This is also why there's a negative value for the second TRA between Chr2 and Chr3 (1742907 - 1896233 = -153326).
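To make the arithmetic concrete, here is a small Python sketch that reproduces both sets of numbers from the BED entries above. It is not SURVIVOR code: the function names are made up for illustration, and it assumes each TRA is written as two consecutive BED lines (start breakpoints, then end breakpoints), which is how the output above appears to be laid out.

```python
# Illustration only: hypothetical helpers, not taken from the SURVIVOR source.
# Each row is (source chrom, source pos, receiving chrom, receiving pos),
# copied from the BED entries above.
bed_rows = [
    ("Chr1", 1366162, "Chr3", 1731342),
    ("Chr1", 1376151, "Chr3", 1741331),
    ("Chr2", 1896233, "Chr3", 1742907),
    ("Chr2", 1906139, "Chr3", 1752813),
]


def expected_svlen(start_row, end_row):
    """Length of the translocated segment on the source chromosome."""
    return end_row[1] - start_row[1]


def reported_svlen(row):
    """What the VCF seems to contain: END (position on CHR2) minus POS."""
    return row[3] - row[1]


# Assumption: rows 0/1 describe the first TRA, rows 2/3 the second.
pairs = list(zip(bed_rows[0::2], bed_rows[1::2]))

expected = [expected_svlen(start, end) for start, end in pairs]
reported = [reported_svlen(start) for start, _ in pairs]

print("expected SVLEN:", expected)  # [9989, 9906]
print("reported SVLEN:", reported)  # [365180, -153326]
```

The expected values fall inside the 9900 to 10000 range set in the .params file, while the reported values match the SVLEN fields in the VCF, which is consistent with the coordinates being mixed across chromosomes.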

Cheers, Graham