dellytools / delly

DELLY2: Structural variant discovery by integrated paired-end and split-read analysis

Delly breakend format for translocations not "pairwise"? #370

Closed: cmdcolin closed this issue 7 months ago

cmdcolin commented 7 months ago

Hi there, I am a tool developer working on visualizing things like SVs, and I am trying to understand the VCF format that tools output for events such as translocations.

I made a synthetic dataset containing a "translocation" in which human chr1 and chr2 are fused, generated reads with wgsim, and ran the latest delly from GitHub.

The output has breakend features in the VCF, but the breakend appears on only "one side" of the translocation. My understanding is that the spec says a breakend record should appear on "both sides" of the translocation.

Example translocation output from delly, grepping for BND:

$ grep BND delly.vcf
##ALT=<ID=BND,Description="Translocation">
2       49999999        BND00000926     T       ]1:50999999]T   10000   PASS    PRECISE;SVTYPE=BND;SVMETHOD=EMBL.DELLYv1.2.6;END=50000000;CHR2=1;POS2=50999999;PE=796;MAPQ=35;CT=5to3;CIPOS=-3,3;CIEND=-3,3;SRMAPQ=37;INSLEN=0;HOMLEN=3;SR=26;SRQ=1;CONSENSUS=GATAATTTTTTTGAGACATATTCTCACTCTGTCACCCAGGCTTACAAAAGGAAGAAAAGAGAGATTGCTAGCTCCAGCATGCA;CE=1.96299;CONSBP=41    GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 1/1:-395.284,-70.019,0:10000:PASS:2217:286446:284229:2:10:1330:1:249

But I believe that, given the VCF spec, it should look like this:

2   49999999    BND00000926 T   ]1:50999999]T   10000   PASS    PRECISE;SVTYPE=BND;SVMETHOD=EMBL.DELLYv1.2.6;PE=796;MAPQ=35;CT=5to3;CIPOS=-3,3;CIEND=-3,3;SRMAPQ=37;INSLEN=0;HOMLEN=3;SR=26;SRQ=1;CONSENSUS=GATAATTTTTTTGAGACATATTCTCACTCTGTCACCCAGGCTTACAAAAGGAAGAAAAGAGAGATTGCTAGCTCCAGCATGCA;CE=1.96299;CONSBP=41    GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 1/1:-395.284,-70.019,0:10000:PASS:2217:286446:284229:2:10:1330:1:249
1   50999999    BND00000927 T   T[2:49999999[   10000   PASS    PRECISE;SVTYPE=BND;SVMETHOD=EMBL.DELLYv1.2.6;PE=796;MAPQ=35;CT=5to3;CIPOS=-3,3;CIEND=-3,3;SRMAPQ=37;INSLEN=0;HOMLEN=3;SR=26;SRQ=1;CONSENSUS=GATAATTTTTTTGAGACATATTCTCACTCTGTCACCCAGGCTTACAAAAGGAAGAAAAGAGAGATTGCTAGCTCCAGCATGCA;CE=1.96299;CONSBP=41    GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 1/1:-395.284,-70.019,0:10000:PASS:2217:286446:284229:2:10:1330:1:249

(I believe the output can also reasonably omit CHR2, POS2, and END, since they can be derived from the breakend itself, and maybe even CT, but that is more minor :)
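
The pairing described above can be sketched in Python. This is a minimal sketch, not Delly's or dysgu's conversion script: it derives the mate's ALT string purely from the four breakend forms in the VCF spec (`t[p[`, `t]p]`, `]p]t`, `[p[t`). The names `parse_bnd_alt` and `mate_record` are hypothetical, and the mate's REF base defaults to `N` as an assumption, because it cannot be recovered from the single record without a reference lookup.

```python
import re

# The four breakend ALT forms in the VCF spec: t[p[  t]p]  ]p]t  [p[t
BND_ALT = re.compile(
    r"^(?P<lead>[ACGTNacgtn.]*)"
    r"(?P<bracket>[\[\]])(?P<chrom>[^\[\]]+):(?P<pos>\d+)(?P=bracket)"
    r"(?P<trail>[ACGTNacgtn.]*)$"
)


def parse_bnd_alt(alt):
    """Split a breakend ALT string into mate chrom, mate pos and orientation."""
    m = BND_ALT.match(alt)
    # Exactly one of lead/trail must carry the local base(s).
    if m is None or bool(m.group("lead")) == bool(m.group("trail")):
        raise ValueError(f"not a breakend ALT: {alt!r}")
    return {
        "chrom": m.group("chrom"),
        "pos": int(m.group("pos")),
        "bracket": m.group("bracket"),
        "base_first": bool(m.group("lead")),  # True for t[p[ and t]p]
    }


def mate_record(chrom, pos, alt, mate_ref="N"):
    """Return (CHROM, POS, REF, ALT) of the mate breakend for one BND record.

    mate_ref is a placeholder (assumption): the real base at the mate
    position has to come from the reference genome.
    """
    b = parse_bnd_alt(alt)
    # Seen from the mate, the original locus extends left (']') if the
    # original base came first, and right ('[') otherwise.
    mate_bracket = "]" if b["base_first"] else "["
    joined = f"{mate_bracket}{chrom}:{pos}{mate_bracket}"
    # The mate's own base comes first if the original bracket was ']'.
    mate_alt = mate_ref + joined if b["bracket"] == "]" else joined + mate_ref
    return b["chrom"], b["pos"], mate_ref, mate_alt
```

For the record in this thread, `mate_record("2", 49999999, "]1:50999999]T", mate_ref="T")` yields the position and ALT of the second proposed line, `T[2:49999999[` at 1:50999999.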

tobiasrausch commented 7 months ago

It's a bit of a legacy issue because Delly had been using a TRA type for translocations long before the VCF specification had breakends. This became problematic with BND (see https://github.com/dellytools/delly/issues/59) and to remain somewhat backwards compatible I opted for the single-record BND notation (which I still prefer for population studies as outlined in the previous issue).

cmdcolin commented 7 months ago

thanks for the clarification.

Just to add some context: I was helping a user who was looking at older delly output files with the TRA elements, and I was curious whether I should add support for the CT element to get the strand/directionality for TRA. I think that with this "single-record BND" method, the CT tag will still be needed to determine the directionality of what I call the "feet" on either side of the breakend.

tobiasrausch commented 7 months ago

Yes, for delly you need to take into account the CT tag for inversion-type rearrangements and inter-chromosomal translocations. Here is a table that shows the different connection types: paired-end view
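
One way to relate the CT tag to breakend notation is sketched below. The assumption here, which matches the single record in this thread (`CT=5to3` alongside `]1:50999999]T`) but has not been checked against Delly's source or the linked table, is that CT reads as "local end to remote end": a local 3' end puts the REF base before the bracket, and a remote 3' end uses `]`. The function names `connection_type` and `alt_from_ct` are hypothetical.

```python
def connection_type(alt):
    """Infer a CT-style label (e.g. '5to3') from a breakend ALT string."""
    if not alt:
        raise ValueError("empty ALT")
    base_first = alt[0] not in "[]"           # t[p[ or t]p]
    bracket = alt[-1] if base_first else alt[0]
    if bracket not in "[]":
        raise ValueError(f"not a breakend ALT: {alt!r}")
    local_end = "3" if base_first else "5"    # which end of the local piece is joined
    remote_end = "3" if bracket == "]" else "5"
    return f"{local_end}to{remote_end}"


def alt_from_ct(ref, chr2, pos2, ct):
    """Build a breakend ALT string from REF, mate position and a CT label."""
    local_end, remote_end = ct.split("to")
    if local_end not in ("3", "5") or remote_end not in ("3", "5"):
        raise ValueError(f"unexpected CT value: {ct!r}")
    bracket = "]" if remote_end == "3" else "["
    joined = f"{bracket}{chr2}:{pos2}{bracket}"
    return ref + joined if local_end == "3" else joined + ref
```

Under that assumption, a visualization tool reading an older TRA-style record could call `alt_from_ct` with the record's REF base, mate chromosome, mate position, and CT value to recover the direction of each side of the junction.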

cmdcolin commented 7 months ago

thanks, I will try to add that

If you are open to it, I think that adding an option (a command-line flag) or a conversion script (dysgu has one: https://github.com/kcleal/dysgu/blob/master/scripts/convert2bnd.py) to output paired BND records could help too. As a downstream user, it is beneficial to have "standardized VCF". I acknowledge that the breakend spec is awkward (an inversion, for example, can have 4 breakend lines in the VCF spec!), but I think it is valuable as a standard to try to follow.

cmdcolin commented 7 months ago

will go ahead and close for now

tobiasrausch commented 7 months ago

I think a conversion script is a good idea. That should be doable.

tobiasrausch commented 7 months ago

So here is a first version of such a conversion script:

https://github.com/dellytools/delly/tree/main/scripts