J35P312 / SVDB

structural variant database software
MIT License

Enhance the FORMAT column of merge subcommand. #58

Open xiucz opened 2 years ago

xiucz commented 2 years ago

Hi, people often look for SV merge tools. After testing many of them, such as SURVIVOR, svimmer, SVanalyzer, svtools, truvari, and bcftools, I find SVDB is the best tool for merging SV VCF files from different SV callers! It uses a set and priority strategy to combine SV events, which is similar to another tool, CombineVariants (suitable for small variants).

When I merge three VCF files from manta, lumpy, and svaba, I find that the resulting VCF contains only two FORMAT fields:

#svdb merged file
chr13   32913545    MantaDEL:1:279:280:0:0:0:manta|5:lumpy  T   <DEL>   .   MaxDepth    END=32918007;SVTYPE=DEL;SVLEN=-4462;CIPOS=0,2;CIEND=0,2;HOMLEN=2;HOMSEQ=CA;STRANDS=+-:436;CIPOS95=0,0;CIEND95=0,0;SU=436;PE=0;SR=436;VARID=5:lumpy;set=filterInmanta-lumpy;FOUNDBY=2;manta_CHROM=MantaDEL_1_279_280_0_0_0|chr13;lumpy_CHROM=5|chr13;manta_POS=MantaDEL_1_279_280_0_0_0|32913545;lumpy_POS=5|32913547;manta_QUAL=MantaDEL_1_279_280_0_0_0|.;lumpy_QUAL=5|0.00;manta_FILTERS=MantaDEL_1_279_280_0_0_0|MaxDepth;lumpy_FILTERS=5|.;manta_SAMPLE=MantaDEL_1_279_280_0_0_0|QYQ_zuzhi|PR:0:0|SR:0:0;lumpy_SAMPLE=5|QYQ_zuzhi|GT:./.|SU:436|PE:0|SR:436|GQ:.|SQ:.|GL:.|DP:0|RO:0|AO:0|QR:0|QA:0|RS:0|AS:0|ASC:0|RP:0|AP:0|AB:.;manta_INFO=MantaDEL_1_279_280_0_0_0|END:32918007|SVTYPE:DEL|SVLEN:-4462|CIPOS:0:2|CIEND:0:2|HOMLEN:2|HOMSEQ:CA;lumpy_INFO=5|SVTYPE:DEL|SVLEN:-4463|END:32918010|CIPOS:0:0|CIEND:0:0|CIPOS95:0:0|CIEND95:0:0|SU:436|PE:0|SR:436;svdb_origin=manta|lumpy   PR:SR   .,.:.,. 0,0:0,0

#lumpy raw file
chr13   32913547    5   N   <DEL>   0.00    .   SVTYPE=DEL;SVLEN=-4463;END=32918010;STRANDS=+-:436;CIPOS=0,0;CIEND=0,0;CIPOS95=0,0;CIEND95=0,0;SU=436;PE=0;SR=436   GT:SU:PE:SR:GQ:SQ:GL:DP:RO:AO:QR:QA:RS:AS:ASC:RP:AP:AB  ./.:436:0:436:.:.:.:0:0:0:0:0:0:0:0:0:0:.

#manta raw file
chr13   32913545    MantaDEL:1:279:280:0:0:0    T   <DEL>   .   MaxDepth    END=32918007;SVTYPE=DEL;SVLEN=-4462;CIPOS=0,2;CIEND=0,2;HOMLEN=2;HOMSEQ=CA  PR:SR   0,0:0,0

SVDB merged the FORMAT field, but it is trimmed for some reason:

PR:SR  .,.:.,. 0,0:0,0
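The per-caller sample values are not lost entirely: in the merged record above they still appear in the INFO column as `manta_SAMPLE` and `lumpy_SAMPLE`. As a possible workaround, here is a minimal Python sketch that pulls them back out. The function names are hypothetical and not part of SVDB, and the layout (`variant_id|sample|KEY:value|...`, with comma-separated values written with colons, e.g. `PR:0:0` for `PR=0,0`) is inferred only from the single record shown here, so it may not hold for multi-sample files or phased genotypes containing `|`.

```python
def parse_caller_sample(value):
    """Split one `<caller>_SAMPLE` value into (variant_id, sample, fields).

    Assumed layout (inferred from the merged record above):
    variant_id|sample_name|KEY:value|KEY:value|...
    """
    variant_id, sample, *pairs = value.split("|")
    fields = {}
    for pair in pairs:
        key, _, raw = pair.partition(":")
        # In the record above, multi-valued entries use ':' where the
        # original VCF used ',' (PR:0:0 stands for PR=0,0).
        fields[key] = raw.replace(":", ",")
    return variant_id, sample, fields


def caller_samples(info):
    """Collect every `<caller>_SAMPLE` tag from an INFO string, keyed by caller."""
    found = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        if sep and key.endswith("_SAMPLE"):
            found[key[: -len("_SAMPLE")]] = parse_caller_sample(value)
    return found


# Shortened INFO string taken from the merged record above.
info = (
    "SVTYPE=DEL;"
    "manta_SAMPLE=MantaDEL_1_279_280_0_0_0|QYQ_zuzhi|PR:0:0|SR:0:0;"
    "lumpy_SAMPLE=5|QYQ_zuzhi|GT:./.|SU:436|PE:0|SR:436"
)
samples = caller_samples(info)
print(samples["manta"][2])
print(samples["lumpy"][2])
```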

I think 1) the keys of the FORMAT field should be filled in with the tags from all callers; 2) the FORMAT values should have as many columns as there are callers (here, 3 callers, 3 columns). If an SV event is called by 3 callers, the FORMAT should contain all the information from the 3 callers, as the INFO field does. An example:

GT:SU:PE:SR:GQ:SQ:GL:DP:RO:AO:QR:QA:RS:AS:ASC:RP:AP:AB:PR:SR ./.:436:0:436:.:.:.:0:0:0:0:0:0:0:0:0:0:.:.:. .:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:0,0:0,0 .:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.,.:.,.
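The layout proposed above can be sketched in a few lines of Python. This is only an illustration of the idea, not SVDB's implementation: `merge_format` is a hypothetical helper, and it makes two choices that differ from the example. A key shared by several callers (such as `SR`) is listed once, since repeating a key in FORMAT would be ambiguous, and a caller that did not report the event gets a plain `.` for every key.

```python
def merge_format(callers):
    """Build a union FORMAT string and one padded sample column per caller.

    `callers` is a list with one entry per caller, in output column order.
    Each entry is a (format_string, sample_string) tuple, or None if that
    caller did not report the event.
    """
    # Union of FORMAT keys, in first-seen order.
    keys = []
    for entry in callers:
        if entry is None:
            continue
        for key in entry[0].split(":"):
            if key not in keys:
                keys.append(key)

    # One column per caller, with '.' for every key the caller lacks.
    columns = []
    for entry in callers:
        values = {}
        if entry is not None:
            values = dict(zip(entry[0].split(":"), entry[1].split(":")))
        columns.append(":".join(values.get(key, ".") for key in keys))

    return ":".join(keys), columns


# Shortened FORMAT/sample pairs from the lumpy and manta records above;
# the third caller (svaba) did not report this event.
fmt, cols = merge_format([
    ("GT:SU:PE:SR", "./.:436:0:436"),
    ("PR:SR", "0,0:0,0"),
    None,
])
print(fmt)
for col in cols:
    print(col)
```

Note that deduplicating `SR` puts lumpy's single count and manta's pair under the same key; a real implementation would probably need to rename colliding keys (for example with a caller prefix) to keep the header definitions consistent.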

Here is the command line:

#version SVDB-2.6.4
~/bin/svdb --merge --vcf tumorSV.vcf:manta lumpy.gt.vcf:lumpy svaba.unfiltered.sv.vcf:svaba  --priority svaba,manta,lumpy > svdb.3.vcf

Best, xiucz

J35P312 commented 2 years ago

Hello! Thanks, I'm happy to hear that!

I agree, everything should be transferred into the FORMAT column! It might be a bug in SVDB because the manta calls lack the GT field (SVDB is GT "centric" in some ways).

I will have a look!

Best regards Jesper

aksenia commented 8 months ago

Hi @J35P312 ,

We also love your package! Any progress on this issue? I agree that it would be very useful to be able to retain the FORMAT field values from all the VCF files that are being merged. Is this something hard to do?