ACEnglish / truvari

Structural variant toolkit for VCFs
MIT License
320 stars 48 forks

Merging of VCFs of different specifications #232

Closed (vondant closed 1 month ago)

vondant commented 1 month ago

I am currently attempting to merge/collapse VCFs produced by DELLY, LUMPY and MANTA using Truvari. Two of the VCFs produced are in the 4.2 specification and one is produced in the 4.1 specification. I wanted to inquire whether Truvari can handle VCF files with different format versions, and if so, how does it handle this?

ACEnglish commented 1 month ago

Yes, it can. If you read the VCF format specifications, you'll see they don't change much between versions. More importantly, Truvari doesn't use a custom VCF parser; it relies on the pysam library. So efficient parsing across specifications depends more on pysam having a consistent API than on the VCF format specification remaining unchanged.
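For anyone who wants to confirm which specification version each input declares before merging, here is a minimal sketch. It is not part of Truvari or pysam: `declared_vcf_version` is a hypothetical helper that only reads the `##fileformat=` meta line, and the file names and header contents below are made up for illustration.

```python
def declared_vcf_version(lines):
    """Return the value of the ##fileformat meta line, or None if absent.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    for line in lines:
        if not line.startswith("##"):
            break  # meta-information lines are over; stop scanning
        if line.startswith("##fileformat="):
            return line.strip().split("=", 1)[1]
    return None


# Hypothetical headers standing in for three caller outputs.
example_headers = {
    "caller_a.vcf": ["##fileformat=VCFv4.2\n", "#CHROM\tPOS\tID\n"],
    "caller_b.vcf": ["##fileformat=VCFv4.2\n", "#CHROM\tPOS\tID\n"],
    "caller_c.vcf": ["##fileformat=VCFv4.1\n", "#CHROM\tPOS\tID\n"],
}

for name, header in example_headers.items():
    print(name, declared_vcf_version(header))
```

With real files, the same function can be handed an open text handle (use `gzip.open(path, "rt")` for compressed VCFs).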

vondant commented 4 weeks ago

Thanks for the prompt reply!

Following up on this: when I tried merging the outputs from DELLY, LUMPY and MANTA using Truvari, it gave me two variants at the same position on a chromosome.

I used bcftools merge -m none to remove multi-allelic entries and then ran truvari collapse with the default parameters and no flags.

chr2    48330378    MantaDUP:TANDEM:24728:0:1:0:0:0 T   <DUP:TANDEM>    .   PASS    END=48330518;SVTYPE=DUP;SVLEN=140;SVINSLEN=3;SVINSSEQ=TTG;SOMATIC;SOMATICSCORE=42   PR:SR   .:. .:. 40,0:49,0   32,1:56,8
chr2    48330378    DUP00000135 T   <DUP>   480 PASS    PRECISE;SVTYPE=DUP;SVMETHOD=EMBL.DELLYv1.2.6;END=48330519;PE=0;MAPQ=0;CT=5to3;CIPOS=-2,2;CIEND=-2,2;SRMAPQ=60;INSLEN=0;HOMLEN=2;SR=8;SRQ=0.984496;CONSENSUS=GACATTTCTTTTTCTATGAAAAACATTGTTTAGTGATCTGAATTCAGTGAATGACTACCTCTTTCTCTTTATTGAATTGCTATGCAGCTTCCATAAAATTCACAAACATTTGGATTTCTGGGCTTTGCCA;CE=1.89393;CONSBP=72;RDRATIO=1.36851;SOMATIC GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 0/1:-13.6735,0,-72.9675:137:PASS:2159:6402:2799:3:0:0:27:8  0/0:0,-7.81718,-79.9904:78:PASS:2431:4897:2759:2:0:0:26:0   ./.:.:.:.:.:.:.:.:.:.:.:.   ./.:.:.:.:.:.:.:.:.:.:.:.

Could you elaborate on why this behaviour occurs and how to mitigate it? Thanks again!

ACEnglish commented 4 weeks ago

https://github.com/ACEnglish/truvari/wiki/bench#svs-without-sequences
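As the title of the linked section suggests, it concerns SVs whose ALT carries no sequence. Both records posted above are of that kind: their ALT fields are the symbolic alleles `<DUP:TANDEM>` and `<DUP>`. A rough sketch for spotting such records in a VCF follows. The helper names are hypothetical and not Truvari functions, and the INFO fields of the two example lines are abbreviated from the records above.

```python
def is_symbolic(alt):
    """True for symbolic alleles such as <DUP>, which carry no sequence."""
    return alt.startswith("<") and alt.endswith(">")


def symbolic_alts(record_line):
    """Return the symbolic ALT alleles of one tab-delimited VCF record line."""
    fields = record_line.rstrip("\n").split("\t")
    return [alt for alt in fields[4].split(",") if is_symbolic(alt)]


# The two records from this thread, with INFO shortened for readability.
manta = ("chr2\t48330378\tMantaDUP:TANDEM:24728:0:1:0:0:0\tT\t<DUP:TANDEM>"
         "\t.\tPASS\tEND=48330518;SVTYPE=DUP;SVLEN=140")
delly = ("chr2\t48330378\tDUP00000135\tT\t<DUP>"
         "\t480\tPASS\tPRECISE;SVTYPE=DUP;END=48330519")

for record in (manta, delly):
    print(record.split("\t")[2], symbolic_alts(record))
```

If every ALT in the inputs turns out to be symbolic, the linked wiki section is the place to check how Truvari compares such calls.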