Open anoronh4 opened 3 years ago
here's a related example. we have two vcfs that are both called by SvABA on the same sample but vcf A was run by us and vcf B was called by another organization. I was able to find two calls that are likely referring to the same SV but vcf B was included but vcf A was excluded. calls from vcf A:
1 15142819 9184:1 T [1:15145551[T 33 PASS DISC_MAPQ=60;EVDNC=ASDIS;HOMSEQ=TTG;MAPQ=60;MATEID=9184:2;MATENM=0;NM=0;NUMPARTS=2;SCTG=c_1_15141001_15166001_51C;SPAN=2732;SVTYPE=BND GT:AD:DP:GQ:PL:SR:DR:LR:LO 0/0:0:40:10.8:0,10.8,118.8:0:0:10.93:0 0/1:13:36:33.2:33.2,0,53:6:10:-33.08:33.38
1 15145551 9184:2 C [1:15142819[C 33 PASS DISC_MAPQ=60;EVDNC=ASDIS;HOMSEQ=TTG;MAPQ=60;MATEID=9184:1;MATENM=0;NM=0;NUMPARTS=2;SCTG=c_1_15141001_15166001_51C;SPAN=2732;SVTYPE=BND GT:AD:DP:GQ:PL:SR:DR:LR:LO 0/0:0:40:10.8:0,10.8,118.8:0:0:10.93:0 0/1:13:36:33.2:33.2,0,53:6:10:-33.08:33.38
calls from vcf B:
1 15142891 3476:no_id:1 G [1:15145551[G . PASS BLACKLIST=0;EVDNC=ASDIS;HOMSEQ=CAA;MAPQ=60;MATEID=3476:no_id:2;MATEMAPQ=60;NDISC=0;NSPLIT=0;NUMPARTS=2;SCTG=c_1_15142501_15148501_16;SPAN=2660;SVTYPE=BND;TDISC=8;TSPLIT=4 NALT:NALT_RP:NALT_SR:READ_ID 0:0:0:. 11:8:4:HS7_5723-2-46-16007-41348,HS14_5615-1-63-4915-9134,IL34_5198-4-118-3337-18138,IL34_5198-7-27-17628-16584,IL34_5198-8-6-16510-14867,HS17_5841-7-1201-14741-147696,IL34_5198-2-106-13813-18791,IL34_5198-3-107-18322-17933,HS14_5615-2-3-9163-127602,HS14_5615-2-4-15789-93334,IL34_5198-2-72-5044-15786
1 15145551 3476:no_id:2 C [1:15142891[C . PASS BLACKLIST=0;EVDNC=ASDIS;HOMSEQ=CAA;MAPQ=60;MATEID=3476:no_id:1;MATEMAPQ=60;NDISC=0;NSPLIT=0;NUMPARTS=2;SCTG=c_1_15142501_15148501_16;SPAN=2660;SVTYPE=BND;TDISC=8;TSPLIT=4 NALT:NALT_RP:NALT_SR:READ_ID 0:0:0:. 11:8:4:HS7_5723-2-46-16007-41348,HS14_5615-1-63-4915-9134,IL34_5198-4-118-3337-18138,IL34_5198-7-27-17628-16584,IL34_5198-8-6-16510-14867,HS17_5841-7-1201-14741-147696,IL34_5198-2-106-13813-18791,IL34_5198-3-107-18322-17933,HS14_5615-2-3-9163-127602,HS14_5615-2-4-15789-93334,IL34_5198-2-72-5044-15786
Thinking that my minimum length parameter was excluding some of the calls, changed minimum length to 0. This did not solve the issue -- 100% of entries had SUPP_VEC=10
.
well i think i figured it out. SURVIVOR does not like the GT field, but this is (somewhat) remedied by specifying minimum caller to be 0 instead of 1. Then all the calls accepted from vcf A say SUPP_VEC=00
. not ideal, so instead I'll just have to clean out GT from the vcf before merging.
I won't close this issue myself because I think the function could be improved by adding options for handling GT.
i was unable to merge SvABA calls using SURVIVOR and I don't see any issues regarding SvABA, so this might be a new topic for this tool. My command is:
SURVIVOR merge sample_files 100 1 0 0 0 50 sample_merged.vcf
i am allowing a minimum of one caller but SURVIVOR seems to simply skip svaba ones, probably not recognizing them as SVs. and i have generated a simple comparison matrix (the third vcf is svaba):a few sample lines from the svaba vcf include:
Svaba seems to only return
BND
, not sure if that's an issue. mymerge
command allows a minimum of one caller and ignores conflicting types.