Closed fgvieira closed 2 years ago
Hello! Sometimes SVDB will have a peculiar behaviour!
This is caused by the "--no_intra" option, i.e SVDB is not allowed to merge variants within the same vcf, only between different vcf files.
test_manta.vcf.gz:truth
has the highest priority, and contains
chr3 195691426 MantaBND:34332:0:1:0:0:0:0 G G]chr5:1636168] 626 PASS BND_DEPTH=90;CIPOS=0,12;HOMLEN=12;HOMSEQ=ATTCAGTCCTGG;MATEID=MantaBND:34332:0:1:0:0:0:1;MATE_BND_DEPTH=63;SVTYPE=BND;POS=195691426 GT:FT:GQ:PL:PR:SR 0/1:PASS:626:676,0,999:81,1:59,31 chr5 1636156 MantaBND:34332:0:1:0:0:0:1 T T]chr3:195691438] 626 PASS BND_DEPTH=63;CIPOS=0,12;HOMLEN=12;HOMSEQ=CCAGGACTGAAT;MATEID=MantaBND:34332:0:1:0:0:0:0;MATE_BND_DEPTH=90;SVTYPE=BND;POS=1636156 GT:FT:GQ:PL:PR:SR 0/1:PASS:626:676,0,999:81,1:59,31
Hence, those two variants will never be merged.
The merging is done through a greedy approach:
SVDB will start with the first variant (MantaBND:34332:0:1:0:0:0:0) in the file with highest priority (test_manta.vcf.gz:truth), and compare it to variants in test_manta.vcf.gz. Both variants are similar according to the BND distance, and so, MantaBND:34332:0:1:0:0:0:0 will consume both variants in test_manta.vcf.gz:query; leaving nothing for MantaBND:34332:0:1:0:0:0:1.
I see you are doing this for benchmarking: I suggest you either use the SVDB query, then you can use the OCC tag to check for presence of absence in thruthset, or you merge the variants within truthset and query first, to make sure that there are not many copies of the "same" variant.
Good luck, and feel free to ask if you have other questions! //Jesper
Hi and thanks for the quick reply. However, when you say:
SVDB will start with the first variant (MantaBND:34332:0:1:0:0:0:0) in the file with highest priority (test_manta.vcf.gz:truth), and compare it to variants in test_manta.vcf.gz. Both variants are similar according to the BND distance [...]
How are both variants similar if they are on different chromosomes? SVDB will start with the first variant in truth and compare it to the variants in query. The first one matches (since they are the same), but the second variant is in another chromosome! Why does it also get merged?
No worries, thanks for your patience =P.
SVDB stores each variant as coordinates: (chrA,posA,chrB,posB). Internally, SVDB will sort the coordinates such that chrA < chrB, and if chrA == chrB, then posA < posB.
Hence
chr3 195691426 MantaBND:34332:0:1:0:0:0:0 G G]chr5:1636168]
will become (chr3,195691426,chr5,1636168)
and
chr5 1636156 MantaBND:34332:0:1:0:0:0:1 T T]chr3:195691438]
(chr3,195691438,chr5,1636156)
have a nice weekend! //jesper
I am trying to merge some VCFs with
SVDB v2.6.4
, but I am having some trouble with BNDs.If I have a VCF with two BNDs:
and merge it with itself:
I get:
It seems it is merging three BNDs, one of them on another chr(!):
MantaBND:34332:0:1:0:0:0:0:truth
(chr3)MantaBND:34332:0:1:0:0:0:0:query
(chr3)MantaBND:34332:0:1:0:0:0:1:query
(chr5)And leaving
MantaBND:34332:0:1:0:0:0:1:truth
unmerged. Why is it not mergingMantaBND:34332:0:1:0:0:0:0:truth
withMantaBND:34332:0:1:0:0:0:0:query
, andMantaBND:34332:0:1:0:0:0:1:truth
withMantaBND:34332:0:1:0:0:0:1:query
?