brentp / smoove

structural variant calling and genotyping with existing tools, but, smoothly.
Apache License 2.0
231 stars 21 forks source link

Question about a particular deletion call #149

Closed davidecarlson closed 3 years ago

davidecarlson commented 3 years ago

Hi Brent,

Thanks for making this excellent tool! I've got a question about evaluating some of the quality of my SV calling results. I ran smoove to call SVs on 4 high coverage plant samples with the following:

smoove call \
-x \
--name oenothera \
--fasta ref/elata_reference_assembly.fasta \
-p 24 \
--outdir smoove \
--duphold \
--genotype bams/*.bam 2>&1 | tee smoove.log

I then filtered based on the duphold annotations with:

bcftools view \
-i '(SVTYPE = "DEL" & FMT/DHFFC[0] < 0.7) | (SVTYPE = "DUP" & FMT/DHBFC[0] > 1.3 )' oenothera-smoove.genotyped.duphold.vcf  > oenothera-smoove.genotyped.duphold.filtered.vcf

I'm now in the process of visualizing the read support for the SV calls using svviz2

So far, most of the SV's that I've taken a look at seem well supported. However, there is one fairly large (~ 128 Kb) deletion I looked at that svviz2 suggests does not have much read support. Here is the line in the vcf:

HiC_scaffold_1 698736 26 N <DEL> 1366.07 . SVTYPE=DEL;SVLEN=-127936;END=826672;STRANDS=+-:19;IMPRECISE;CIPOS=-30,195;CIEND=-88,4;CIPOS95=0,28;CIEND95=-27,0;SU=19;PE=19;SR=0;GCF=0.488193;AN=8;AC=3 GT:GQ:SQ:GL:DP:RO:AO:QR:QA:RS:AS:ASC:RP:AP:AB:DHFC:DHFFC:DHBFC:DHSP:DHGT 1/1:107:1240.78:-126,-13,-2:43:0:43:0:42:0:0:26:0:16:1:0:0:0:17:7,5,7,63,27 0/0:200:0:-0,-23,-78:79:79:0:78:0:38:0:0:40:0:0:0.884615:0.766667:0.958333:0:106,1,0,0,2 0/0:200:0:-0,-29,-95:96:96:0:95:0:68:0:0:27:0:0:0.0217391:0.0217391:0.0322581:0:9,1,66,11,22 0/1:125:125.29:-27,-15,-76:111:94:16:93:15:69:0:11:24:4:0.14:0.596154:0.5:0.775:5:61,1,0,0,47

And here is the visualization:

image

There are almost no reads found in this region, although the VCF suggests there are 19 PE reads that support the variant.

This might not be an easy question to answer, but do you have any idea what could result in the discrepancy between these two results? Is there any information in the INFO or FORMAT fields of the vcf entry that would suggest that this is an unreliable call?

I'm trying to be fairly conservative, so I'd ideally like to do some more downstream filtering to remove additional spurious SVs. Any suggestions or insights you might have would be useful! Thanks, Dave

brentp commented 3 years ago

Hi Dave,

svtyper is reporting 0 reference pairs and 16 alternate pairs for this SV. I'm not accustomed to looking at svviz, but you could also try samplot. I guess there are some pairs that are spanning this region. Does svviz normally draw a connecting line for spanning reads?

davidecarlson commented 3 years ago

Hi Brent, Sorry for the delay in responding. Based on your suggestion, I've switched to using samplot for visualizing the SV calls, and the results seem to be much more in line with what I would expect. Gonna go ahead and close this. Thanks! Dave