fritzsedlazeck / Sniffles

Structural variation caller using third generation sequencing
Other
558 stars 92 forks

Sniffles2 missing high confidence deletions with sufficient read support and depth of coverage #391

Open eesiribloom opened 1 year ago

eesiribloom commented 1 year ago

I have a set of high-confidence consensus SVs called from Illumina WGS data (the intersection of GRIDSS and Manta calls), which I want to compare with SVs called from nanopore long reads. A surprisingly large number of variants in the consensus callset are not called by Sniffles. In IGV, these deletions are clearly spanned by my nanopore long reads: I can see the deletions in the read alignments, but they are not reported in the VCF file. As far as I can see, they are not in regions of low coverage or tandem repeats.
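To count how many consensus deletions lack a long-read counterpart, the two callsets can be compared by reciprocal overlap rather than by exact coordinates, because short-read and long-read callers often place breakpoints a few base pairs apart. The sketch below assumes both callsets have already been reduced to tab-separated BED-like files of deletions (chrom, start, end), for example with `bcftools query`, and that both use the same chromosome naming (`12` vs `chr12`). The function name `missing_dels` and the 50% default threshold are illustrative assumptions, not part of Sniffles.

```shell
# Hypothetical helper (not part of Sniffles): print consensus deletions that have
# no long-read deletion with at least MIN_RO reciprocal overlap.
# Inputs are tab-separated BED-like files: chrom, start, end (extra columns ignored).
# Usage: missing_dels consensus_dels.bed sniffles_dels.bed [min_ro]
missing_dels() {
  local consensus="$1" longread="$2" min_ro="${3:-0.5}"
  awk -v min_ro="$min_ro" -v lr="$longread" '
    BEGIN { FS = "\t" }
    /^(#|track|browser)/ { next }            # skip BED header/comment lines
    FILENAME == lr {                         # first file: index long-read deletions per chromosome
      k = ++n[$1]; s[$1, k] = $2 + 0; e[$1, k] = $3 + 0
      next
    }
    {                                        # second file: consensus deletions
      a = $2 + 0; b = $3 + 0; found = 0
      for (i = 1; i <= n[$1]; i++) {
        lo = (a > s[$1, i]) ? a : s[$1, i]   # max(start)
        hi = (b < e[$1, i]) ? b : e[$1, i]   # min(end)
        ov = hi - lo                         # overlap length
        if (ov <= 0) continue
        # reciprocal: the overlap must cover min_ro of BOTH intervals
        if (ov >= min_ro * (b - a) && ov >= min_ro * (e[$1, i] - s[$1, i])) { found = 1; break }
      }
      if (!found) print
    }
  ' "$longread" "$consensus"
}
```

A stricter threshold (e.g. `0.8`) reports more deletions as missing; a looser one tolerates larger breakpoint differences between the two technologies.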

Using Sniffles2 (version 2.0.7), I called variants with the following command:

sniffles --threads 12 --sample-id ON364T_R1.sorted \
    --reference Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa \
    --output-rnames \
    --input ON364T_R1.sorted.bam \
    --tandem-repeats human_GRCh38_no_alt_analysis_set.trf.bed \
    --symbolic --non-germline \
    --vcf ON364T_R1_sorted.sniffles.vcf

[IGV screenshot: ON364_chr12DEL]

In this example, coverage across the deletion is ~37x and 4 reads support it, which I would expect to be enough for Sniffles to call it. I can confirm the variant is absent both from the raw Sniffles output VCF and from the PASS-filtered VCF (FILTER=PASS) searched below, using grep for the start coordinate of the deletion:

grep 99196814 ON364_R1.pass.vcf
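An exact-coordinate grep only finds a record containing the string 99196814 verbatim, so a Sniffles call whose POS is a few base pairs away would be missed (and the string could also match in other columns). A minimal sketch, assuming an uncompressed VCF and a hypothetical helper named `near_pos`, that instead lists every record on one chromosome whose POS falls within a window of the target position:

```shell
# Hypothetical helper (not part of Sniffles): print VCF records on CHROM whose
# POS lies within +/- WINDOW bp of TARGET, instead of grepping one exact coordinate.
# Assumes an uncompressed VCF.
# Usage: near_pos calls.vcf CHROM TARGET [window]
near_pos() {
  local vcf="$1" chrom="$2" target="$3" window="${4:-1000}"
  awk -v c="$chrom" -v t="$target" -v w="$window" '
    BEGIN { FS = "\t" }
    /^#/ { next }                        # skip VCF meta-information and header lines
    $1 == c {
      d = $2 - t; if (d < 0) d = -d      # absolute distance of POS from the target
      if (d <= w) print
    }
  ' "$vcf"
}
```

For example, `near_pos ON364T_R1_sorted.sniffles.vcf 12 99196814 500` would check the unfiltered output around this deletion. The chromosome name must match the one written in the VCF; Ensembl FASTAs such as the primary assembly used here typically name it `12` rather than `chr12`.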

This is just one example, but there are several such instances across multiple samples. What are the possible reasons these variants are missed, and is there a way to improve the sensitivity of my calls?

ggrimes commented 1 year ago

From the docs