fritzsedlazeck / Sniffles

Structural variation caller using third generation sequencing

Overlapping deletions of the same size wrongly clustered #350

Open AyushSaxena opened 2 years ago

AyushSaxena commented 2 years ago

We have two overlapping 22 bp deletions that we can identify with clear breakpoints in IGV. One is present in ~50% of the population, and the other in ~10%. However, Sniffles2 only detects the major variant. I am running Sniffles2 in non-germline mode with reduced minimum support (--minsupport) and minimum SV length (--minsvlen). I'm using high-quality PacBio CCS reads.

I believe that the proximity of the two deletions is making sniffles2 cluster them together. I have seen a similar behaviour with pbsv as well.

Is there a way to prevent Sniffles2 from clustering nearby mutations? If I understand correctly, the point of clustering proximate variants is to account for alignment errors, but with CCS reads the alignment is clean (using minimap2 'map-hifi' mode).

I've tried changing the cluster bin size, the value of cluster-r, and other parameters, but I'm unable to rescue the minor 22 bp deletion. All suggestions are welcome!

Ayush

fritzsedlazeck commented 2 years ago

Hi Ayush, just to be sure: that is on the population level, right? Not single-sample calling? Thanks, Fritz

AyushSaxena commented 2 years ago

Hi Fritz,

I apologize for not being specific enough. This is single-sample calling on plasmid DNA that has two potential hairpin regions close to each other, which produce overlapping deletion calls when viewed in IGV. (We don't really know what the structure of the hairpin is in the double-stranded plasmid.) We are sequencing a heterogeneous population of plasmids, all sequenced as one sample. When viewed in IGV, ~50% of the plasmid DNA has one deletion but not the other, and ~10% is the other way around (carrying deletion 2 but not deletion 1). I don't think we ever see both deleted in the same molecule. Sniffles detects the dominant mutation but misses the rarer one.

I've been trying several Sniffles parameters to ensure that the two deletions don't get clustered together. The signal in the CIGAR string is the same for both: '22D'. I'm struggling in the same way with pbsv, where I haven't been able to find a setting that keeps these mutations from being clustered. The SV signature file in pbsv detects both 'haplotypes', but the rarer allele is filtered out of the final VCF.
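To make the '22D' observation concrete, here is a minimal Python sketch of how deletions could be tallied by exact reference start, with no clustering at all. It is not how Sniffles2 or pbsv work internally. The function names (`deletions`, `count_deletions`) and the toy reads are hypothetical, and the input is assumed to be (POS, CIGAR) pairs taken from the SAM records.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# CIGAR operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def deletions(pos, cigar, min_len=1):
    """Yield (ref_start, length) for each deletion in one alignment.

    pos is the 1-based leftmost mapping position (SAM POS field).
    """
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "D" and length >= min_len:
            yield ref, length
        if op in REF_CONSUMING:
            ref += length


def count_deletions(alignments, min_len=1):
    """Count reads supporting each exact (ref_start, length) deletion."""
    counts = Counter()
    for pos, cigar in alignments:
        counts.update(deletions(pos, cigar, min_len))
    return counts


# Toy (POS, CIGAR) pairs, made up for illustration only.
reads = [
    (100, "50M22D100M"),  # 22 bp deletion starting at 150
    (100, "50M22D100M"),
    (90, "60M22D100M"),   # same deletion, read starts earlier
    (100, "60M22D90M"),   # overlapping 22 bp deletion starting at 160
]

for (start, length), n in sorted(count_deletions(reads).items()):
    print(start, length, n)
```

Grouping by exact start only separates the two alleles if the aligner places each deletion consistently; in repetitive sequence the same deletion can be reported at shifted positions, so this is a diagnostic count and not a replacement for a caller.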

Ayush