fritzsedlazeck / Sniffles

Structural variation caller using third generation sequencing

Why is the longest insertion so short? #500

Closed: hui-liu closed this issue 4 months ago

hui-liu commented 4 months ago

I found that the DEL sizes are very long. Here are the top ten (in bp):

1510144027
1475342950
1425352425
1417940033
1388887134
1355193465
1312469728
1297948803
1281732081
1263129094

The INV sizes are also quite long. Here are the top ten (in bp):

1643401054
1637743238
1610945520
1527408443
1451292220
1296638166
1290627465
1286498828
1286348338
1285434441

The INS sizes, however, are quite short. Here are the top ten (in bp):

28955
25318
25223
24844
23581
23006
21909
21614
21442
20604
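Per-type top-ten lists like the ones above can be pulled out of a Sniffles VCF with a short script. This is a minimal sketch, assuming each record carries `SVTYPE` and `SVLEN` INFO fields (standard VCF structural-variant keys that Sniffles writes); the helper names `top_svlen_by_type` and `top_svlen_from_file` are hypothetical. Magnitudes are compared because deletions are typically reported with a negative `SVLEN`.

```python
import gzip
import sys
from collections import defaultdict


def top_svlen_by_type(vcf_lines, n=10):
    """Return {SVTYPE: n largest |SVLEN| values, descending} from VCF text lines."""
    sizes = defaultdict(list)
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and column-header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete data line
        info = {}
        for entry in fields[7].split(";"):
            key, _, value = entry.partition("=")  # flags such as PRECISE get ""
            info[key] = value
        svtype = info.get("SVTYPE")
        svlen = info.get("SVLEN")
        if not svtype or not svlen:
            continue  # e.g. BND records without an SVLEN
        # Take the first value if SVLEN is a list; deletions are usually negative.
        sizes[svtype].append(abs(int(svlen.split(",")[0])))
    return {t: sorted(v, reverse=True)[:n] for t, v in sizes.items()}


def top_svlen_from_file(path, n=10):
    """Open a plain or bgzip/gzip-compressed VCF and tabulate the largest SVs."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return top_svlen_by_type(handle, n)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python top_svlen.py sample.vcf.gz
    for svtype, lengths in sorted(top_svlen_from_file(sys.argv[1]).items()):
        print(svtype, *lengths)
```

Running it on the `${sample}.vcf.gz` produced by the command below prints one line per SV type with its largest sizes.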

This is the command I used to call the SVs:

sniffles \
--reference ${ref} \
--input ${bamfile} \
--tandem-repeats ${TR} \
--threads ${th} \
--minsupport 4 \
--minsvlen 50 \
--mapq 20 \
--vcf ${sample}.vcf.gz \
--snf ${sample}.snf \
--sample-id ${sample}

Which parameters cause this difference? Do the following parameters matter?

--long-ins-length 2500
--long-del-length 50000
--long-inv-length 10000
fritzsedlazeck commented 4 months ago

Insertion sizes depend on the read length, because the inserted sequence has to be contained in the reads. If an insertion is longer than a read, Sniffles tries to combine reads, but that does not always work out well. In contrast, DEL, INV, DUP, etc. are not limited by the read length itself, since their size is measured between breakpoints on the reference. See our review: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1828-7
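The asymmetry can be illustrated with a toy model of two consecutive aligned pieces of one read. This is a simplified sketch, not Sniffles' actual algorithm: it assumes collinear, same-strand segments with 0-based half-open coordinates, and `Segment`, `implied_sv`, `max_detectable_ins` and the `min_anchor` parameter are all hypothetical names. A deletion's size comes from reference coordinates, so a single 20 kb read can report a 1 Mb deletion; an insertion's size comes from unaligned read bases, so one read can never show more than its own length.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Segment:
    """One aligned piece of a read (0-based, half-open coordinates)."""
    query_start: int
    query_end: int
    ref_start: int
    ref_end: int


def implied_sv(left: Segment, right: Segment, min_len: int = 50) -> Optional[Tuple[str, int]]:
    """Classify the gap between two consecutive, collinear, same-strand segments.

    Toy illustration only; real callers also cluster evidence across reads.
    """
    ref_gap = right.ref_start - left.ref_end        # reference bases skipped
    query_gap = right.query_start - left.query_end  # read bases left unaligned
    diff = ref_gap - query_gap
    if diff >= min_len:
        return ("DEL", diff)   # size read off the reference: not bounded by read length
    if -diff >= min_len:
        return ("INS", -diff)  # size made of read bases: bounded by read length
    return None


def max_detectable_ins(read_length: int, min_anchor: int) -> int:
    """Hypothetical cap: longest insertion one read can span with an anchor on each side."""
    return max(0, read_length - 2 * min_anchor)


if __name__ == "__main__":
    left = Segment(0, 5_000, 100_000, 105_000)
    # Same 20 kb read, second piece lands 1 Mb further along the reference.
    print(implied_sv(left, Segment(5_000, 20_000, 1_105_000, 1_120_000)))
    # Same 20 kb read, 10 kb of read sequence with no reference counterpart.
    print(implied_sv(left, Segment(15_000, 20_000, 105_000, 110_000)))
    print(max_detectable_ins(20_000, 1_000))
```

Large insertions may also show up as a single CIGAR insertion rather than a split alignment; either way the inserted bases must be present in the read, which is why combining several reads (as described above) is needed once an insertion exceeds the read length.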

hui-liu commented 4 months ago

Thank you!