fritzsedlazeck / SURVIVOR

Toolset for SV simulation, comparison and filtering
MIT License

different results for stats vs filtered #154

Open mattheatley opened 2 years ago

mattheatley commented 2 years ago

I think there might be a bug in how the 'stats' and 'filter' commands interpret min/max size, as the numbers don't add up. I suspect one is filtering by > and the other by >=, or something similar, but even then there is a disparity in the INS/DEL counts.

using one of your examples (col-0_ngmlr-0.2.3_mapped.bam.sniffles1kb_auto.vcf):

SURVIVOR stats col-0_ngmlr-0.2.3_mapped.bam.sniffles1kb_auto.vcf -1 -1 -1 test.stats

Processing: 456
Parsing done:
Tot  DEL  DUP  INS  INV  TRA
456  83   63   191  68   51

SURVIVOR stats col-0_ngmlr-0.2.3_mapped.bam.sniffles1kb_auto.vcf 50 -1 -1 test.stats

Processing: 355
Parsing done:
Tot  DEL  DUP  INS  INV  TRA
355  67   63   106  68   51

SURVIVOR filter col-0_ngmlr-0.2.3_mapped.bam.sniffles1kb_auto.vcf NA 50 -1 0 -1 filtered.vcf

SVs ignored: 106

grep '#' -v filtered.vcf | wc -l

350

SURVIVOR stats filtered.vcf -1 -1 -1 test.stats

Processing: 350
Parsing done:
Tot  DEL  DUP  INS  INV  TRA
350  66   63   102  68   51

stats doesn't actually report 350 variants until the min size is set to 52, but even then the DEL/INS counts differ from those in filtered.vcf, i.e. filtered.vcf: 66 DEL / 102 INS; stats: 64 DEL / 104 INS

SURVIVOR stats col-0_ngmlr-0.2.3_mapped.bam.sniffles1kb_auto.vcf 52 -1 -1 test.stats

Processing: 350
Parsing done:
Tot  DEL  DUP  INS  INV  TRA
350  64   63   104  68   51
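To help narrow down which comparison each command applies, an independent tally of the VCF could be compared against both outputs. The script below is only a rough sketch and is not SURVIVOR's code: it assumes each record carries SVTYPE and SVLEN in the INFO column, takes the absolute value of SVLEN as the size, and always keeps records without a usable SVLEN (e.g. TRA). The function names count_by_type and parse_info, and the SAMPLE records, are made up for illustration.

```python
from collections import Counter


def parse_info(info):
    """Split a VCF INFO string into a dict; flag keys map to an empty string."""
    out = {}
    for item in info.split(";"):
        if not item:
            continue
        key, _, value = item.partition("=")
        out[key] = value
    return out


def count_by_type(lines, min_size, inclusive):
    """Count records per SVTYPE that pass a minimum-size threshold.

    inclusive=True keeps size >= min_size, inclusive=False keeps size > min_size.
    A negative min_size disables the filter. Records without a parsable SVLEN
    are always kept (an assumption, not confirmed SURVIVOR behaviour).
    """
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        info = parse_info(fields[7])
        svtype = info.get("SVTYPE", "NA")
        try:
            # SVLEN is negative for deletions; use the first value if several
            size = abs(int(info.get("SVLEN", "").split(",")[0]))
        except ValueError:
            size = None
        if size is not None and min_size >= 0:
            keep = size >= min_size if inclusive else size > min_size
            if not keep:
                continue
        counts[svtype] += 1
    return counts


# Made-up records that sit right at the threshold, for illustration only.
SAMPLE = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;SVLEN=-50",
    "1\t200\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;SVLEN=-51",
    "1\t300\t.\tN\t<INS>\t.\tPASS\tIMPRECISE;SVTYPE=INS;SVLEN=49",
    "1\t400\t.\tN\t<TRA>\t.\tPASS\tSVTYPE=TRA",
]

if __name__ == "__main__":
    for inclusive in (True, False):
        label = ">=" if inclusive else ">"
        print(label, dict(count_by_type(SAMPLE, 50, inclusive)))
```

Running the same function over the lines of the original VCF and of filtered.vcf with both comparisons would show whether either one reproduces the 355 or 350 totals. If neither matches, the difference probably lies in how the size is derived rather than in the comparison operator.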