Genotype_sv Aggregate model has fewer Output SVs than Input SVs

Good afternoon,

I'm writing to report an issue I ran into during a test run of graphtyper's genotype_sv command.

I ran two SV callers, Manta and Smoove, on 50 samples and then merged their results with Jasmine_sv (which, like svimmer, retains the original caller's output information for each variant).

After this, I ended up with a VCF file containing approximately 130,000 structural variants.

I then ran the following command on graphtyper:

graphtyper genotype_sv Homo_sapiens_assembly38_HLA2.fasta \
    /path/to/jasmine_merged.vcf \
    --output=/path/to/50_samples_test \
    --region_file=/path/to/file/containing/contigs_of_interest.txt \
    --sams=/path/to/reheadered_bams.txt \
    --verbose

After this, I took the resulting 6468 VCF files and combined them with bcftools concat to create a final merged VCF. However, the output contained only 161,000 records, which is odd given that most variants have multiple records (due to the SVMODEL INFO field). When I filtered for only the records with the AGGREGATED model, I ended up with just 66,000 structural variants.
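For anyone who wants to reproduce the per-model counts, here is a minimal sketch in Python. It assumes the merged VCF stores the model as an `SVMODEL=` entry in the INFO column; the function names and the example path are hypothetical, and this is not necessarily how the numbers above were obtained.

```python
import gzip
from collections import Counter


def open_vcf(path):
    """Open a plain-text or gzip-compressed VCF for reading as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def count_by_svmodel(lines):
    """Count VCF records per INFO/SVMODEL value.

    Records without an SVMODEL entry are counted under 'MISSING'.
    """
    counts = Counter()
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        info = line.rstrip("\n").split("\t")[7]  # INFO is the 8th VCF column
        model = "MISSING"
        for entry in info.split(";"):
            key, _, value = entry.partition("=")
            if key == "SVMODEL":
                model = value
                break
        counts[model] += 1
    return counts


# Example usage (hypothetical path):
# with open_vcf("/path/to/merged_output.vcf.gz") as handle:
#     for model, n in count_by_svmodel(handle).most_common():
#         print(model, n)
```

Comparing the AGGREGATED count from this tally against the number of input records makes the discrepancy easy to state precisely.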

My question is: does graphtyper filter or merge variants that it considers to be the same, even when they might not be? Why did my number of structural variants drop by almost 50%? Could it be that some variants in the original VCF file overlapped and graphtyper simply removed them?
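One check that can be run on the input side is to look for records in the merged input VCF that share exactly the same coordinates. The sketch below is only an illustration: it assumes the standard `SVTYPE` and `END` INFO keys are present, it uses hypothetical function names, and it says nothing about what graphtyper itself does internally. It also only detects exact duplicates, not partial overlaps.

```python
from collections import defaultdict


def site_key(line):
    """Build a (CHROM, POS, SVTYPE, END) key for one VCF record line."""
    fields = line.rstrip("\n").split("\t")
    info = {}
    for entry in fields[7].split(";"):
        key, _, value = entry.partition("=")
        info[key] = value
    # Fall back to the ALT column when SVTYPE is absent from INFO.
    return (fields[0], int(fields[1]), info.get("SVTYPE", fields[4]), info.get("END"))


def duplicate_sites(lines):
    """Return {site key: [record IDs]} for keys shared by more than one record."""
    groups = defaultdict(list)
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        record_id = line.split("\t")[2]  # ID is the 3rd VCF column
        groups[site_key(line)].append(record_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Example usage (hypothetical path):
# with open("/path/to/jasmine_merged.vcf") as handle:
#     dups = duplicate_sites(handle)
#     print(len(dups), "coordinate sets are shared by more than one input record")
```

If a large share of the input records fall into such groups, that would at least be consistent with the drop described above, although it would not prove the cause.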
