Not all variants have an aggregated tag, and many variants have low-quality genotypes

Hi @hannespetur

The genotyping accuracy of structural variants has a large impact on downstream analysis. I currently use graphtyper (the latest version) and paragraph for this step. However, while using graphtyper I ran into some problems that confused me.

First, I found that many variants in the results do not have an aggregated tag, such as the variants highlighted in yellow:

Second, I have one sample with both a HiFi genome assembly and high-depth short-read sequencing data. Using these data, I assessed the genotype concordance between graphtyper and paragraph. I used an assembly-based method to call an accurate SV reference set. Although heterozygous SVs are often missed because the assembly represents only one haplotype, this does not affect the results. Both programs used the same BAM file, generated from the short reads, as input. The results are as follows:

1. I only considered variants with an aggregated tag, a total of 36927 SVs.

2. The results are in the following table. I replaced a genotype with ./. when it did not pass the filters of graphtyper_sv or paragraph, respectively. Many variants (10235 SVs, ~27% of the total: 3947 deletions and 6288 insertions) have a low-quality genotype in graphtyper_sv but pass the paragraph filter. Because this is such a large proportion, I created image views of their genomic intervals using sv-plaudit. I focused only on the deletions and have attached 10 examples here; I think they should be assigned a 1/1 genotype. What do you think of this difference? Thanks for your attention.

paragraph	graphtyper_sv	number
0/0	1/1	21
.	1/1	80
0/1	1/1	287
1/1	1/1	16347
./.	0/1	1128
1/1	0/1	309
0/1	0/1	4113
./.	./.	1625
.	0/1	27
.	./.	43
0/1	./.	1613
0/0	0/1	123
./.	1/1	923
1/1	./.	10235
0/0	./.	53
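
For anyone who wants to reproduce this kind of tally, here is a minimal sketch of the counting step. It is not the exact script I used. It assumes the two calls for each SV have already been extracted into tuples holding each tool's genotype and whether that call passed the tool's filter. The function names, the tuple layout, and the toy input are all hypothetical.

```python
from collections import Counter


def mask_filtered(genotype, passed_filter):
    """Return the genotype unchanged, or ./. if the call failed its tool's filter."""
    return genotype if passed_filter else "./."


def concordance_table(calls):
    """Count (paragraph, graphtyper_sv) genotype pairs.

    `calls` is an iterable of tuples (hypothetical layout):
    (paragraph_gt, paragraph_pass, graphtyper_gt, graphtyper_pass)
    """
    counts = Counter()
    for pg_gt, pg_pass, gt_gt, gt_pass in calls:
        pair = (mask_filtered(pg_gt, pg_pass), mask_filtered(gt_gt, gt_pass))
        counts[pair] += 1
    return counts


# Toy input for illustration only; these are not calls from my sample.
example_calls = [
    ("1/1", True, "1/1", True),
    ("1/1", True, "1/1", False),  # graphtyper_sv call fails its filter
    ("0/1", True, "0/1", True),
    ("1/1", True, "0/1", False),  # also masked to ./. on the graphtyper_sv side
]

table = concordance_table(example_calls)
total = sum(table.values())

print("paragraph\tgraphtyper_sv\tnumber")
for (pg, gt), n in table.most_common():
    print(f"{pg}\t{gt}\t{n}")

# Share of SVs that paragraph calls 1/1 but graphtyper_sv fails to pass
masked_share = table[("1/1", "./.")] / total
```

In the table above, the corresponding row (1/1 in paragraph, ./. in graphtyper_sv) holds 10235 of the 36927 SVs, which is about 27.7%.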

DEL_1_17585578_17585849 DEL_1_4530856_4531144 DEL_1_9165274_9165543 DEL_1_11419338_11420759 DEL_1_11788572_11788868 DEL_1_11868806_11869074 DEL_1_12385257_12385558 DEL_1_13564520_13564832 DEL_1_13583753_13584032 DEL_1_14053261_14053551

Best regards, Zheng zhuqing
