Hi @hannespetur
The genotyping accuracy of structural variants (SVs) has a large impact on downstream analysis. I currently use graphtyper (the latest version) and paragraph for this step. However, while using graphtyper I ran into some results that confused me.
First, many variants do not have an aggregated tag in the results, such as the following variants highlighted in yellow:
Second, I have one sample with both a HiFi genome assembly and high-depth short-read sequencing data. Using these data, I assessed the genotype concordance between graphtyper and paragraph. I used an assembly-based method to call an accurate reference SV set. Heterozygous SVs are often missed because the assembly represents only one haplotype, but this does not affect the results. Both programs used the same BAM file generated from the short reads as input. The results are as follows:
1. I only considered the variants with an aggregated tag, a total of 36927 SVs.
2. The results are in the following table. I replaced a genotype with ./. when it failed any filter of graphtyper_sv or paragraph, respectively. Many variants (~27%, including 3947 deletions and 6288 insertions) have a low-quality genotype in graphtyper_sv but pass the filters of paragraph. Because this is such a large proportion, I created image views of the genomic intervals for them using sv-plaudit. I focused only on the deletions and have attached 10 examples here; I think they should be assigned a 1/1 genotype.
What do you think of this difference? Thanks for your attention.
Best regards,
Zheng zhuqing
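P.S. In case it helps to reproduce the comparison, the masking and tabulation described in point 2 can be sketched roughly as below. This is only a minimal illustration under assumptions: the function names and the four example records are hypothetical (not real calls from this sample), and how each tool reports that a genotype passed its filters is left to the caller as a plain boolean.

```python
from collections import Counter

MISSING = "./."

def normalize(gt, passed):
    """Return an unphased, allele-sorted genotype, or ./. if the call is unusable.

    `passed` is an assumption: a boolean the caller derives from whatever
    filter information the genotyper writes for that sample.
    """
    if not passed or gt is None or "." in gt:
        return MISSING
    alleles = sorted(gt.replace("|", "/").split("/"))
    return "/".join(alleles)

def concordance_table(calls):
    """Count (genotype_a, genotype_b) pairs after masking filtered calls.

    `calls` is an iterable of (variant_id, gt_a, pass_a, gt_b, pass_b).
    """
    table = Counter()
    for _variant_id, gt_a, pass_a, gt_b, pass_b in calls:
        table[(normalize(gt_a, pass_a), normalize(gt_b, pass_b))] += 1
    return table

def masked_only_in_a(table):
    """Number of variants masked in caller A but genotyped by caller B."""
    return sum(n for (a, b), n in table.items() if a == MISSING and b != MISSING)

# Hypothetical records: caller A stands for graphtyper_sv, caller B for paragraph.
example = [
    ("sv1", "1/1", True, "1/1", True),    # both pass and agree
    ("sv2", "1/1", False, "1/1", True),   # fails the filter in A only
    ("sv3", "0|1", True, "1/0", True),    # same genotype, different notation
    ("sv4", "0/1", False, "0/1", False),  # fails the filter in both
]
table = concordance_table(example)
```

Calling `masked_only_in_a(table)` on real data would give the count of variants in the category I am asking about, i.e. those with a low-quality genotype in graphtyper_sv that still pass in paragraph.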