DecodeGenetics / graphtyper

Population-scale genotyping using pangenome graphs
http://dx.doi.org/10.1038/ng.3964
MIT License

Graphtyper outputs new alleles that are not in input VCF #91

Open ivargr opened 2 years ago

ivargr commented 2 years ago

Hi!

I am using graphtyper genotype with the --vcf flag to genotype a specific set of input variants. However, Graphtyper does not seem to restrict itself to these variants: in some cases it reports additional alleles at the input sites.

As an example, I am genotyping the following variant:

5   12446876    .   A   AGAAAG

As output, Graphtyper gives me this:

5   12446876    5:12446876:IG   A   AGAAAG,AGAAAT   255 PASS    [...]   2/2:0,0,15:1:16:45:255,255,255,45,48,0
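
For reference, the sample column can be decoded with a small sketch like the one below. It assumes the last colon-separated field is the PL (phred-scaled genotype likelihood) list, which is a guess because the FORMAT column is elided above, and that the sample is diploid. The helper names `genotype_order` and `most_likely_genotype` are made up for illustration and are not part of Graphtyper.

```python
def genotype_order(n_alleles):
    """Diploid genotypes in VCF PL order: 0/0, 0/1, 1/1, 0/2, 1/2, 2/2, ..."""
    return [(j, k) for k in range(n_alleles) for j in range(k + 1)]


def most_likely_genotype(pl_field, n_alleles):
    """Return the genotype whose PL value is lowest (most likely)."""
    pls = [int(x) for x in pl_field.split(",")]
    order = genotype_order(n_alleles)
    if len(pls) != len(order):
        raise ValueError("PL length does not match the number of alleles")
    best = min(range(len(pls)), key=pls.__getitem__)
    j, k = order[best]
    return f"{j}/{k}"


# Sample column from the output record above; three alleles: A, AGAAAG, AGAAAT.
sample = "2/2:0,0,15:1:16:45:255,255,255,45,48,0"
fields = sample.split(":")
# Assumption: the last field is PL (the FORMAT column is elided in the post).
print(fields[0], most_likely_genotype(fields[-1], 3))
```

Under these assumptions, both the GT field and the lowest PL value point to allele index 2, i.e. AGAAAT.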

That is, Graphtyper reports an additional allele, AGAAAT, and genotypes the sample as 2/2, i.e. homozygous for that new allele. AGAAAT is not in the input VCF. I suspect this could happen if there were another insertion in the VCF, but in my case there is none.
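
A minimal sketch of how such alleles could be listed, by comparing the ALT column of the input and output records per site. It reads only the first five whitespace-separated VCF columns. The function names `parse_alts` and `novel_alts` are hypothetical helpers, not Graphtyper functionality, and the output record is truncated to the columns shown above.

```python
def parse_alts(vcf_lines):
    """Map (chrom, pos, ref) to the set of ALT alleles, skipping header lines."""
    sites = {}
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.split()[:5]
        sites.setdefault((chrom, int(pos), ref), set()).update(alt.split(","))
    return sites


def novel_alts(input_lines, output_lines):
    """Return ALT alleles present in the output but absent from the input."""
    given = parse_alts(input_lines)
    novel = {}
    for site, alts in parse_alts(output_lines).items():
        extra = alts - given.get(site, set())
        if extra:
            novel[site] = extra
    return novel


# The two records from this issue.
input_vcf = ["5\t12446876\t.\tA\tAGAAAG"]
output_vcf = ["5\t12446876\t5:12446876:IG\tA\tAGAAAG,AGAAAT\t255\tPASS"]
print(novel_alts(input_vcf, output_vcf))
```

Keying on (chrom, pos, ref) is a simplification: it would miss cases where the output represents the same variant at a different position or with a different REF.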

Is this expected behaviour? If so, does it mean that Graphtyper not only genotypes the provided variants but also performs some kind of variant detection at those sites?

If needed, I can try to reproduce this behaviour on a smaller test data set that can be shared.