JanMiao opened this issue 1 year ago.
Hi,
What exactly do you mean by "unsuccessful in genotyping"? Were they reported with a "./." genotype?
Without knowing any details about your experiments, it's hard for me to say what the reason is. It could be related to the lack of unique kmers (since these regions often contain a lot of repetitive sequence). But it could also be related to the input panel. What data are you using to run PanGenie? Which command did you use?
Hi,
My input VCF file was generated from haploid assemblies, and the WGS data were genotyped with PanGenie. However, some structural variants are missing entirely from the VCF produced by PanGenie. This is not just a case of missing genotypes ("./."): these records are completely absent from the output VCF.
I initially expected PanGenie to genotype all variants present in the input VCF file, even if some of them ended up with a "./." genotype. However, I found that some variants are missing entirely. If these variants are missing because of a "lack of unique k-mers", would samples with higher-depth resequencing data be genotyped more successfully?
Which version of PanGenie are you using? Which command line? And what does the log file report?
PanGenie genotypes all variants present in the input file, but sometimes no genotype can be computed (e.g. if the computed genotype likelihoods are the same for several possible genotypes); in this case, the genotype "./." is reported.
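A toy illustration of this tie rule: given log-likelihoods for the three biallelic genotypes, report the best one, or "./." when the maximum is shared by more than one genotype. The function is hypothetical and not PanGenie's own code, and the tolerance value is an arbitrary choice.

```python
import math
from typing import Dict


def call_genotype(log_likelihoods: Dict[str, float], tol: float = 1e-9) -> str:
    """Return the genotype with the highest log-likelihood, or './.' on a tie."""
    best = max(log_likelihoods.values())
    # Collect every genotype whose likelihood equals the maximum (within tol).
    top = [
        gt for gt, ll in log_likelihoods.items()
        if math.isclose(ll, best, rel_tol=0.0, abs_tol=tol)
    ]
    return top[0] if len(top) == 1 else "./."
```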
PanGenie (latest version) only completely skips variants if they contain Ns in the REF/ALT field, or if they are closer than 2*kmer_size bp to the start or end of a chromosome. These cases would be missing completely from the output VCF (but will be reported in the log).
If no unique kmers are found, genotypes are imputed from the panel. In repetitive sequence contexts (such as telomeres), however, it might not be possible to compute a genotype because several genotypes have the same likelihood; in this case, "./." is reported. Higher sequencing depth would likely not help much here, because the problem lies in the complexity of the genome sequence itself.
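To see why higher depth does not break such ties, consider a deliberately simplified k-mer count model. This is not PanGenie's actual model, which also uses the haplotype panel; it only assumes that each k-mer's count is Poisson-distributed with mean coverage/2 per haplotype carrying it, plus a hypothetical floor to avoid log(0). When every k-mer is shared by both alleles, every genotype predicts the same counts, so the likelihoods are identical at any coverage; only allele-specific k-mers make the genotypes distinguishable.

```python
import math
from typing import Dict, List, Set, Tuple

# Biallelic genotypes as pairs of haplotype alleles (0 = REF, 1 = ALT).
GENOTYPES = {"0/0": (0, 0), "0/1": (0, 1), "1/1": (1, 1)}
# Hypothetical floor on the expected count so absent k-mers don't give log(0).
ERROR_FLOOR = 0.01


def poisson_logpmf(k: int, lam: float) -> float:
    """Log-probability of observing k counts under Poisson(lam)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def genotype_log_likelihoods(
    kmers: List[Tuple[Set[int], int]], coverage: float
) -> Dict[str, float]:
    """kmers: (alleles carrying the k-mer, observed count) pairs."""
    result = {}
    for gt, haplotypes in GENOTYPES.items():
        ll = 0.0
        for alleles, count in kmers:
            copies = sum(1 for h in haplotypes if h in alleles)
            # Each haplotype carrying the k-mer contributes coverage / 2.
            lam = max(copies * coverage / 2.0, ERROR_FLOOR)
            ll += poisson_logpmf(count, lam)
        result[gt] = ll
    return result
```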
Hi, I have observed that some variants located at the ends of chromosomes seem to fail genotyping in all samples of my dataset. Is this normal? Could you please explain why this happens?
Thanks!
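To check whether the variants that fail in every sample really cluster near chromosome ends, one could scan a multi-sample output VCF for records whose GT is missing in all samples and report their distance to the nearest chromosome end. A minimal sketch, assuming a GT subfield in FORMAT and chromosome lengths taken from the header or a `.fai` index; the function name is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# GT values treated as "no genotype".
MISSING = {".", "./.", ".|."}


def all_missing_variants(
    lines: Iterable[str], chrom_lengths: Dict[str, int]
) -> List[Tuple[str, int, int]]:
    """Return (CHROM, POS, distance to nearest chromosome end) for records
    whose GT is missing in every sample column."""
    hits = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        gt_index = fields[8].split(":").index("GT")
        gts = [sample.split(":")[gt_index] for sample in fields[9:]]
        if gts and all(gt in MISSING for gt in gts):
            # Distance measured from the variant's start position.
            distance = min(pos - 1, chrom_lengths[chrom] - pos)
            hits.append((chrom, pos, distance))
    return hits
```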