Open suncpark opened 1 year ago
I can't tell whether this is the cause of your problem, but I recently realized that freebayes (version 1.3.7) interprets a missing base quality string in a BAM file as a literal lack of quality. When I mistakenly fed freebayes BAM files without base quality strings, the sums of qualities for the reference (QR) and alternate (QA) observations were all set to zero. That was apparently why all the genotypes and their likelihoods in my VCF were wrong. It was quite scary.
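In case it helps anyone check their own input, here is a minimal sketch for spotting records without base qualities. It assumes the alignments have been converted to SAM text (for example with `samtools view`), where a missing quality string appears as `*` in the 11th column. The function names `has_missing_qualities` and `count_missing` are made up for this example.

```python
def has_missing_qualities(sam_line):
    """Return True if a SAM alignment line stores no base qualities (QUAL is '*')."""
    fields = sam_line.rstrip("\n").split("\t")
    # Mandatory SAM columns: QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
    return len(fields) >= 11 and fields[10] == "*"


def count_missing(lines):
    """Count alignment records without base qualities, skipping header lines."""
    total = 0
    missing = 0
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or SAM header
        total += 1
        if has_missing_qualities(line):
            missing += 1
    return missing, total


# Two made-up records: the first has no qualities, the second does.
records = [
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\t*",
    "r2\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
]
missing, total = count_missing(records)
print(f"{missing} of {total} records have no base qualities")
```

To run it on real data, the same functions could be fed the lines of `samtools view your.bam` instead of the hard-coded list.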
I have observed inconsistent genotype calls within a VCF file. Most genotypes make sense given the RO/AO counts, but I found many instances where an individual with similar numbers of RO and AO was genotyped as homozygous reference (0/0) rather than heterozygous (0/1). In some extreme cases, where RO was much greater than AO (e.g. 800 RO, 300 AO), the genotype was called as homozygous alternate (1/1).
I am unsure whether this inconsistency is due to a computational error or whether the genotypes are being called based on Bayesian probability. I would be interested to know if others have observed similar results and if there is a standard approach for handling these discrepancies in genotype calling.
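For anyone wanting to list such calls, here is a rough sketch that compares each genotype with a naive guess from the read counts alone. It is not how freebayes calls genotypes and not a standard approach; the 0.2/0.8 allele-balance thresholds and the function name `check_sample` are arbitrary assumptions, and it only handles biallelic sites with the `GT`, `RO`, `AO`, `QR`, and `QA` keys present in the FORMAT column.

```python
def check_sample(format_field, sample_field, low=0.2, high=0.8):
    """Return a list of warnings for one sample column of a VCF record.

    low/high are assumed allele-balance cutoffs, chosen only for illustration.
    """
    values = dict(zip(format_field.split(":"), sample_field.split(":")))
    try:
        ro, ao = int(values["RO"]), int(values["AO"])
    except (KeyError, ValueError):
        return []  # key absent, value ".", or multiallelic AO such as "3,5"

    gt = values.get("GT", ".").replace("|", "/")
    total = ro + ao
    if "." in gt or total == 0:
        return []  # no call or no observations to compare against

    gt = "/".join(sorted(gt.split("/")))  # treat 1/0 the same as 0/1
    balance = ao / total  # fraction of observations supporting the alternate
    if balance < low:
        guess = "0/0"
    elif balance > high:
        guess = "1/1"
    else:
        guess = "0/1"

    warnings = []
    if gt != guess:
        warnings.append(f"GT {gt} but RO={ro}, AO={ao} suggests {guess}")
    if values.get("QR") == "0" and values.get("QA") == "0":
        # Zero quality sums despite observations hint at missing base qualities.
        warnings.append("QR and QA are both zero despite observations")
    return warnings


# The extreme case from the post, with made-up DP/QR/QA values.
for w in check_sample("GT:DP:RO:QR:AO:QA", "1/1:1100:800:0:300:0"):
    print(w)
```

A flagged call is only a prompt to look closer: the quality sums and genotype likelihoods in the same record usually explain more than the raw counts do.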