AlphaGenes / AlphaPeel

AlphaPeel: calling, phasing, and imputing genotype and sequence data in pedigrees
MIT License

Do we need to add base population inbreeding coefficient? #161

Open gregorgorjanc opened 3 months ago

gregorgorjanc commented 3 months ago

Frequency of genotypes in the base population is assumed to follow HWE given the allele frequency (p, q=1-p): [p^2, 2 p q, q^2].

However, with inbreeding in a population we actually expect this kind of distribution: [p^2 + p q F, 2 p q (1 - F), q^2 + p q F] where F is the population (average) inbreeding coefficient.

Does this matter? If we had highly inbred individuals, say with the distribution [0.5, 0.0, 0.5], HWE would not represent the genotype frequencies well: for this example p=0.5 and [p^2, 2 p q, q^2]=[0.25, 0.50, 0.25], but we should have [0.5, 0.0, 0.5], which corresponds to F=1. This is a large deviation, though the example is extreme.
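To make the comparison concrete, here is a minimal sketch of the two distributions. The function name `genotype_frequencies` and its arguments are hypothetical, used only for illustration, and are not part of AlphaPeel. It also assumes F is restricted to [0, 1].

```python
def genotype_frequencies(p, F=0.0):
    """Return [P(AA), P(Aa), P(aa)] for allele frequency p and inbreeding coefficient F.

    Hypothetical helper for illustration only, not AlphaPeel code.
    With F=0 this reduces to HWE: [p^2, 2 p q, q^2].
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency p must be in [0, 1]")
    if not 0.0 <= F <= 1.0:
        # Assumption: only non-negative inbreeding is considered here
        raise ValueError("inbreeding coefficient F must be in [0, 1]")
    q = 1.0 - p
    return [
        p * p + p * q * F,        # homozygote AA
        2.0 * p * q * (1.0 - F),  # heterozygote Aa
        q * q + p * q * F,        # homozygote aa
    ]


hwe = genotype_frequencies(0.5)             # F=0, the current HWE assumption
inbred = genotype_frequencies(0.5, F=1.0)   # the extreme example from above
print(hwe, inbred)
```

With p=0.5, the F=0 call gives the HWE frequencies and the F=1 call gives the fully inbred case, so the size of the deviation for intermediate F can be checked by varying the second argument.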

We could expand the functionality and estimate F, but we often have very limited information about the average/population inbreeding in the base population (where by definition we assume F=0), so we will likely not be able to estimate it. Maybe in cases where we have genomic data, or if we can make Legarra's metafounders work ;)
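For the genomic-data case, one simple option is to compare observed and expected heterozygosity at a locus, F = 1 - H_obs / (2 p q). This is only a rough single-locus sketch, assuming genotype counts for base individuals are available. The function name `estimate_inbreeding` is hypothetical and this is not how AlphaPeel currently works.

```python
def estimate_inbreeding(n_AA, n_Aa, n_aa):
    """Estimate F at one locus as 1 - H_obs / H_exp from genotype counts.

    Hypothetical helper for illustration only, not AlphaPeel code.
    """
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("no genotyped individuals")
    # Allele frequency of A from the genotype counts
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1.0 - p
    expected_het = 2.0 * p * q  # heterozygosity expected under HWE
    if expected_het == 0.0:
        raise ValueError("locus is monomorphic, F is undefined")
    observed_het = n_Aa / n
    return 1.0 - observed_het / expected_het


print(estimate_inbreeding(25, 50, 25))  # counts in HWE proportions
print(estimate_inbreeding(50, 0, 50))   # no heterozygotes at all
```

In practice such an estimate would be averaged over many loci, and it can be negative when there is an excess of heterozygotes.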