Implement alt_allele_prob option

So that we can set the alternative allele frequency in the founders directly, instead of fixing it at 0.5 or estimating it from the data.
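
The founder allele probability enters peeling through the founders' anterior term. Below is a minimal sketch of that mapping. It assumes Hardy-Weinberg proportions and four phased genotype states ordered (aa, aA, Aa, AA), where A is the alternative allele. The function name `founder_anterior` is hypothetical and is not AlphaPeel's API.

```python
import numpy as np


def founder_anterior(alt_allele_prob):
    """Return a 4 x nLoci array of phased genotype probabilities for a founder.

    Assumes Hardy-Weinberg proportions and the phased state order
    (aa, aA, Aa, AA), where A is the alternative allele.
    """
    p = np.asarray(alt_allele_prob, dtype=np.float64)
    q = 1.0 - p
    # Each row is one phased genotype state; each column is a locus
    return np.stack([q * q, q * p, p * q, p * p])


# p = 0.5 is the fixed default mentioned above; 0.1 is a user-chosen value
print(founder_anterior([0.5, 0.1]))
```

With an `alt_allele_prob` option, the user-provided value would simply replace 0.5 (or the data-based estimate) as the input to this mapping.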

@XingerTang - assigned to this issue.

@RosCraddock @gregorgorjanc

Coding tasks to do:

- [ ] Modify tinyhouse.pedigree to store the metafounder information while reading in the pedigree file
  - [ ] Add a MetaFounder flag/attribute to the Individual class; while reading in the pedigree data, set the flag to True for each individual whose name starts with MF_ (we can also check that metafounders are actually founders and raise an error if not)
  - [ ] Add an automatically generated MF_1 individual to the individual list of the Pedigree object after the input pedigree data is read
  - [ ] Modify Pedigree.readInPedigree so that, for each individual in the pedigree:
    ```python
    # Both parents unknown and not itself a metafounder: use MF_1 (mf1) as both parents
    if ind.sire is None and ind.dam is None and not ind.MetaFounder:
        ind.sire = ind.dam = mf1
    ```
- [ ] Modify alphapeel.peelinginfo to store the alternative allele frequency of each metafounder
  - [ ] Add jit_peelingInformation.nMF to store the number of metafounders in the population
  - [ ] Add jit_peelingInformation.MFList to store the list of metafounder individuals (or their IDs)
  - [ ] Initialize jit_peelingInformation.maf as an nMF $\times$ nLoci numpy matrix
- [ ] Modify alphapeel.peelinginfo to allow a user-defined alternative allele frequency to be used in the calculation
  - [ ] When initializing the anterior probabilities of the metafounders, calculate them from the provided alternative allele frequency
- [ ] Modify alphapeel.tinypeel to add the option alt_allele_prob
  - [ ] Add the option alt_allele_prob
  - [ ] Define precedence for the case when both alt_allele_prob and est_alt_prob are used (GG: in that case we use alt_allele_prob as a starting value for est_alt_prob)
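
The pedigree tasks could look roughly like the following. This is a minimal, self-contained sketch rather than tinyhouse code. The `Individual` class is a stand-in, and the attribute names (`idx`, `sire`, `dam`, `MetaFounder`) and the helper `attach_metafounders` are assumptions made for illustration. The sketch flags `MF_` individuals, checks that they are founders, adds `MF_1` if needed, and assigns `MF_1` as both parents of any non-metafounder whose parents are both unknown.

```python
class Individual:
    """Stand-in for tinyhouse's Individual; attribute names are assumptions."""

    def __init__(self, idx, sire=None, dam=None):
        self.idx = idx  # individual ID as read from the pedigree file
        self.sire = sire  # parent Individual objects, or None if unknown
        self.dam = dam
        # Proposed flag: True for IDs starting with "MF_"
        self.MetaFounder = idx.startswith("MF_")


def attach_metafounders(individuals, default_mf="MF_1"):
    """Check metafounders, add MF_1, and give parentless founders MF_1.

    `individuals` maps ID -> Individual and is modified in place.
    Raises ValueError if a metafounder has a known parent.
    """
    for ind in individuals.values():
        if ind.MetaFounder and (ind.sire is not None or ind.dam is not None):
            raise ValueError(f"Metafounder {ind.idx} must be a founder")

    # Automatically generate MF_1 if the pedigree file did not define it
    if default_mf not in individuals:
        individuals[default_mf] = Individual(default_mf)
    mf = individuals[default_mf]

    for ind in individuals.values():
        # Both parents unknown and not itself a metafounder: use MF_1 for both
        if ind.sire is None and ind.dam is None and not ind.MetaFounder:
            ind.sire = mf
            ind.dam = mf
    return individuals


ped = {"A": Individual("A"), "MF_2": Individual("MF_2")}
ped["C"] = Individual("C", sire=ped["MF_2"], dam=ped["MF_2"])
attach_metafounders(ped)
print(ped["A"].sire.idx, ped["C"].sire.idx)  # MF_1 MF_2
```

Individuals that already have a metafounder as a parent (C above) keep it. Only individuals with both parents missing fall back to MF_1, matching the pseudocode in the task list.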

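For the peelinginfo and tinypeel tasks, here is a hedged sketch of two pieces: initializing the nMF × nLoci matrix of alternative allele probabilities, and applying the proposed precedence between alt_allele_prob and est_alt_prob. Several details are hypothetical and do not come from AlphaPeel: the input format for alt_allele_prob (a dict from metafounder ID to a scalar or per-locus values), the 0.5 fallback, and the names `init_alt_allele_prob`, `resolve_alt_allele_prob` and `estimate_fn`.

```python
import numpy as np


def init_alt_allele_prob(mf_list, n_loci, alt_allele_prob=None, default=0.5):
    """Build the nMF x nLoci matrix of alternative allele probabilities.

    `alt_allele_prob` (hypothetical format) maps a metafounder ID to a scalar
    or a length-n_loci sequence; other metafounders get `default`.
    """
    maf = np.full((len(mf_list), n_loci), default, dtype=np.float32)
    if alt_allele_prob is not None:
        for row, mf in enumerate(mf_list):
            if mf in alt_allele_prob:
                maf[row, :] = alt_allele_prob[mf]
    return maf


def resolve_alt_allele_prob(maf, est_alt_prob, estimate_fn):
    """Proposed precedence (hypothetical helper).

    If est_alt_prob is set, `maf` (user-provided or default) is only the
    starting value passed to `estimate_fn`; otherwise it is kept fixed.
    """
    if est_alt_prob:
        return estimate_fn(maf.copy())
    return maf


maf = init_alt_allele_prob(["MF_1", "MF_2"], 3, {"MF_2": [0.1, 0.2, 0.3]})
print(maf)
```

Passing a copy to `estimate_fn` keeps the user-supplied starting values intact, so they can still be reported or compared after estimation.
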
I spoke with @AprilYUZhang today and she pointed out some bits that I want to clarify here.

The current state in AlphaPeel is the following:

1. Estimate the alternative allele probability from the observed genotype data (using the Newton method on genotype probabilities derived from the observed genotypes, or something like this, so that genotyping error is accounted for)
2. Take the estimate from 1. and keep it fixed for the rest of the program execution
3. Use the estimate from 1. to set the anterior term for founders (since the alternative allele probability is fixed, so are these anterior terms)
4. Peel down and up a couple of times to propagate the observed genotype information across the pedigree
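
The first step (estimating the alternative allele probability while accounting for genotyping error) can be illustrated with a small Newton-Raphson maximum-likelihood sketch for a single locus. It assumes Hardy-Weinberg proportions for the unknown true genotypes and a simple symmetric genotyping-error model. It is not AlphaPeel's actual estimation code, and the function names are hypothetical.

```python
import numpy as np


def genotype_likelihoods(observed, error):
    """P(observed call | true dosage 0, 1, 2) for each individual.

    Assumed error model: the call is correct with probability 1 - error,
    otherwise it is one of the two other genotypes with equal probability.
    """
    obs = np.asarray(observed)
    lik = np.full((obs.size, 3), error / 2.0)
    lik[np.arange(obs.size), obs] = 1.0 - error
    return lik


def estimate_alt_allele_prob(lik, p=0.5, n_iter=20, tol=1e-10):
    """Newton-Raphson MLE of the alternative allele probability at one locus."""
    l0, l1, l2 = lik[:, 0], lik[:, 1], lik[:, 2]
    for _ in range(n_iter):
        # Per-individual likelihood under Hardy-Weinberg and its derivatives in p
        f = l0 * (1 - p) ** 2 + l1 * 2 * p * (1 - p) + l2 * p**2
        d1 = -2 * l0 * (1 - p) + l1 * (2 - 4 * p) + 2 * l2 * p
        d2 = 2 * l0 - 4 * l1 + 2 * l2
        grad = np.sum(d1 / f)  # first derivative of the log-likelihood
        hess = np.sum(d2 / f - (d1 / f) ** 2)  # second derivative
        step = grad / hess
        p = float(np.clip(p - step, 1e-6, 1 - 1e-6))
        if abs(step) < tol:
            break
    return p


lik = genotype_likelihoods([0, 1, 1, 2, 2, 2], error=0.01)
print(estimate_alt_allele_prob(lik))
```

Note that this pools every genotyped individual in the pedigree, which is exactly the "undefined" population issue raised next.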

There are three issues with the above:

a) In 1. we are estimating the alternative allele probability for an "undefined" population (we take any observed genotypes in the pedigree), while we really need the base population alternative allele probability. The estimate from the "undefined" population is not the base population estimate, but it probably isn’t miles off; see also c).

b) As we discussed in person, a) will not do what we need for metafounders; see also c).

c) Once we get the estimate, keeping it fixed might not be what we want. Even if the estimate from 1. is slightly off, we could use it as a starting value and then update the base population alternative allele probability by estimating it from the inferred individual genotype probabilities of just the founders, which could converge to a better solution. This might make AlphaPeel run longer (we might need more peeling runs). At the moment we effectively use a simple estimate and fix it, and given that estimate we then estimate individual genotype probabilities. This starting-value-and-convergence approach could well work for more than one metafounder too, so there is hope for b) as well.
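
Point c) relies on turning inferred founder genotype probabilities back into an allele probability. One way to do this, sketched here under stated assumptions, is to average half the expected alternative allele dosage over the founders that descend from each metafounder. The phased state order (aa, aA, Aa, AA), the array layout, and the function name are assumptions.

```python
import numpy as np


def alt_allele_prob_from_founders(founder_geno_probs, founder_mf, n_mf):
    """Estimate per-metafounder alternative allele probabilities.

    founder_geno_probs: nFounders x 4 x nLoci phased genotype probabilities,
        assumed state order (aa, aA, Aa, AA).
    founder_mf: index of the metafounder each founder descends from.
    Returns an nMF x nLoci array (each group must contain at least one founder).
    """
    probs = np.asarray(founder_geno_probs)
    # Expected number of alternative alleles per founder and locus (0 to 2)
    dosage = probs[:, 1, :] + probs[:, 2, :] + 2 * probs[:, 3, :]
    group = np.asarray(founder_mf)
    out = np.empty((n_mf, probs.shape[2]))
    for m in range(n_mf):
        # Average dosage over this metafounder's founders, halved to a probability
        out[m] = dosage[group == m].mean(axis=0) / 2
    return out
```

Grouping by metafounder is what would let the same update serve several base populations at once.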

The above suggests that we would like to end up in this "correct" state:

1. Estimate the alternative allele probability from the observed genotype data (using the Newton method on genotype probabilities derived from the observed genotypes, or something like this, so that genotyping error is accounted for) --> test whether the linear model method with genetic groups could serve us better; but note that even a starting value with updates in the founders could work well, so I suggest we do the linear model method last
2. Take the estimate from 1. and keep it fixed for the rest of the program execution --> I would like us to explore updating the base population allele probability with every round of peeling (we start by going down and then up, so when we come up we have genotype probabilities for the founders and can estimate the allele probability there, even separately for multiple metafounders)
3. Use the estimate from 1. to set the anterior term for founders (since the alternative allele probability is fixed, so are these anterior terms) --> implementing the change in 2. means we would update the anterior term for founders every iteration too
4. Peel down and up a couple of times to propagate the observed genotype information across the pedigree --> hopefully the above changes would not make the algorithm much slower (that is, would not require many more iterations)
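
The proposed loop needs a stopping rule. The loop updates the base population allele probability after each round of peeling and then resets the founder anterior terms. Below is a minimal sketch. `peel_cycle` is a hypothetical stand-in for one round of peeling down and up. The convergence metric (maximum absolute change across loci) is an assumption, not something AlphaPeel currently implements.

```python
def iterate_alt_allele_prob(p_start, peel_cycle, max_cycles=20, tol=1e-4):
    """Alternate peeling and allele-probability updates until convergence.

    `peel_cycle` (hypothetical) takes the current alternative allele
    probabilities (one per locus), uses them to set the founder anterior
    terms, peels down and up, and returns updated estimates.
    Returns (final probabilities, number of cycles run).
    """
    p = list(p_start)
    for cycle in range(1, max_cycles + 1):
        p_new = peel_cycle(p)
        # Assumed convergence metric: largest absolute change across loci
        change = max(abs(a - b) for a, b in zip(p_new, p))
        p = p_new
        if change < tol:
            return p, cycle
    return p, max_cycles


# Toy peel_cycle with a fixed point at 0.4, used only to exercise the loop
toy_cycle = lambda p: [0.5 * x + 0.2 for x in p]
p_final, n_cycles = iterate_alt_allele_prob([0.5, 0.9], toy_cycle)
print(p_final, n_cycles)
```

Reporting the number of cycles alongside the estimate would make it easy to check the runtime concern in point 4.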

@gregorgorjanc Thank you for summarizing this! There is just one point I would like to clarify. In steps 2 and 3 of the "correct" state, you mentioned that we would update the estimate of the alternative allele probability every peeling cycle and use the updated allele probability to re-estimate the anterior terms. However, as we discussed, the information contained in the updated alternative allele probability is the same as the information already contained in the anterior terms after each peeling cycle. If we re-estimate the anterior terms from the updated alternative allele probability, they would be the same as before the re-estimation. So we would probably estimate the alternative allele probability only at the very beginning of the peeling process (for peeling accuracy) and re-estimate it at the very end (for a more accurate alternative allele probability output).

@XingerTang Right, I keep forgetting that with the addition of metafounders, the founders of the new internal pedigree are the metafounders, which are the “parents” of all our actual founding individuals! Let’s see … so, these metafounders will have anterior, penetrance, and “posterior” terms. When we have a starting allele probability (passed by the user or estimated from the data), we should use it for the anterior term of the metafounder(s). Then we peel down and up the pedigree. Once we come up, we will have estimated individual genotype probabilities for the metafounder(s) by combining the anterior and “posterior” terms (the “posterior” term collects all the information from all descendants of each metafounder), while the penetrance will always be unknown for metafounders (unless we have some prior information). These estimated genotype probabilities for the metafounder(s) are in fact estimated base population genotype probabilities, and we can simply convert them into an estimate of the base population allele frequency (possibly for more than one metafounder). With this estimate, we can update the anterior term of the metafounder(s) and repeat peeling down and up. There will be a cycle/loop of information flow, so we will have to test how it works in terms of accuracy and runtime until convergence (we might need to add an actual convergence metric!). How does this sound?
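
The metafounder update described above can be sketched for one metafounder as follows. The sketch rests on four assumptions. First, the four phased states are ordered (aa, aA, Aa, AA). Second, the metafounder's penetrance is flat and therefore drops out. Third, the allele probability is taken as half the expected alternative allele dosage. Fourth, the new anterior uses Hardy-Weinberg proportions. The function name is hypothetical.

```python
import numpy as np


def update_metafounder_anterior(anterior, posterior):
    """One metafounder update step (a sketch under the assumptions above).

    anterior, posterior: 4 x nLoci arrays over phased states (aa, aA, Aa, AA).
    With a flat penetrance, genotype probabilities are proportional to
    anterior * posterior.
    Returns (alt_allele_prob per locus, new 4 x nLoci anterior).
    """
    geno = anterior * posterior
    geno = geno / geno.sum(axis=0, keepdims=True)
    # Base population alternative allele probability = expected dosage / 2
    p = (geno[1] + geno[2] + 2 * geno[3]) / 2
    q = 1 - p
    # Hardy-Weinberg anterior for the next round of peeling
    new_anterior = np.stack([q * q, q * p, p * q, p * p])
    return p, new_anterior
```

Repeating this after every down-and-up pass gives the information loop described above. A stopping rule such as a maximum change in p across loci would then serve as the convergence metric.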

@gregorgorjanc Sure, it sounds doable.

AlphaGenes / AlphaPeel #142