astheeggeggs / lshmm

code to run Li and Stephens
MIT License
4 stars 3 forks source link

Computing `r/n` when partial ancestral haplotypes are in reference panel #78

Closed szhan closed 3 months ago

szhan commented 3 months ago

When the reference panel contains partial ancestral haplotypes, the denominator of r/n at site i should exclude counting the haplotypes that have NONCOPY at site i. The various implementations have to be modified to handle this. Note that n is constant if all the marginal trees are binary (equal to 2*number of ref. haps - 1), but in practice, n can vary due to polytomies in marginal trees.

szhan commented 3 months ago

@astheeggeggs suggested to add a test (probably in check_inputs) to check that the n in r/n is <= 2 * number of ref. haps - 1.

szhan commented 3 months ago

In the case of diploid, I think the n should be <= (2 * number of ref. haps - 1)^2.

szhan commented 3 months ago

Turns out that it is quite tedious to implement the check when we include ancestral haplotypes, because we need to pass the number of ref. sample haplotypes to various functions. Also, the get_ancestral_haplotypes function doesn't generate 2 * number of ref. sample haplotypes - 1 ancestral haplotypes, so the check fails. This should be investigated in a separate issue.