astheeggeggs / lshmm

code to run Li and Stephens
MIT License

Rework `get_examples_haploid` and `get_examples_diploid` #109

Closed szhan closed 2 weeks ago

szhan commented 2 weeks ago

After simulating a tree sequence (`ts`) and extracting a genotype matrix from it, the tests arbitrarily choose some haplotypes as queries and exclude them from the reference panel that is passed to the tests. Combined with `estimate_mutation_probability`, this can produce pathological cases in which the reference panel contains too few haplotypes.

Instead of excluding the arbitrarily chosen haplotypes from the reference panel, a better approach is to use randomly mutated versions of them as the queries and keep every haplotype in the reference panel. That way, the number of sample haplotypes in the reference panel equals `ts.num_samples`, which should also simplify the logic in `get_examples_pars` when ancestors are included.
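
A rough sketch of the proposed approach, using NumPy only. The helper name `make_mutated_queries`, the `mutation_rate` parameter, the biallelic 0/1 encoding, and the (sites × haplotypes) layout of the panel are all assumptions for illustration; none of them is taken from the lshmm test suite.

```python
import numpy as np


def make_mutated_queries(ref_panel, query_indices, mutation_rate=0.1, seed=42):
    """Hypothetical helper: build queries by mutating copies of panel haplotypes.

    Assumes ref_panel is a biallelic (0/1) array of shape
    (num_sites, num_haplotypes). The panel itself is left untouched.
    """
    rng = np.random.default_rng(seed)
    # Copy the chosen haplotypes; result has shape (num_queries, num_sites).
    queries = ref_panel[:, query_indices].T.copy()
    # Flip each allele independently with probability mutation_rate.
    flip = rng.random(queries.shape) < mutation_rate
    queries[flip] = 1 - queries[flip]
    return queries


# Toy stand-in for a genotype matrix obtained from a simulated tree sequence.
ref_panel = np.random.default_rng(1).integers(0, 2, size=(20, 8), dtype=np.int8)
queries = make_mutated_queries(ref_panel, query_indices=[0, 3, 7])
```

Because the queries are copies, the panel keeps all of its haplotypes (8 columns in this toy example), so its size does not shrink as more queries are chosen.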