astheeggeggs / lshmm

code to run Li and Stephens
MIT License

Refactor checks in `api.py` #19

Closed szhan closed 1 year ago

szhan commented 1 year ago

Fixes #18

szhan commented 1 year ago

I don't think this repo is set up yet to request reviewers for PRs. @astheeggeggs Please take a look when you get a moment! Thanks.

szhan commented 1 year ago

Fixed the problem that caused the lint test to fail: an unused variable `k`.

szhan commented 1 year ago

CI tests are failing when:

astheeggeggs commented 1 year ago

You're right re: `mutation_rate` - good spot. The function that sets the emission probabilities should then set the mutation rate based on the approximation I grabbed from L&S 2003. We could switch that to match the BEAGLE approximation exactly. We can do a similar thing with the switch probability vector.
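For reference, a minimal sketch of what such a fallback could look like. It assumes the approximation meant here is the one from Li and Stephens (2003), where theta is estimated as the reciprocal of the harmonic sum 1 + 1/2 + ... + 1/(n - 1) and the per-site mutation probability is 0.5 * theta / (n + theta). The function name `default_mutation_rate` and its argument are hypothetical and not taken from `api.py`.

```python
def default_mutation_rate(num_ref_haps):
    """Hypothetical helper: fallback per-site mutation probability.

    Assumes the Li and Stephens (2003) style approximation, with
    num_ref_haps (n) reference haplotypes in the panel.
    """
    if num_ref_haps < 2:
        raise ValueError("Need at least two reference haplotypes.")
    # theta_tilde = 1 / (1 + 1/2 + ... + 1/(n - 1))
    harmonic_sum = sum(1.0 / m for m in range(1, num_ref_haps))
    theta_tilde = 1.0 / harmonic_sum
    # Probability of emitting an allele that differs from the copied one.
    return 0.5 * theta_tilde / (num_ref_haps + theta_tilde)


# With n = 3: harmonic sum = 1.5, theta_tilde = 2/3, so the result is 1/11.
print(default_mutation_rate(3))
```

The same pattern (compute a default only when the user passes nothing) would apply to the switch probability vector, although the formula for that default is not shown here.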

Re: your second point, I don't think so. I think we can do one of two things: either allow the user to provide a haploid panel and diploid queries, throw a warning, and convert the ref panel to all pairs of the haploid panel passed; or just let it error out and say that the ploidy must match.
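The two options could be sketched roughly as follows. Everything here is illustrative: `reconcile_ploidy`, its parameters, and the list-of-lists panel layout are assumptions, not the existing `api.py` interface. "All pairs" is taken to mean all ordered pairs, including a haplotype paired with itself.

```python
import itertools
import warnings


def reconcile_ploidy(ref_haps, query_ploidy, allow_expand=False):
    """Hypothetical check for a haploid reference panel.

    ref_haps: list of haplotypes, each a list of alleles.
    query_ploidy: 1 (haploid queries) or 2 (diploid queries).
    """
    if query_ploidy == 1:
        # Ploidy already matches; nothing to do.
        return ref_haps
    if query_ploidy != 2:
        raise ValueError("Only ploidy 1 or 2 is supported in this sketch.")
    if not allow_expand:
        # Option 2: refuse to guess and ask the user to fix the input.
        raise ValueError("Ploidy of the reference panel and the query must match.")
    # Option 1: warn, then build every ordered pair of haplotypes.
    warnings.warn("Haploid panel passed with diploid queries; expanding to all pairs.")
    return list(itertools.product(ref_haps, repeat=2))


panel = [[0, 1], [1, 1]]
pairs = reconcile_ploidy(panel, query_ploidy=2, allow_expand=True)
print(len(pairs))  # 2 haplotypes give 2 * 2 = 4 ordered pairs
```

Erroring out by default keeps the behaviour explicit; the expansion grows quadratically with the panel size, so it is probably best left opt-in.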

astheeggeggs commented 1 year ago

I know what the problem is - we currently don't deal with multi-allelics correctly in the diploid case. When you consider only biallelic variants, you can cheat by collapsing the query sequence to a vector of {0,1,2}s, so the ploidy isn't checked. We deal with multi-allelics in the haploid case using characters, but a bit more care is needed in the diploid case. The upshot is that you can remove the query ploidy check, since it will currently always fail in the diploid case (we don't take unphased haploid vector pairs of chars, but a sum of unphased haploid pairs that are assumed to be biallelic).
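To illustrate the collapsing trick described above, here is a small sketch. The helper name `collapse_unphased_genotypes` is made up for this example, and it assumes alleles are coded as the integers 0 and 1.

```python
def collapse_unphased_genotypes(hap_a, hap_b):
    """Hypothetical helper: sum two biallelic haplotypes site by site.

    Returns a single vector with entries in {0, 1, 2}, i.e. the count
    of the alternative allele at each site.
    """
    if len(hap_a) != len(hap_b):
        raise ValueError("Haplotypes must have the same number of sites.")
    for allele in list(hap_a) + list(hap_b):
        if allele not in (0, 1):
            # The sum is only meaningful for biallelic 0/1 coding.
            raise ValueError("Multi-allelic sites cannot be collapsed this way.")
    return [a + b for a, b in zip(hap_a, hap_b)]


# Phase is lost: both pairs below collapse to the same vector.
print(collapse_unphased_genotypes([0, 1, 1], [1, 0, 1]))  # [1, 1, 2]
print(collapse_unphased_genotypes([1, 0, 1], [0, 1, 1]))  # [1, 1, 2]
```

Because the collapsed query is a single vector rather than a pair of haplotypes, a check that infers ploidy from the query's shape sees something that looks haploid, which is consistent with the check failing for diploid input.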