astheeggeggs / lshmm

code to run Li and Stephens
MIT License

Reduce run time for tests for diploid Viterbi implementations #65

Open szhan opened 1 month ago

szhan commented 1 month ago

Some tests for the naive diploid Viterbi implementations are rather slow when the reference panel includes ancestral haplotypes and gets moderately large. For example, the tests took ~40 minutes to complete for PR #64, even though tests are skipped when the reference panel exceeds 100 haplotypes. Should we run tests that take this long?

szhan commented 1 month ago

@jeromekelleher suggested excluding the naive implementations from the test runs, since we will not be using them in the API. Also, we should aim for tests that don't run longer than 30 seconds or so.
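One way to check a single test against the 30-second target is to time it directly. The sketch below is only an illustration: `run_within_budget` and `BUDGET_SECONDS` are hypothetical names, not part of lshmm, and the workload is a stand-in.

```python
import time

# Hypothetical budget, taken from the "30 seconds or so" target above.
BUDGET_SECONDS = 30.0


def run_within_budget(func, *args, budget=BUDGET_SECONDS, **kwargs):
    """Call func; return its result, the elapsed wall time, and whether it met the budget."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed <= budget


if __name__ == "__main__":
    # Stand-in workload; a real check would call a test function instead.
    result, elapsed, ok = run_within_budget(sum, range(1_000_000))
    print(f"result={result} elapsed={elapsed:.3f}s within_budget={ok}")
```

For a whole test suite, pytest's `--durations=N` option reports the N slowest tests, which may be a quicker way to find the offenders.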

szhan commented 3 weeks ago

Naive diploid Viterbi takes rather long when run on ref. panels simulated from get_ts_simple_n8_high_recomb, because the number of haplotypes (including ancestors) can reach about 150. A simple solution is to halve the recombination rate used in the simulation, from 20 to 10, which gives ref. panels of about 30 haplotypes.
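To get a rough sense of how much a smaller panel helps, here is a back-of-the-envelope sketch. It assumes the cost of naive diploid Viterbi grows with the fourth power of the number of reference haplotypes. That exponent is an assumption for illustration and is not stated in this thread; `relative_cost` is a hypothetical helper.

```python
def relative_cost(n_before, n_after, exponent=4):
    """Estimate the speed-up from shrinking a panel from n_before to n_after haplotypes.

    Assumes cost is proportional to n ** exponent; the default exponent of 4
    is an assumed scaling, not a measured one.
    """
    if n_before <= 0 or n_after <= 0:
        raise ValueError("panel sizes must be positive")
    return (n_before / n_after) ** exponent


if __name__ == "__main__":
    # Panel sizes quoted above: about 150 haplotypes before, about 30 after.
    print(relative_cost(150, 30))  # 625.0 under the assumed quartic scaling
```

Under that assumption, going from about 150 to about 30 haplotypes would cut the per-test cost by a factor of several hundred, so even an approximate reduction in panel size should matter a lot.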

szhan commented 3 weeks ago

By reducing the recombination rate used to simulate ref. panels for get_ts_simple_n8_high_recomb and test_ts_larger, and by not running naive diploid Viterbi, I got the total test run time down to 12 minutes on my machine, from 40 minutes.

szhan commented 3 weeks ago

These tests take a long time to run:

szhan commented 2 weeks ago

Ideally, we would implement a naive version of diploid Viterbi that scales better, but that will take more work. For now, we can reduce the run times of the ts_larger tests in test_nontree_diploid.py by lowering (1) the ref. panel size; (2) the recombination rate; and (3) the maximum number of haplotypes in the ref. panel above which naive diploid Viterbi is not run.
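The three knobs above could be grouped in one place, roughly as sketched below. All names and values here (`PanelSettings`, `should_run_naive`, and the numbers) are hypothetical placeholders, not the actual settings in test_nontree_diploid.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PanelSettings:
    """Hypothetical container for the three knobs listed above."""

    num_samples: int             # (1) ref. panel size passed to the simulation
    recombination_rate: float    # (2) recombination rate used in the simulation
    max_ref_haps_for_naive: int  # (3) largest panel on which the naive version is run


def should_run_naive(num_ref_haps, settings):
    """Return True if the panel is small enough to run naive diploid Viterbi."""
    return num_ref_haps <= settings.max_ref_haps_for_naive


if __name__ == "__main__":
    # Placeholder values for illustration only.
    settings = PanelSettings(num_samples=8, recombination_rate=10.0,
                             max_ref_haps_for_naive=50)
    for n in (30, 150):
        print(n, should_run_naive(n, settings))
```

Inside a pytest test, a `False` result could be turned into a skip with `pytest.skip(...)`, so that skipped cases still show up in the test report.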