Inference for Moran/Yule model

ben18785 commented 5 years ago

Exact inference for the Moran model is really inference in a kind of semi-hidden Markov model: the states are observed at regular intervals, but the states in between are unknown.

Naively, one might think you could use the forward-backward algorithm that is typically used to compute the likelihood for discrete-state HMMs. This, however, is intractable because the state space is HUGE. For example, a population of size 10 with 10 unique variants has 10^10 (10 billion) possible discrete states if each individual's variant is tracked separately. Clearly some of these states won't be compatible with the observations, but the space is just too big to prune. Note: the Yule model is even worse, because the population size grows over time!
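To make the "semi-hidden" structure concrete, here is a minimal sketch for the simplest case: two variants under a neutral Moran model (the parameterisation and all function names are assumptions for illustration, not code from this repo). With states observed every k birth-death steps, the likelihood of one interval is an entry of the k-step transition matrix P^k, which sums over all unobserved intermediate paths. The matrix has one row per possible state, which is exactly what blows up with many variants. The count also depends on the representation: labelled individuals give K^N states, while tracking only variant counts gives C(N+K-1, K-1), which is smaller but still grows very quickly with N and K.

```python
from math import comb, log

def moran_step_matrix(N):
    """One-step transition matrix of a neutral two-variant Moran model.

    State i is the number of copies of variant A (0..N). At each step one
    individual, chosen uniformly, reproduces and one, chosen uniformly and
    independently, dies.
    """
    P = [[0.0] * (N + 1) for _ in range(N + 1)]
    for i in range(N + 1):
        p = i / N
        move = p * (1 - p)  # probability of +1 (and, separately, of -1)
        if i < N:
            P[i][i + 1] = move
        if i > 0:
            P[i][i - 1] = move
        P[i][i] = 1.0 - 2 * move
    return P

def matmul(A, B):
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def matpow(P, steps):
    """P**steps by repeated multiplication (fine for the tiny N used here)."""
    n = len(P)
    R = [[float(r == c) for c in range(n)] for r in range(n)]
    for _ in range(steps):
        R = matmul(R, P)
    return R

def exact_loglik(counts, N, steps_between_obs):
    """Exact log-likelihood of variant-A counts observed every
    `steps_between_obs` Moran steps; the unobserved intermediate states are
    summed out by the matrix power. Raises ValueError (log of 0) if an
    observed jump is impossible in that many steps."""
    Pk = matpow(moran_step_matrix(N), steps_between_obs)
    return sum(log(Pk[a][b]) for a, b in zip(counts, counts[1:]))

def num_count_states(N, K):
    """States when only variant counts are tracked: C(N + K - 1, K - 1)."""
    return comb(N + K - 1, K - 1)

print(exact_loglik([5, 6, 4, 5], N=10, steps_between_obs=10))
print(num_count_states(10, 10), 10 ** 10)  # → 92378 10000000000
```

For two variants the matrix is only (N+1) x (N+1), so this exact approach works; with K variants the matrix side length is num_count_states(N, K), which is what makes exact inference impractical for populations like BCI.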

I suspect that a better way to do inference in this model would be some sort of continuous approximation, combined with a variance-based test to choose between this model and Wright-Fisher (WF). Basically, we expect WF to have a lower variance than these other models, since Moran/Yule allow changes to compound between observations.
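A quick simulation sketch of the variance argument, assuming neutral two-variant models with a fixed population size N and treating N Moran birth-death events as one generation (a common convention, but an assumption here; Yule is not covered, and the function names are hypothetical). Under these assumptions the per-generation variance of the frequency change is about p(1-p)/N for WF and roughly twice that for Moran, which is the kind of gap a variance-based test would try to detect.

```python
import random
from statistics import pvariance

def wf_generation(i, N, rng):
    """One neutral Wright-Fisher generation: N offspring sampled binomially."""
    p = i / N
    return sum(rng.random() < p for _ in range(N))

def moran_generation(i, N, rng):
    """N neutral Moran birth-death events, treated here as one generation."""
    for _ in range(N):
        birth_is_a = rng.random() < i / N
        death_is_a = rng.random() < i / N
        i += int(birth_is_a) - int(death_is_a)
    return i

def freq_change_variance(generation, N=50, i0=25, reps=2000, seed=1):
    """Variance of the one-generation change in variant-A frequency."""
    rng = random.Random(seed)
    changes = [(generation(i0, N, rng) - i0) / N for _ in range(reps)]
    return pvariance(changes)

wf_var = freq_change_variance(wf_generation)
moran_var = freq_change_variance(moran_generation)
print(wf_var, moran_var)  # theory: WF about 0.25/50 = 0.005, Moran roughly twice that
```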

One way to implement this test would be an Austin test, where you compare simulated trajectories from the multinomial (WF) model with the observed ones. These simulations may not even be necessary, since we know the mean and variance of the multinomial: one could then just compare the observed trajectories with the CIs for each variant.
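A minimal sketch of the simulation-free version, assuming neutral WF with a fixed population size N and a known number of generations between observations (both assumptions; the function names are hypothetical). Each variant's count is marginally binomial, so its expected frequency is unchanged and the frequency variance after t generations is p(1-p)[1-(1-1/N)^t], which reduces to the one-generation binomial variance p(1-p)/N when t = 1. The intervals use a normal approximation, which is rough for rare variants.

```python
from math import sqrt

def wf_intervals(prev_counts, generations=1, z=1.96):
    """Approximate prediction intervals for each variant's count after
    `generations` generations of neutral Wright-Fisher drift."""
    N = sum(prev_counts)
    shrink = 1 - (1 - 1 / N) ** generations  # fraction of p(1-p) realised as variance
    intervals = []
    for n in prev_counts:
        p = n / N
        sd = N * sqrt(p * (1 - p) * shrink)  # sd of the count, not the frequency
        intervals.append((n - z * sd, n + z * sd))
    return intervals

def variants_outside(prev_counts, next_counts, generations=1, z=1.96):
    """Indices of variants whose observed count falls outside its interval."""
    ivs = wf_intervals(prev_counts, generations, z)
    return [k for k, (n, (lo, hi)) in enumerate(zip(next_counts, ivs))
            if not lo <= n <= hi]

print(variants_outside([50, 30, 20], [52, 29, 19]))  # → []
print(variants_outside([50, 30, 20], [80, 10, 10]))  # → [0, 1, 2]
```

With many variants and time points, some exceedances are expected by chance at z = 1.96, so the fraction of variants outside their intervals is more informative than any single one.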

Alternatively, you could fit WF to the data and show that it provides a poor representation. Comparing the log-likelihoods of data simulated from the fitted model with the actual log-likelihood then provides a test! The idea is that if the histogram of simulated log-likelihoods contains the actual log-likelihood, it is plausible that this process generated the data; if not, WF provides a poor fit.
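A minimal sketch of that log-likelihood test, written as a standard Monte Carlo (parametric bootstrap) goodness-of-fit test. It assumes neutral WF with fixed N and one generation between observations, so the "fitted" model here has no free parameters; with fitted parameters (e.g. selection coefficients) one would simulate from the fitted values instead. All names are hypothetical. The p-value is the fraction of simulated log-likelihoods at or below the observed one.

```python
import random
from math import inf, lgamma, log

def multinomial_logpmf(counts, probs):
    """log P(counts) for a multinomial with N = sum(counts) trials."""
    out = lgamma(sum(counts) + 1)
    for n, p in zip(counts, probs):
        if n == 0:
            continue  # contributes 0 regardless of p
        if p == 0:
            return -inf  # a lost variant cannot reappear under WF
        out += n * log(p) - lgamma(n + 1)
    return out

def wf_loglik(traj):
    """Log-likelihood of a count trajectory under neutral WF with fixed N,
    one generation between consecutive observations."""
    total = 0.0
    for prev, nxt in zip(traj, traj[1:]):
        N = sum(prev)
        total += multinomial_logpmf(nxt, [n / N for n in prev])
    return total

def simulate_wf(start, steps, rng):
    """Simulate a neutral WF count trajectory starting from `start`."""
    traj = [list(start)]
    K, N = len(start), sum(start)
    for _ in range(steps):
        draws = rng.choices(range(K), weights=traj[-1], k=N)
        counts = [0] * K
        for d in draws:
            counts[d] += 1
        traj.append(counts)
    return traj

def monte_carlo_pvalue(observed, n_sims=199, seed=0):
    """Fraction of WF-simulated trajectories whose log-likelihood is at or
    below the observed one (with the usual +1 correction). Small values
    suggest WF is a poor description of the data."""
    rng = random.Random(seed)
    obs_ll = wf_loglik(observed)
    sim_lls = [wf_loglik(simulate_wf(observed[0], len(observed) - 1, rng))
               for _ in range(n_sims)]
    return (1 + sum(ll <= obs_ll for ll in sim_lls)) / (1 + n_sims)

erratic = [[50, 50], [95, 5], [5, 95], [95, 5], [5, 95]]
print(monte_carlo_pvalue(erratic))  # → 0.005 (no simulated trajectory is that unlikely)
```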

@Armand1 Thoughts? This might make a nice follow-up inference paper. It wouldn't take long to test the idea, and it would be nice to be able to say, for a population like BCI or baby names, whether WF fits the data better or worse than Moran/Yule.

ben18785 commented 5 years ago

This approach is helped by the fact that maximum likelihood estimation for the WF model is essentially instantaneous using Stan's optimizing function.

Armand1 commented 5 years ago

I don't grasp what you're getting at here.

1) I am, for the moment, and for this paper, perfectly happy with WF only for the time-series test. It shows proof of principle and doesn't have to be extended to Moran or Yule. We can add a line or two discussing the complications of doing so.

2) You seem to be asking the question: given an observed set of data, is it better described by a Moran or a WF model? That's interesting --- but it's not a question that we (or most other people) ask. Rather, they assume on a priori grounds that one or the other is suitable and test for selection. Do I understand you correctly?

3) I don't understand your claim that Moran is difficult. You have already successfully implemented a TS test for the BCI data --- and is that not a type of Moran model? At least, I am certain that you did not assume, W-F style, that the whole population of trees is replaced every 5 years.

Clarify?

ben18785/Selection_simulations, issue #6