Thanks for the pointer, this notebook is great!
We haven't done any comparisons ourselves, since we're not doing any modeling outside of PyMC3. One of the hopes of this project is that other people will help contribute other backends, so that the same model could be fit in multiple packages using the same interface. That would then make it very easy to add comparison tools that address precisely this kind of issue.
I've done a handful of comparisons and found that bambi with default priors agreed well with lme4 in those cases, but I haven't done anything systematic. One thought: unlike models without random effects, where we would expect very good agreement between the Bayesian and OLS regressions, it's not clear in general how much agreement we should expect once we add random effects. The reason (I think) is that unless you have large numbers of clusters for all your random factors, the random effect variances can be quite hard to estimate precisely, and it seems likely that likelihood optimization and MCMC sampling would generally start to diverge in their estimates of the random effects, which in turn will often influence the fixed effects to some degree. Certainly we expect the results of the two approaches not to be wildly different, but I suspect that in many cases they will be a bit different, and that this is probably a general feature of Bayesian vs. ML estimation in mixed models.
@tyarkoni I am glad you find it helpful ;-) Another package called Edward follows a similar idea of supporting different backends. You might find it interesting.
@jake-westfall That's a very good insight, and I agree that Bayesian and maximum-likelihood estimates are not necessarily identical. However, what puzzles me is that even when I directly compare the coefficients estimated by Stan and PyMC3, the outputs are still a bit different. Of course, all the outputs show the same pattern. I should also add that the models I used in Stan and PyMC3 are not exactly identical: I did not use the same priors, and I used the Metropolis sampler instead of NUTS in PyMC3 (NUTS was too slow). I think with more simulations and more data this will become clearer, so please keep me posted if you have more information!
NUTS is much slower but Metropolis often fails to converge, so it's possible that's the issue. If you're willing to wait it out, you could try it again with NUTS and see if you get closer correspondence.
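For what it's worth, switching samplers in PyMC3 is a small change. Here is a minimal sketch, where the model itself is just a trivial stand-in for whatever model was sampled before:

```python
import numpy as np
import pymc3 as pm

# A minimal sketch of swapping Metropolis for NUTS; the model below is a
# trivial placeholder, not the mixed model from the notebook.
y = np.random.normal(1.0, 2.0, size=100)

with pm.Model() as model:
    mu = pm.Normal("mu", mu=0.0, sd=10.0)
    sigma = pm.HalfNormal("sigma", sd=10.0)
    pm.Normal("y_obs", mu=mu, sd=sigma, observed=y)

    # step = pm.Metropolis()            # the sampler used before
    step = pm.NUTS()                    # gradient-based sampler
    trace = pm.sample(2000, step=step)  # usually needs far fewer draws than Metropolis
```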
Closing this, since it isn't a Bambi issue per se, but thanks for bringing it up, @junpenglao, and please post an update if you re-run with the NUTS sampler!
Thank you for the suggestion @tyarkoni. I did rerun the model with NUTS. Actually, the problem is not that NUTS is too slow; in my case it is not sampling properly (the trace is flat or converges to a strange value). And the Metropolis run has most likely already converged: I deliberately drew a very large number of samples (100,000) and the trace looks fine. Again, it is possible that this is not a common case. One last note: although the model coefficients are not identical, they all fit the data very well. You can see the plot in the notebook. Stan gives the smallest ||y - y_hat||, but I don't think the advantage is meaningful.
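(For reference, the fit comparison here is just the L2 norm of the residuals; a minimal sketch, where `y` and `y_hat` are hypothetical placeholders for the observed outcome and one package's fitted values:)

```python
import numpy as np

def residual_norm(y, y_hat):
    """L2 norm of the residuals, ||y - y_hat||."""
    return np.linalg.norm(np.asarray(y) - np.asarray(y_hat))

# Example with placeholder arrays:
print(residual_norm([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```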
Just a small update: I tried Bambi on my data, and the output is almost identical to lme4 and Stan. Do you have a paper or reference explaining your implementation? Looking at model.terms.values(), it seems some of the hyperparameters in the priors are set using an Empirical Bayes approach (e.g., the mu of the intercept and the sd of the fixed effects in my model)?
I just added a notebook (still a WIP) that describes the default priors at least for the fixed effects.
The theory/motivation behind the default priors is that they are set up "as if" the priors had been placed on the partial correlation coefficients between the outcome and the predictors. I say "as if" because the priors are not literally placed on the partial correlations -- directly they are placed on the unstandardized regression coefficients -- but the parameters of the priors are chosen so that the implied prior on the partial correlation scale is sensible. Setting a prior on any standardized effect size requires looking at at least some aspect of the data, such as the standard deviation of Y. (Well, it doesn't technically require it, but it's much simpler this way.) So in that sense it is kinda-sorta Empirical Bayesian, although note that we never look directly at an estimate of the thing we're placing a prior on. Note that a similar approach (of having the priors technically depend on some aspect of the data) is taken with some of the default priors in, e.g., the brms and rstanarm packages in R.
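To make the scaling idea concrete, here is a simplified sketch. This is not Bambi's actual implementation (the function name and the default width are made up for illustration); it only shows how fixing a prior width on the standardized scale translates into a prior SD for the raw, unstandardized slope:

```python
import numpy as np

def default_slope_sd(y, x, standardized_width=0.5):
    """Prior SD for the raw slope implied by a Normal(0, standardized_width)
    prior on the standardized effect beta * sd(x) / sd(y).

    Illustrative only; Bambi's defaults are derived on the partial-correlation
    scale and involve more than this simple rescaling.
    """
    return standardized_width * np.std(y) / np.std(x)

# Example: a wider outcome scale or a narrower predictor scale implies a
# wider prior on the unstandardized coefficient.
rng = np.random.RandomState(0)
y = rng.normal(0.0, 10.0, size=200)
x = rng.normal(0.0, 2.0, size=200)
print(default_slope_sd(y, x))
```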
I see, this is really helpful! Thank you very much!
Hi there, great work! I am wondering if you have compared the model estimation results with the output from lme4?
A while ago I tried to compare different approaches to fitting a simple linear mixed model (a random-intercept model with subject as the random effect). While the results from Stan (via the brms package in R) are almost identical to the output from lme4, I am surprised that the estimates from PyMC3 are always a little bit off.
You can have a look at my attempt here.
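For context, a minimal PyMC3 sketch of this kind of random-intercept model with subject as the grouping factor (the data are simulated placeholders and the priors are generic, not the ones used in the notebook):

```python
import numpy as np
import pymc3 as pm

# Simulated placeholder data: 20 subjects, 30 observations each.
rng = np.random.RandomState(0)
n_subjects, n_per_subject = 20, 30
subj_idx = np.repeat(np.arange(n_subjects), n_per_subject)
x = rng.normal(size=subj_idx.size)
true_intercepts = rng.normal(2.0, 1.0, size=n_subjects)
y = true_intercepts[subj_idx] + 0.5 * x + rng.normal(scale=1.0, size=subj_idx.size)

with pm.Model() as varying_intercept:
    mu_a = pm.Normal("mu_a", mu=0.0, sd=10.0)                  # population-level intercept
    sigma_a = pm.HalfCauchy("sigma_a", beta=5.0)               # between-subject SD
    a = pm.Normal("a", mu=mu_a, sd=sigma_a, shape=n_subjects)  # per-subject intercepts
    b = pm.Normal("b", mu=0.0, sd=10.0)                        # common (fixed) slope
    sigma_y = pm.HalfCauchy("sigma_y", beta=5.0)               # residual SD
    y_est = a[subj_idx] + b * x
    pm.Normal("y_obs", mu=y_est, sd=sigma_y, observed=y)
    trace = pm.sample(2000)                                    # NUTS by default
```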