TheEconomist / us-potus-model

Code for a dynamic multilevel Bayesian model to predict US presidential elections. Written in R and Stan.
https://projects.economist.com/us-2020-forecast/president
MIT License

Based on Jackman's work? #3

Closed mansillo closed 4 years ago

mansillo commented 4 years ago

I am really thankful that The Economist has brought Andrew Gelman on board to model the probability of the likely outcome of the 2020 election. I am struck by the similarity between the Stan code and Simon Jackman's work on election prediction, namely his state-space model for the 2004 Australian federal election, published in the Australian Journal of Political Science, and, more pertinently for the American case, his modelling efforts for the 2012 presidential election. Linzer may have published the model in the Journal of the American Statistical Association in 2013, but Jackman implemented it in 2012 for the election and got 50 out of 50 states. Linzer's model, as far as I can tell, is Jackman's 2006 article with the hierarchical structure from his textbook. The intellectual origins are Jackman's: he published the state-space model in 2006 and spelt out, in two chapters of his textbook, state-space modelling of vote intention and exchangeability for state-level predictions with hierarchical models. Linzer was visiting Stanford in 2012-2013, yet only completed his PhD at UCLA in 2008 and had mostly published on latent class analysis until 2012; I wonder how much cross-fertilisation there was. Sure, Linzer got the publication while visiting (Jackman at) Stanford, but it seems to me that there was a little healthy competition at Stanford in 2012 worth acknowledging.

From my first inspection, the model Gelman has used is basically the same as Jackman's modelling efforts, which can be found in chapters 8 and 9 of his textbook, Bayesian Analysis for the Social Sciences (2009). The possible exception is the Cholesky factor decomposition, which the Bayesian probabilistic language Stan has made far easier to implement than the preceding modelling language, JAGS. I understand it's hard to improve on something simple and excellent ... but Gelman has his own textbooks, most notably Bayesian Data Analysis (Third Edition; 2013), with lots of nifty tools and tricks of the trade that I would have loved to have seen expressed in this modelling effort. Gelman is the king of priors, and I am pretty sure I'm not alone in wishing he had more artistic/scientific licence to perform something a bit more novel. He could, for example, have incorporated estimates from the Multilevel Regression and Poststratification analyses he has worked hard to popularise, and with which he has made some very cool and important findings about American public opinion (in particular, partisan non-response bias). Or he could have employed a cool-ass Gaussian process or spline to "exploit a sensitivity–stability trade-off" as "they stabilize estimates and predictions by making fitted models less sensitive to certain details of the data" in the hierarchical component of the model, doing a little shrinking. Or the priors could have been a little more creative, say with a horseshoe prior to handle the sparsity. It would even have been awesome to see an ensemble of model specifications, averaging over the models and weighting them by their performance, but all of this has a cost. I'm willing to accept that the model had to be kept simple since it is being updated daily, and incorporating components that explore high-dimensional space, while theoretically cool to a statistician or political scientist, would blow out the time it takes to analyse the data.
The practicality of implementation and the questionable value-add of a slightly more accurate model make the trade-off seem appropriate.
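
For anyone who wants to try the ensemble idea, here is a minimal sketch of weighting model specifications by performance. It is not taken from this repository: the function names, the three candidate models, and every number below are hypothetical, and the weighting rule (a softmax over out-of-sample log predictive scores, in the spirit of pseudo-BMA weights) is just one simple choice among several.

```python
import math

def performance_weights(log_scores):
    """Turn per-model log predictive scores into weights that sum to 1.

    Higher (less negative) scores get more weight. Subtracting the maximum
    before exponentiating keeps the computation numerically stable.
    """
    top = max(log_scores)
    unnormalised = [math.exp(s - top) for s in log_scores]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]

def averaged_forecast(forecasts, weights):
    """Weighted average of the point forecasts from each model."""
    return sum(f * w for f, w in zip(forecasts, weights))

# Hypothetical out-of-sample log scores for three model specifications.
log_scores = [-120.0, -121.0, -125.0]
# Hypothetical forecasts of a candidate's two-party vote share.
forecasts = [0.52, 0.54, 0.49]

weights = performance_weights(log_scores)
combined = averaged_forecast(forecasts, weights)
print(weights, combined)
```

The cost mentioned above shows up here too: every candidate specification has to be fitted and scored on held-out data before the weights can be computed, which multiplies the run time of a daily update.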

elliottmorris commented 4 years ago

Hi @mansillo,

Thanks for your feedback. It sounds like you have a lot of ideas about various ways to improve the model. We would be delighted if you forked the repo and tried some out! Let us know what you find. A BMA (Bayesian model averaging) approach to the prior (or posterior) could be promising.

Note that we do include a partisan non-response adjustment.

E