TheEconomist / us-potus-model

Code for a dynamic multilevel Bayesian model to predict US presidential elections. Written in R and Stan.
https://projects.economist.com/us-2020-forecast/president
MIT License

stringsAsFactors = TRUE prevents use of state polling data by Stan models fit to 2008, 2012, and 2016 #21

Closed: fog-pine closed this issue 3 years ago

fog-pine commented 3 years ago

I'd like to report an easy-to-make but consequential bug. It appears to prevent the Stan models fit to 2008, 2012, and 2016 from using any state poll data. I'm unsure whether it has other effects.

Like many bugs in R scripts, this one stems from read.csv defaulting to stringsAsFactors = TRUE.

Bug locations

All entries of df$index_s are reassigned to NA at:

Bug appearance

All instances look like this:

index_s = as.numeric(factor(as.character(state), levels = c('--',state_abb_list)))
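A minimal sketch of a use-site fix, assuming state_abb_list may arrive as either a factor or a character vector: wrapping it in as.character() before building the levels makes the line behave the same whatever read.csv returned. The toy state_abb_list and state values below are hypothetical stand-ins, not the model's real data.

```r
# Hypothetical stand-in for state_abb_list as read.csv returns it
# when stringsAsFactors = TRUE.
state_abb_list <- factor(c("AK", "AL", "AR", "AZ"))

# Hypothetical poll states; '--' marks a national poll.
state <- c("--", "AL", "AZ", "AK")

# as.character() keeps the abbreviations instead of the factor's
# integer codes, so each state matches its intended level.
index_s <- as.numeric(factor(as.character(state),
                             levels = c("--", as.character(state_abb_list))))
print(index_s)  # '--' -> 1, AK -> 2, AL -> 3, AR -> 4, AZ -> 5
```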

Reason for bug

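Because read.csv defaults to stringsAsFactors = TRUE, state_abb_list has class factor rather than the anticipated class character. As a result, c('--', state_abb_list) returns c('--', '1', '2', '3', ...), the factor's integer codes coerced to character, rather than the anticipated c('--', 'AK', 'AL', 'AR', 'AZ', ...). No state abbreviation matches these levels, so factor() maps every state to NA.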
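The mechanism can be reproduced in a few lines. This is a minimal sketch with a hypothetical four-state factor standing in for state_abb_list and a hypothetical vector of state poll abbreviations; it is not taken from the repository's scripts.

```r
# Hypothetical factor standing in for state_abb_list under
# stringsAsFactors = TRUE.
state_abb_list <- factor(c("AK", "AL", "AR", "AZ"))

# c() dispatches on its first argument ('--', a character), so the
# factor is flattened to its integer codes, which are then coerced
# to character: '--', '1', '2', '3', '4'.
bad_levels <- c("--", state_abb_list)
print(bad_levels)

# Hypothetical state poll abbreviations: none of them appears in
# bad_levels, so factor() turns every one into NA.
state <- c("AL", "AZ", "AK")
index_s <- as.numeric(factor(as.character(state), levels = bad_levels))
print(index_s)
```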

A small additional point: the states appear to be misordered in the comment directly above the buggy line in the R scripts.

fog-pine commented 3 years ago

I'm closing this because the bug doesn't appear in R 4.0.0; it only affects earlier R versions. Beginning with R 4.0.0, stringsAsFactors defaults to FALSE. Thanks to anon e mouse for pointing out this change in R to me.

If anyone else runs into the same problem, please switch to R 4.0.0 or later to run the Economist model code.
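For anyone who cannot upgrade, a minimal sketch of a version-independent workaround, assuming the inputs are read with read.csv: passing stringsAsFactors = FALSE explicitly yields character columns on both older and newer R. The inline CSV text and the column names state and pct are hypothetical toy data, not one of the repository's files.

```r
# Hypothetical toy CSV; the real scripts read files from disk.
csv_text <- "state,pct\nAK,0.5\nAL,0.4\nAK,0.6\n"

# An explicit stringsAsFactors = FALSE gives the same result before
# and after R 4.0.0.
polls <- read.csv(text = csv_text, stringsAsFactors = FALSE)
state_abb_list <- unique(polls$state)

# Fail fast if a factor slips in through some other code path.
stopifnot(is.character(state_abb_list))

levels_ok <- c("--", state_abb_list)
print(levels_ok)
```

Setting the argument at the read site fixes every downstream use at once, whereas the as.character() wrapper only protects the single line it is applied to.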