stringsAsFactors = TRUE prevents use of state polling data by stan models fit to 2008, 2012, and 2016

I'd like to report an easy-to-make but consequential bug. It appears to prevent any state poll data from being used when the Stan models are fit to 2008, 2012, and 2016. I'm unsure whether it has other effects.

Like many bugs in R scripts, this one stems from read.csv defaulting to stringsAsFactors = TRUE, which is the default in R versions before 4.0.0.
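To make the default's effect concrete, here is a minimal sketch. The two-row CSV and the column names are hypothetical stand-ins, not the repository's actual data file; stringsAsFactors is set explicitly so the result does not depend on the R version.

```r
# Hypothetical two-row CSV standing in for the real state data file.
csv_text <- "state,abb\nAlaska,AK\nAlabama,AL"

# Explicitly request the old default behaviour (strings become factors).
with_factors <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Explicitly request plain character columns.
with_strings <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(with_factors$abb)  # "factor"
class(with_strings$abb)  # "character"
```

Passing stringsAsFactors = FALSE to the relevant read.csv call would be one way to avoid the problem at its source.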

Bug locations

All entries of df$index_s are set to NA at:

Lines 137-138 of final_2008.R
Lines 136-137 of final_2012.R
Lines 135-136 of final_2016.R

Bug appearance

All instances look like index_s = as.numeric(factor(as.character(state), levels = c('--',state_abb_list)))

Reason for bug

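Because read.csv defaults to stringsAsFactors = TRUE, state_abb_list has class factor, not the anticipated class character. When a factor is combined with a character string, c() uses the factor's underlying integer codes rather than its labels. As a result c('--', state_abb_list) returns c('--', '1', '2', '3', ...), not the anticipated c('--', 'AK', 'AL', 'AR', 'AZ', ...). No state abbreviation matches any of these levels, so factor() maps every state to NA and as.numeric() passes the NA through.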
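The mechanism can be reproduced in a few lines. This is a sketch under assumptions: the four-state state_abb_list and the state vector below are hypothetical stand-ins for the objects in the scripts, and the fix shown (wrapping the factor in as.character) is one possible remedy, not necessarily the one the maintainers would choose.

```r
# Hypothetical stand-ins for the objects in the scripts.
state_abb_list <- factor(c("AK", "AL", "AR", "AZ"))
state <- c("AL", "AZ", "AK")

# Original pattern: the factor contributes its integer codes to c().
bad_levels <- c("--", state_abb_list)
bad_levels  # "--" "1" "2" "3" "4"

bad_index <- as.numeric(factor(as.character(state), levels = bad_levels))
bad_index   # NA NA NA

# One possible fix: convert to character before building the levels.
good_levels <- c("--", as.character(state_abb_list))
good_index <- as.numeric(factor(as.character(state), levels = good_levels))
good_index  # 3 5 2
```

With the corrected levels, '--' takes index 1 and each state gets its position in the abbreviation list plus one.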

A small additional point: the states appear to be misordered in the comment directly above the bug in each R script.

TheEconomist / us-potus-model, issue #21