MEDSL / healthy_elections

A Repository for data related to elections and the COVID-19 pandemic

WI Naming scheme #1

Open jcuriel-unc opened 4 years ago

jcuriel-unc commented 4 years ago

Wisconsin is an unusual state with regard to sub-state jurisdictions. First of all, WI has the most election jurisdictions of any state. Its precincts are called wards, and each one can act effectively autonomously. However, that is just the beginning:

  1. Similar names - Many Wisconsin towns, villages, cities, etc. share similar names yet sit in very different parts of the state. However, one cannot rely on codes because...

  2. Reliance on unique sub-county FIPS - WI follows a sub-county naming scheme similar to the New England states, in that town-level jurisdictions and the like are identified by their own sub-county codes. However...

  3. Internal coding scheme - WI forgoes the standard US-style FIPS codes (i.e. 55 + 3-digit county FIPS + 5-digit county subdivision code) and instead uses two coding schemes of its own within its election data: its internal FIPS (each code begins with a county number from 1 to 72, assigned in alphabetical order), and its tax codes, which follow no obvious logic.

  4. Missing data - It is difficult to build a full dictionary because the data WI reports have many gaps, so not every jurisdiction appears.
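To make the first point concrete, here is a minimal sketch of how one might flag municipality names that recur across counties. The `munis` data frame and its rows are hypothetical stand-ins for illustration, not an extract of the WI files.

```r
# Hypothetical toy data: illustrative rows, not taken from the WI files.
munis <- data.frame(
  county = c("ADAMS", "ADAMS", "BUFFALO", "WOOD", "WOOD"),
  muni   = c("TOWN OF LINCOLN", "CITY OF ADAMS", "TOWN OF LINCOLN",
             "TOWN OF LINCOLN", "CITY OF MARSHFIELD"),
  stringsAsFactors = FALSE
)

# Count the distinct counties in which each municipality name appears.
county_counts <- tapply(munis$county, munis$muni,
                        function(x) length(unique(x)))

# Names found in more than one county cannot be matched on name alone.
ambiguous <- names(county_counts)[as.vector(county_counts > 1)]

# A county + name key stays unique where the bare name does not.
munis$key <- paste(munis$county, munis$muni, sep = " | ")
```

This is the reason the matching in the follow-up comment is done one county at a time rather than on names alone.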

jcuriel-unc commented 4 years ago

To match everything up, download the Wards data from the state and convert every MCD_Name to upper case. The CTV field indicates whether a jurisdiction is a city, town, or village, so combine each code with its upper-case name according to that indicator. From there, it is possible to match onto other string fields from, say, the 190 reports. Here is example code for the string matching:
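The name-building step might look roughly like the sketch below. It assumes that CTV holds the one-letter indicators "C", "T", and "V", that the column is spelled `MCD_NAME`, and that the target format is "CITY OF X"; none of these is confirmed by the state files, and `make_full_name` and `wards_demo` are hypothetical names.

```r
# Assumption: CTV uses one-letter indicators; adjust if the file differs.
ctv_labels <- c(C = "CITY OF", T = "TOWN OF", V = "VILLAGE OF")

# Build an upper-case full name such as "TOWN OF LINCOLN".
# Unknown indicators fall back to the bare upper-case name.
make_full_name <- function(mcd_name, ctv) {
  prefix <- unname(ctv_labels[toupper(trimws(ctv))])
  name <- toupper(trimws(mcd_name))
  ifelse(is.na(prefix), name, paste(prefix, name))
}

# Hypothetical demo rows standing in for the ward attribute table.
wards_demo <- data.frame(
  MCD_NAME = c("Madison", "lincoln ", "Oregon"),
  CTV      = c("C", "T", "V"),
  stringsAsFactors = FALSE
)
wards_demo$full_name <- make_full_name(wards_demo$MCD_NAME, wards_demo$CTV)
```

The word order of the result should be set to whatever format the other data source uses, since the fuzzy match works best when both sides follow the same pattern.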

    # Extract the attribute data frame from the ward shapefile
    wi_wards_rawdf <- wi_wards_raw@data
    match_list <- list()  # collects c(190 report name, ward name) pairs

    # County vectors from the ward data and from the 190 report
    countywi_vec <- sort(unique(wi_wards_rawdf$CNTY_NAME))
    county190vec <- sort(unique(wi190$County))
    county190vec <- county190vec[-1]  # getting rid of single NA county

    # Assumes both vectors list the same 72 counties in the same order
    for (y in 1:72) {
      wards_county <- subset(wi_wards_rawdf, CNTY_NAME == countywi_vec[y])
      wi190cty <- subset(wi190, County == county190vec[y])
      town_vec <- sort(unique(wi190cty$Muni2))               # names from the 190 report
      town_vec_poll <- sort(unique(wards_county$full_name))  # names from the ward data

      # seq_along() skips the loop cleanly when a county has no names
      for (j in seq_along(town_vec)) {
        temp_store1 <- town_vec[j]
        for (v in seq_along(town_vec_poll)) {
          # Approximate match, allowing edits up to 20% of the pattern length
          temp_results <- agrep(temp_store1, town_vec_poll[v], max.distance = 0.2)
          if (length(temp_results) > 0) {
            match_list[[length(match_list) + 1]] <- c(temp_store1, town_vec_poll[v])
          }
        }
      }
    }
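Since `agrep()` is vectorized over its second argument, the inner loop could also be wrapped in a small helper that returns the candidate pairs as a data frame. This is only a sketch: `fuzzy_pairs` and the demo vectors are hypothetical, and it is meant to be called once per county with that county's two name vectors.

```r
# Return every (pattern, candidate) pair that agrep() accepts.
fuzzy_pairs <- function(patterns, candidates, max_dist = 0.2) {
  # agrep() rejects NA or empty patterns, so drop them first.
  patterns <- patterns[!is.na(patterns) & nzchar(patterns)]
  hits <- lapply(patterns, function(p) {
    m <- agrep(p, candidates, max.distance = max_dist, value = TRUE)
    if (length(m) == 0) return(NULL)
    data.frame(pattern = p, candidate = m, stringsAsFactors = FALSE)
  })
  hits <- Filter(Negate(is.null), hits)
  if (length(hits) == 0) {
    return(data.frame(pattern = character(0), candidate = character(0),
                      stringsAsFactors = FALSE))
  }
  do.call(rbind, hits)
}

# Hypothetical names; the first report name is missing a letter on purpose.
report_names <- c("TOWN OF LINCON", "CITY OF MADISON")
ward_names   <- c("TOWN OF LINCOLN", "CITY OF MADISON", "VILLAGE OF OREGON")
pairs <- fuzzy_pairs(report_names, ward_names)
```

A data frame of pairs is easier to inspect by hand than a list, which matters here because fuzzy matches on near-identical names still need a manual check before they go into a crosswalk.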