Closed: shilohfling closed this issue 2 years ago
We need to look at the "state" field and discuss how we want users to input that data. Currently, I expect the 2-letter abbreviation for states, but the U.S. territories and other countries are written out in full:
dplyr::mutate(Line = dplyr::recode(.data$State,
"AL" = "01",
"AK" = "02",
"AZ" = "04",
"AR" = "05",
"CA" = "06",
"CO" = "08",
"CT" = "09",
"DE" = "10",
"DC" = "11",
"FL" = "12",
"GA" = "13",
"HI" = "15",
"ID" = "16",
"IL" = "17",
"IN" = "18",
"IA" = "19",
"KS" = "20",
"KY" = "21",
"LA" = "22",
"ME" = "23",
"MD" = "24",
"MA" = "25",
"MI" = "26",
"MN" = "27",
"MS" = "28",
"MO" = "29",
"MT" = "30",
"NE" = "31",
"NV" = "32",
"NH" = "33",
"NJ" = "34",
"NM" = "35",
"NY" = "36",
"NC" = "37",
"ND" = "38",
"OH" = "39",
"OK" = "40",
"OR" = "41",
"PA" = "42",
"RI" = "44",
"SC" = "45",
"SD" = "46",
"TN" = "47",
"TX" = "48",
"UT" = "49",
"VT" = "50",
"VI" = "51",
"WA" = "53",
"WV" = "54",
"WI" = "55",
"WY" = "56",
"Unknown" = "57",
"American Samoa" = "60",
"Federated States of Micronesia" = "64",
"Guam" = "66",
"Marshall Islands" = "68",
"Northern Marianas" = "69",
"Palau" = "70",
"Puerto Rico" = "72",
"Virgin Islands" = "78",
"Foreign Countries" = "90"
))
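One way to approach the input question is to normalize whatever the user supplies before looking it up, so that case and stray whitespace do not matter. The following is only a minimal base R sketch under stated assumptions: `state_codes` and `state_to_code` are hypothetical names that are not part of the package, and the table holds just a handful of entries from the list above for brevity.

```r
# Hypothetical lookup table: a few entries copied from the recode above.
# Names are stored upper-case so that matching can ignore case.
state_codes <- c(
  "AL" = "01",
  "AK" = "02",
  "DC" = "11",
  "GUAM" = "66",
  "PUERTO RICO" = "72"
)

# Hypothetical helper: trims whitespace and upper-cases the input so that
# "al", " AK " and "Guam" all match, then looks each value up by name.
# Values that are not in the table come back as NA, with a warning.
state_to_code <- function(state) {
  key <- toupper(trimws(state))
  code <- unname(state_codes[key])
  unmatched <- unique(state[is.na(code)])
  if (length(unmatched) > 0) {
    warning("Unrecognized state values: ", paste(unmatched, collapse = ", "))
  }
  code
}

state_to_code(c("al", " AK ", "Guam"))
```

Warning instead of stopping is a design choice here; a stricter version could call `stop()` so that bad input never reaches the report.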
On the same note as the states: should we recode race/ethnicity for the user, or should we expect them to supply it already as numeric values? I'm considering writing a single recode function that we could use throughout the package, assuming the numeric codes are consistent across all parts.
Current situation, based on UNLV coding:
RaceEthnicity = dplyr::recode(.data$RaceEthnicity,
"NONRS" = 1,
"HISPA" = 2,
"AIAKN" = 3,
"ASIAN" = 4,
"BLACK" = 5,
"PACIF" = 6,
"WHITE" = 7,
"MULTI" = 8,
"UNKWN" = 9
)
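The shared recode function mentioned above could look roughly like the sketch below. This is an assumption about one possible design, not existing package code: `recode_with_map` and `race_ethnicity_codes` are hypothetical names, and the helper accepts either the UNLV labels or values that are already numeric codes, which sidesteps the question of what users must supply.

```r
# Code map taken from the UNLV-based recode above.
race_ethnicity_codes <- c(
  NONRS = 1, HISPA = 2, AIAKN = 3, ASIAN = 4, BLACK = 5,
  PACIF = 6, WHITE = 7, MULTI = 8, UNKWN = 9
)

# Hypothetical package-wide helper: recodes `x` using a named vector `map`.
# Labels are translated to their codes; values that are already valid codes
# pass through unchanged; anything else (other than NA) raises an error.
recode_with_map <- function(x, map) {
  x_chr <- as.character(x)
  out <- unname(map[x_chr])
  already_coded <- is.na(out) & x_chr %in% as.character(map)
  out[already_coded] <- as.numeric(x_chr[already_coded])
  bad <- is.na(out) & !is.na(x)
  if (any(bad)) {
    stop("Unrecognized values: ", paste(unique(x_chr[bad]), collapse = ", "))
  }
  out
}

recode_with_map(c("HISPA", "7", "UNKWN"), race_ethnicity_codes)
```

Keeping each code map as a plain named vector means the same helper could serve other fields, provided the numbering really is consistent across the package.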
Break everything out as boolean columns, e.g. IS_FULLTIME coded 1/0 for the Part A dataset.
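A rough illustration of the boolean-column idea, in base R. Only `IS_FULLTIME` comes from the comment above; the `students` data frame, its `Enrollment` column, and `IS_PARTTIME` are made up for the example.

```r
# Hypothetical input: one row per student with a categorical enrollment status.
students <- data.frame(
  Id = 1:3,
  Enrollment = c("FULLTIME", "PARTTIME", "FULLTIME"),
  stringsAsFactors = FALSE
)

# Each indicator is 1 when the condition holds and 0 otherwise.
students$IS_FULLTIME <- as.integer(students$Enrollment == "FULLTIME")
students$IS_PARTTIME <- as.integer(students$Enrollment == "PARTTIME")

students
```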