AlisonLanski / IPEDSuploadables

Producing uploadable txt files for IPEDS reporting, one submission at a time
https://alisonlanski.github.io/IPEDSuploadables/

Fall Enrollment 2020-2021 #37

Closed shilohfling closed 2 years ago

shilohfling commented 3 years ago

Break everything out as boolean columns, e.g. IS_FULLTIME coded 1/0 for the Part A dataset.
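
As a rough illustration of the boolean-column idea, the sketch below derives a 1/0 flag from a text status column in base R. The data frame, the `TimeStatus` column, and its "FT"/"PT" values are hypothetical; only the `IS_FULLTIME` name comes from the comment above.

```r
# Hypothetical student-level extract; column names and values are assumptions
students <- data.frame(
  StudentId  = c("A1", "A2", "A3"),
  TimeStatus = c("FT", "PT", "FT"),
  stringsAsFactors = FALSE
)

# Store the flag as integer 1/0 rather than carrying the text status forward
students$IS_FULLTIME <- as.integer(students$TimeStatus == "FT")
```

A missing `TimeStatus` would come through as `NA` rather than 0, so it may be worth checking for that before building the upload file.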

shilohfling commented 3 years ago

We need to look at the "State" field and decide how we want users to input that data. Currently I have it as the 2-letter state abbreviation, but the U.S. territories and other countries are written out:

    dplyr::mutate(Line = dplyr::recode(.data$State,
                                       "AL" = "01",
                                       "AK" = "02",
                                       "AZ" = "04",
                                       "AR" = "05",
                                       "CA" = "06",
                                       "CO" = "08",
                                       "CT" = "09",
                                       "DE" = "10",
                                       "DC" = "11",
                                       "FL" = "12",
                                       "GA" = "13",
                                       "HI" = "15",
                                       "ID" = "16",
                                       "IL" = "17",
                                       "IN" = "18",
                                       "IA" = "19",
                                       "KS" = "20",
                                       "KY" = "21",
                                       "LA" = "22",
                                       "ME" = "23",
                                       "MD" = "24",
                                       "MA" = "25",
                                       "MI" = "26",
                                       "MN" = "27",
                                       "MS" = "28",
                                       "MO" = "29",
                                       "MT" = "30",
                                       "NE" = "31",
                                       "NV" = "32",
                                       "NH" = "33",
                                       "NJ" = "34",
                                       "NM" = "35",
                                       "NY" = "36",
                                       "NC" = "37",
                                       "ND" = "38",
                                       "OH" = "39",
                                       "OK" = "40",
                                       "OR" = "41",
                                       "PA" = "42",
                                       "RI" = "44",
                                       "SC" = "45",
                                       "SD" = "46",
                                       "TN" = "47",
                                       "TX" = "48",
                                       "UT" = "49",
                                       "VT" = "50",
                                       "VA" = "51",
                                       "WA" = "53",
                                       "WV" = "54",
                                       "WI" = "55",
                                       "WY" = "56",
                                       "Unknown" = "57",
                                       "American Samoa" = "60",
                                       "Federated States of Micronesia" = "64",
                                       "Guam" = "66",
                                       "Marshall Islands" = "68",
                                       "Northern Marianas" = "69",
                                       "Palau" = "70",
                                       "Puerto Rico" = "72",
                                       "Virgin Islands" = "78",
                                       "Foreign Countries" = "90"
                                       ))
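
One possible answer to the input question is to accept more than one spelling for the same line code and to report anything that does not match. The sketch below is base R and only an assumption about how that could look: `state_codes` and `recode_state` are hypothetical names, the table is a small subset of the mapping above, and accepting "PR" as an abbreviation is an illustration, not something the package does today.

```r
# Names are accepted inputs, values are the line codes from the mapping above.
# Subset only; "PR" is a hypothetical extra spelling for Puerto Rico.
state_codes <- c(
  "AL" = "01",
  "AK" = "02",
  "Puerto Rico" = "72",
  "PR" = "72",
  "Foreign Countries" = "90"
)

# Hypothetical helper: look up each value and warn about anything unmatched
recode_state <- function(x, lookup = state_codes) {
  key <- trimws(x)                 # tolerate stray whitespace in the input
  out <- unname(lookup[key])       # unmatched keys become NA
  bad <- unique(key[is.na(out)])
  if (length(bad) > 0) {
    warning("Unrecognized State values: ", paste(bad, collapse = ", "))
  }
  out
}
```

Calling `recode_state(c("AL", "Puerto Rico", "PR"))` would give the same code for both Puerto Rico spellings, and a value such as "ZZ" would come back as `NA` with a warning instead of passing through unchanged.
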
shilohfling commented 3 years ago

On the same note as the states: should we recode race/ethnicity for the user, or should we expect them to have it in numerical values already? I'm considering making a recode function that we could use throughout the package, assuming the numbers are consistent throughout.

Current situation, based on UNLV coding:

    RaceEthnicity = dplyr::recode(.data$RaceEthnicity,
                                  "NONRS" = 1,
                                  "HISPA" = 2,
                                  "AIAKN" = 3,
                                  "ASIAN" = 4,
                                  "BLACK" = 5,
                                  "PACIF" = 6,
                                  "WHITE" = 7,
                                  "MULTI" = 8,
                                  "UNKWN" = 9
                                  )
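
A shared recode function along those lines might look roughly like the sketch below. It is base R and only an assumption about the design: `race_codes` and `recode_race_ethnicity` are hypothetical names, the text labels are the UNLV ones above, and letting already-numeric input pass through is one possible answer to the question, not a decision.

```r
# UNLV-style labels mapped to the numeric codes listed above
race_codes <- c(
  "NONRS" = 1L, "HISPA" = 2L, "AIAKN" = 3L,
  "ASIAN" = 4L, "BLACK" = 5L, "PACIF" = 6L,
  "WHITE" = 7L, "MULTI" = 8L, "UNKWN" = 9L
)

# Hypothetical helper: accept either numeric codes or text labels
recode_race_ethnicity <- function(x, lookup = race_codes) {
  if (is.numeric(x)) {
    # Already coded: only check that every non-missing value is a known code
    if (!all(x %in% lookup | is.na(x))) {
      stop("RaceEthnicity contains codes outside the expected set")
    }
    return(as.integer(x))
  }
  out <- unname(lookup[toupper(trimws(x))])
  # A label that was present but did not match is an error, not a silent NA
  if (anyNA(out[!is.na(x)])) {
    stop("RaceEthnicity contains unrecognized labels")
  }
  out
}
```

With this shape, users who already store numbers are not forced to convert back to labels, while users with other local labels would need to supply their own `lookup`.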