UrbanInstitute / education-data-package-r

https://urbaninstitute.github.io/education-data-package-r/

"class_level" issue in year 2000 IPEDS college data #93

Open · opened by robertv4311 2 years ago

robertv4311 commented 2 years ago

If you run this code (or something similar), you get no data for the year 2000. The problem appears to be that the class_level "Total" (99) is missing for that year, and I can't see any reason it should be. I'm not sure whether this is an issue with the package or with the underlying data it draws from, so apologies if I'm bringing this to the wrong people.

library(educationdata)
test <- get_education_data(level = "college-university", source = "ipeds", topic = "fall-enrollment", subtopic = c("race", "sex"),
  filters = list(level_of_study = "undergraduate", degree_seeking = 1, ftpt = 1, sex = 99, year = 1999:2001, class_level = 99),
  csv = TRUE, add_labels = TRUE)

> summary(factor(test$year))
 1999  2001 
48888 49368 

When I remove the class_level filter and check the breakdown of that variable separately for each of the three years, this is what I get. It looks as if the 2000 "Total" numbers may have ended up in the "Other (total)" category and the "Total" category itself went missing, but that's just a guess.

> summary(subset(test, year == 1999)$class_level)
                                     First-time Other (transfer-ins or non-first-time-entering) 
                                          47616                                               0 
                             Other (continuing)                                   Other (total) 
                                              0                                           36136 
                                          Total                            Missing/not reported 
                                          48888                                               0 
                                 Not applicable                                 Suppressed data 
                                              0                                               0 
> summary(subset(test, year == 2000)$class_level)
                                     First-time Other (transfer-ins or non-first-time-entering) 
                                          48056                                               0 
                             Other (continuing)                                   Other (total) 
                                              0                                           49144 
                                          Total                            Missing/not reported 
                                              0                                               0 
                                 Not applicable                                 Suppressed data 
                                              0                                               0 
> summary(subset(test, year == 2001)$class_level)
                                     First-time Other (transfer-ins or non-first-time-entering) 
                                          48360                                               0 
                             Other (continuing)                                   Other (total) 
                                              0                                           31512 
                                          Total                            Missing/not reported 
                                          49368                                               0 
                                 Not applicable                                 Suppressed data 
                                              0                                               0 
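The same per-year breakdown can be produced in one step with base R's `table()`, which makes a year with no "Total" rows stand out as a zero. This is a minimal sketch: the small `test` data frame below is hypothetical toy data standing in for the result of `get_education_data()`, and `missing_total` is an illustrative name, not part of the package.

```r
# Hypothetical toy data standing in for the `test` data frame returned by
# get_education_data(); the real query returns tens of thousands of rows.
test <- data.frame(
  year = c(1999, 1999, 2000, 2000, 2001, 2001),
  class_level = factor(
    c("Total", "First-time", "Other (total)", "First-time", "Total", "Other (total)"),
    levels = c("First-time", "Other (total)", "Total")
  )
)

# One year-by-class_level count table instead of three separate summary() calls
counts <- table(test$year, test$class_level)
print(counts)

# Years whose "Total" count is zero, i.e. years with no summary rows
missing_total <- rownames(counts)[counts[, "Total"] == 0]
print(missing_total)
```

Run against the real `test` object (with the class_level filter removed), the same two lines would flag any year lacking "Total" rows without subsetting each year by hand.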

erika-tyagi commented 2 years ago

Hi @robertv4311 – apologies for the delay, and thanks for flagging! The issue is indeed with the underlying API that the package pulls from (there aren't rows for the combination of degree_seeking and class_level for 2000). I flagged this with the research team that processes the underlying data, and I'll update here if we're able to add the summary rows in a future data release. Thanks for your patience!

robertv4311 commented 2 years ago

Thanks!

robertv4311 commented 2 years ago

@erika-tyagi This is a possible explanation: "IPEDS did not include the 'degree- or certificate-seeking' qualifier in its 2000 survey," per this study (which excluded 2000 for that reason): https://cepa.stanford.edu/content/race-and-stratification-colege-enrollment-over-time