UrbanInstitute / education-data-package-r

https://urbaninstitute.github.io/education-data-package-r/

Filtering by `level_of_study = "undergraduate"` returns an empty data frame when `csv` is set to `TRUE` #77

Closed. danli-ds closed this issue 3 years ago.

danli-ds commented 3 years ago

Hi,

Thank you for the great work!

I tried to download enrollment data limited to undergraduate level of study. The code below returned a data frame with 0 rows, even though I knew for sure that the data exist.

dat <- educationdata::get_education_data(level = "college-university",
                                         source = "ipeds",
                                         topic = "fall-enrollment",
                                         filters = list(year = 2001, 
                                                        level_of_study = "undergraduate",
                                                        class_level = 99),
                                         by = list("race", "sex"),
                                         add_labels = TRUE,
                                         csv = TRUE)

I then set level_of_study to 1 as follows.

dat <- educationdata::get_education_data(level = "college-university",
                                         source = "ipeds",
                                         topic = "fall-enrollment",
                                         filters = list(year = 2001, 
                                                        level_of_study = 1,
                                                        class_level = 99),
                                         by = list("race", "sex"),
                                         add_labels = TRUE,
                                         csv = TRUE)

This time I received an error.

Error: 1 is not a valid level of study. Valid levels are:
    undergraduate
    graduate
    first-professional
    post-baccalaureate
    99

I checked the source code and noticed that the internal function parse_level_of_study rejects numeric values for level_of_study. I was able to work around this by tweaking parse_level_of_study to add the numeric codes to its valid_study list:

valid_study <- list('undergraduate' = c('undergraduate', 'undergrad', '1'),
                    'graduate' = c('graduate', 'grad', '2'),
                    'first-professional' = c('first-professional', '3'),
                    'post-baccalaureate' = c('post-baccalaureate',
                                             'post-bac',
                                             'postbac',
                                             '4'),
                    '99' = c('99', 'total'))
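The following is a minimal sketch of how an alias list like valid_study can normalize input. It maps any accepted spelling, whether a text label or a numeric code, to one canonical level name. The function normalize_level_of_study() is a hypothetical name. This is not the package's actual parse_level_of_study(), and its error message is only modeled on the one shown above.

```r
# Hypothetical sketch: NOT educationdata's actual parse_level_of_study().
# Reuses the alias list from the patch above to map any accepted
# spelling (label or numeric code) to its canonical level name.
valid_study <- list('undergraduate' = c('undergraduate', 'undergrad', '1'),
                    'graduate' = c('graduate', 'grad', '2'),
                    'first-professional' = c('first-professional', '3'),
                    'post-baccalaureate' = c('post-baccalaureate',
                                             'post-bac',
                                             'postbac',
                                             '4'),
                    '99' = c('99', 'total'))

normalize_level_of_study <- function(value) {
  # Coerce to character so numeric input such as 1 matches '1'
  key <- tolower(as.character(value))
  for (canonical in names(valid_study)) {
    if (key %in% valid_study[[canonical]]) {
      return(canonical)
    }
  }
  # No alias matched: fail with a message listing the canonical levels
  stop(value, " is not a valid level of study. Valid levels are:\n    ",
       paste(names(valid_study), collapse = "\n    "), call. = FALSE)
}

normalize_level_of_study(1)           # "undergraduate"
normalize_level_of_study("post-bac")  # "post-baccalaureate"
```

Because numbers are coerced to character before the lookup, the validator accepts `1` and `"1"` alike. Supporting a new alias then only requires adding it to the list.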

Since level_of_study is stored as a numeric code in the CSV files, the patched function let me retrieve the data I needed using level_of_study = 1. However, level_of_study = "undergraduate" still returned an empty data frame. Could you please fix this bug?
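The empty result is consistent with a text label being compared directly against a numeric column. The sketch below illustrates that mismatch and one way around it: translate the label to its numeric code before filtering. The names level_codes and filter_by_level() are hypothetical. The codes are assumed from the patch above, and the toy data frame is made-up illustration data, not real IPEDS rows. This is not necessarily how the maintainers fixed the package.

```r
# Hypothetical sketch: level_codes and filter_by_level() are illustrative
# names, not part of educationdata. Codes are assumed from the patch above
# (undergraduate = 1, graduate = 2, first-professional = 3,
# post-baccalaureate = 4, total = 99).
level_codes <- c('undergraduate' = 1, 'graduate' = 2,
                 'first-professional' = 3, 'post-baccalaureate' = 4,
                 '99' = 99)

filter_by_level <- function(df, level) {
  # Translate a known text label to its numeric code; otherwise treat
  # the input as a code already (e.g. 1 or "1")
  code <- if (is.character(level) && level %in% names(level_codes)) {
    level_codes[[level]]
  } else {
    as.numeric(level)
  }
  df[df$level_of_study == code, , drop = FALSE]
}

# Toy rows standing in for a CSV download (not real IPEDS data)
toy <- data.frame(row_id = 1:4, level_of_study = c(1, 2, 1, 99))

# Comparing the numeric column to the raw label matches nothing
nrow(toy[toy$level_of_study == "undergraduate", ])  # 0
nrow(filter_by_level(toy, "undergraduate"))         # 2
```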

khueyama commented 3 years ago

Hello @danliIDEA!

Thank you for the very detailed and helpful bug report. I've just merged some changes that I believe fix this issue, but let me know if I missed anything.

danli-ds commented 3 years ago

Thank you so much for the quick response @khueyama! I have tested the code and it works perfectly now. Thanks!