edquant / edh7916

Course materials and website for EDH7916: Contemporary Research in Higher Education
https://edquant.github.io/edh7916/

Question 3: X1SES Value #24

Closed jharsell closed 4 years ago

jharsell commented 4 years ago

Hello All,

For question three, I am trying to write my function to replace the missing values for X1SES. However, when I look in the NCES codebook for the value they use for missing data, it says they use imputed values to replace missing values. Can anyone share what value they used to complete their function?

For funsies, I used -8.0000, a value in the codebook, to see if I could get the code to run.

My first shot crashed and burned by converting all values in x1ses to NA:

df <- df %>%
    mutate(x1ses = fix_missing(x1ses, -8.0000))

Conversely, I tried this, but none of the values changed to NA:

fix_missing <- function(x, x1ses) {
    x <- ifelse(-8.0000 == x1ses,        
                NA,                     
                x)                      
    return(x)
}

Link to variable description in codebook

Thank you for your patience with my barrage of questions this week!

Jaime

btskinner commented 4 years ago

Missing values for x1ses

@jharsell, you are right that the codebook isn't clear here. When you look at the PDF codebook, you can see different min/max values that don't include -8:

[Screenshot: PDF codebook entry, captured 2020-02-25]

The distribution of x1ses (which I haven't shown you how to plot yet) also strongly suggests that -8 is a missing value.

[Figure: histogram of x1ses]

So I would say that -8 means missing here (as it does with other variables).
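If you want to run this kind of check yourself, here is a minimal sketch in base R. The vector below is made-up toy data standing in for the real x1ses column (it is not HSLS data), so only the pattern matters: a value that sits far outside the documented range and shows up as an isolated spike is probably a missing-data code.

```r
## toy stand-in for the x1ses column (made-up values, NOT real HSLS data)
x1ses <- c(-0.25, 1.10, -8, 0.43, -8, -1.32)

## summary statistics: a minimum far below everything else is a red flag
summary(x1ses)

## count how many observations equal the suspected missing code
n_suspect <- sum(x1ses == -8)
n_suspect

## range of the remaining values once the suspected code is set aside
valid_range <- range(x1ses[x1ses != -8])
valid_range

## histogram: look for a bar isolated at -8, away from the main mass
hist(x1ses, breaks = 20, main = "Toy x1ses", xlab = "x1ses")
```

With the real data you would swap the toy vector for df$x1ses and compare the range of the remaining values to the min/max reported in the codebook.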

fix_missing() function

I'm not sure what function you used in your first bit of code, but this function

fix_missing <- function(x, x1ses) {
    x <- ifelse(-8.0000 == x1ses,        
                NA,                     
                x)                      
    return(x)
}

has a couple of issues. First, remember that x is supposed to be the input data vector (data frame column, for example) that we want to change. We can guess that by the fact that we assign (<-) to x and also return x. That's good.

If that's the case, then you've introduced two hard-coded values that won't scale: -8.0000 and x1ses. Just like x is the variable that temporarily holds the values we want to change, you need an argument that will temporarily hold the missing value. In class, we called that argument miss_val. Take another look at fix_missing() in our lesson:

## function to fix missing values
fix_missing <- function(x, miss_val) {
    ## use ifelse(< test >, < do this if TRUE >, < do that if FALSE >)
    x <- ifelse(x %in% miss_val,        # is x == any value in miss_val?
                NA,                     # TRUE: replace with NA
                x)                      # FALSE: return original value as is
    ## return corrected x
    return(x)
}

Do you need to change the underlying function code at all or can you just reuse it for Question 3 by giving its two arguments new values?
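One way to test this for yourself is a small sketch like the one below. It assumes a toy data frame with made-up values (the stu_id column and all numbers are placeholders, not the real data) and uses base R assignment so that it runs without any packages; inside a dplyr pipeline the same call would sit in mutate(), as in your first attempt. The second half shows why a hard-coded comparison can turn a whole column into NA: the test is a single TRUE, so ifelse() hands back a single NA, which then gets recycled.

```r
## lesson version of the function, unchanged
fix_missing <- function(x, miss_val) {
    x <- ifelse(x %in% miss_val, NA, x)
    return(x)
}

## toy data frame (made-up values; stu_id is a placeholder column)
df <- data.frame(stu_id = 1:5,
                 x1ses = c(-0.25, -8, 1.10, 0.43, -8))

## reuse the function as is: only the argument values change
df$x1ses <- fix_missing(df$x1ses, -8)
df$x1ses

## for contrast: the version from the question, renamed so both can coexist
fix_missing_hard <- function(x, x1ses) {
    x <- ifelse(-8.0000 == x1ses, NA, x)
    return(x)
}

## the test (-8 == -8) has length 1, so the result has length 1 as well
hard_result <- fix_missing_hard(c(-0.25, -8, 1.10), -8)
length(hard_result)
```

The general point is that ifelse() returns something shaped like its test, so the test needs to involve x itself if you want one answer per observation.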

btskinner commented 4 years ago

Closing, but reopen if you have more questions.