DS4PS / cpp-523-spr-2020

Course shell for CPP 523 Foundations of Program Evaluation I for Spring 2020.
https://ds4ps.org/cpp-523-spr-2020/

Lab 04, Question 1, Part 2 #17

RTrehern-ASU opened 4 years ago

RTrehern-ASU commented 4 years ago

Dr. Lecy, in response to your previous post: "(3) If you feel like more examples would be helpful, feel free to request a 'code-through' (for examples of how to implement R code) or a worked-through sample problem. Just create a new discussion tab and ask for a concrete example of a specific problem. We are happy to generate this content."

I am stuck on Question #1, Part 2 of Lab 04. Could you please provide a similar worked-through sample that demonstrates the auxiliary regression and shows how we determine the coefficients a1 and B2? If I understand the lecture notes correctly, the bias should match what we calculated in Part 1, but I am not able to get the same result.

Any additional instruction, notes, lecture material, videos, etc. you could provide will be helpful to me. Thank you.

lecy commented 4 years ago

Sure, give me 20 minutes to get back on a computer with a camera.

RTrehern-ASU commented 4 years ago

Thank you, Dr. Lecy. I'll follow up with you tomorrow.

lecy commented 4 years ago

Sure, I'm waiting for the rendered video to arrive. Will update you when it does.

RTrehern-ASU commented 4 years ago

No worries. I know that can take some time.

lecy commented 4 years ago

VIDEO

If you want to run the code yourself: PDF

URL <- "https://raw.githubusercontent.com/DS4PS/cpp-523-fall-2019/master/labs/class-size-seed-1234.csv"
dat <- read.csv( URL )

# naive regression in the example: TS = b0 + b1*CS
m.naive <- lm( test ~ csize, data=dat  )
summary( m.naive )

# Coefficients:
# ----------------------------------
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept) 738.3366     4.8788  151.34   <2e-16 ***
# csize        -4.2221     0.1761  -23.98   <2e-16 ***
# ----------------------------------

# full regression: TS = B0 + B1*CS + B2*SES
m.full <- lm( test ~ csize + ses, data=dat  )
summary( m.full )

# Coefficients:
# ----------------------------------
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept)  665.289     76.574   8.688   <2e-16 ***
# csize         -2.671      1.632  -1.637    0.102    
# ses           16.344     17.098   0.956    0.339    
# ----------------------------------

# auxiliary regression to get a1:  SES = a0 + a1*CS
m.auxiliary <- lm( ses ~ csize, data=dat )
summary( m.auxiliary )

# lm(formula = ses ~ csize, data = dat)
# Coefficients:
# ----------------------------------
#              Estimate Std. Error t value Pr(>|t|)    
# (Intercept)  4.469458   0.009033   494.8   <2e-16 ***
# csize       -0.094876   0.000326  -291.0   <2e-16 ***
# ----------------------------------

# b1 = B1 + bias
# b1 - B1 = bias
b1 <- -4.22
B1 <- -2.67
b1 - B1

# bias = a1*B2
a1 <- -0.0949
B2 <- 16.34
a1*B2
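# both b1 - B1 and a1*B2 should come out to roughly -1.55,
# confirming that bias = a1*B2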

You can test your understanding by doing the same calculations with models 1 and 2 and the equations below:

TS = B0 + B1(CS) + B2(TQ)     # full model (B1 is true slope)
TS = b0 + b1(CS)     # naive model (includes ovb)

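As a rough sketch of what that check could look like in R (assuming the teacher-quality column in the dataset is named tqual; run names(dat) to confirm the actual column name):

# full model with teacher quality: TS = B0 + B1*CS + B2*TQ
m.full.tq <- lm( test ~ csize + tqual, data=dat )

# auxiliary regression: TQ = a0 + a1*CS
m.aux.tq <- lm( tqual ~ csize, data=dat )

b1 <- coef( m.naive )[ "csize" ]     # naive slope (contains the bias)
B1 <- coef( m.full.tq )[ "csize" ]   # slope after controlling for TQ
B2 <- coef( m.full.tq )[ "tqual" ]   # effect of the omitted variable on test scores
a1 <- coef( m.aux.tq )[ "csize" ]    # relationship between class size and TQ

b1 - B1    # observed bias
a1 * B2    # should match: bias = a1*B2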

lecy commented 4 years ago

https://ds4ps.org/cpp-523-spr-2020/lectures/walk-through/omitted-variable-bias-example.html