ecking opened this issue 4 years ago
Start here:
Which of these control variables will increase the slope b1 associated with the policy variable X1?
Which will have no impact on the slope?
# Original model:
Y = b0 + b1(X1) + e
# Model with control added:
Y = b0 + b1(X1) + b2(X2) + e
Y = b0 + b1(X1) + b3(X3) + e
Y = b0 + b1(X1) + b4(X4) + e
# policy slope:
b1 = cov(X1,Y) / var(X1)
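If it helps to see that formula with real numbers, here is a quick R sketch with simulated data (placeholder variable names, not the lab data). For a one-variable model the slope that lm() reports is exactly cov(X1,Y)/var(X1):

# simulated data, for illustration only
set.seed(123)
X1 <- rnorm(1000)
Y  <- 10 + 3*X1 + rnorm(1000)

coef(lm(Y ~ X1))["X1"]    # slope from the regression
cov(X1, Y) / var(X1)      # same number from the formula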
There will be a greater impact on slope when a control variable has a lower standard error. Is that correct?
NOPE... work through above first though
Lol “NOPE.”😆 yeah I just reread what I wrote. My bad. That’s clearly not right. Let me regather my thoughts as I don’t think that was exactly what I was trying to get at. I think I do have an understanding but my brain gets stuck on what’s being “removed”. I don’t know why. 😆
The common phrase is "holding control variables constant" but I find that to be more confusing when you try to operationalize it.
What is meant by "removes" is that to determine the policy variable slope and standard error after adding controls, you literally delete all of the variance associated with the control, then focus on the remaining portions.
Regressions with control variables only use the independent portions of X1 and Y (those uncorrelated with the controls) to generate the slopes you see in the table. So the final slope is calculated after "removing" the variance of the controls.
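If you want to see the "removing" step literally, here is a small R simulation (made-up data and names, not the lab). Residualize X1 and Y on the control, then regress the leftover independent portions on each other; the slope is identical to the controlled slope from the full model:

# simulated data: X2 is a control correlated with the policy variable X1
set.seed(123)
X2 <- rnorm(500)
X1 <- 0.5*X2 + rnorm(500)
Y  <- 1 + 2*X1 + 3*X2 + rnorm(500)

# "remove" the variance associated with the control
x1.independent <- resid(lm(X1 ~ X2))   # part of X1 uncorrelated with X2
y.independent  <- resid(lm(Y ~ X2))    # part of Y uncorrelated with X2

coef(lm(Y ~ X1 + X2))["X1"]                  # controlled slope from the full model
coef(lm(y.independent ~ x1.independent))[2]  # same slope from only the independent portions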
Hi there, I’m confused again.
So I know how to read the Venn diagrams. Those make sense. I understand uncontrolled and controlled effects on the slope and SE.
What I'm not quite getting is how to create those Venn diagrams based on the regression table output, or how the other X coefficients impact the coefficient of our policy variable.
Like in the lecture notes, when looking at teacher quality, how do we know this is the uncontrolled variable?
The end of Lab 2 has some notes on creating the Venn diagrams. You use the correlation matrix, not the regression tables:
In reality the diagrams become less helpful once you have more than a couple of variables, but the intuition remains. First remove all of the variance from control variables, then interpret the remaining independent components of X1 and Y.
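For example (simulated data again, just for illustration), the raw ingredients for the diagrams are the pairwise correlations:

# simulated data, placeholder names
set.seed(123)
X2 <- rnorm(500)               # control
X1 <- 0.5*X2 + rnorm(500)      # policy variable
Y  <- 1 + 2*X1 + 3*X2 + rnorm(500)

# each pairwise correlation corresponds to an overlap (shared variance) in the diagram
round(cor(data.frame(Y, X1, X2)), 2)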
When looking at teacher quality, how do we know this is the uncontrolled variable?
I am not sure if I totally understand your question, but in most of the labs you will compare two versions of the model, one with the control present, and one with it missing.
If you are missing variables, the model is "naive" because it is potentially biased and the standard errors are probably not correct.
When all of the controls are added, it is the "full" or "true" model (you will never have a full model in reality).
Naive: Y = b0 + b1(X1) + e1
Full:  Y = B0 + B1(X1) + B2(X2) + e2
Note that uppercase or Greek letters mean the slopes are correct (all info is accounted for), and lowercase signifies that there are omitted variables.
This semester you only need to be able to explain how these models would differ. This week you ADD controls so you go from a naive to a full model. Next week you will OMIT variables so you go from the full to the naive model.
When using regression you should be able to move in both directions and reason through the impact of variables without having to actually add them and run the regressions. This matters most with true omitted variables: you won't have them in your dataset, but you can still reason through how the model would change if you did include them, simply by knowing something about their assumed relationship with the other variables in the model (often reported in previous research).
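A tiny simulation might make the naive-to-full comparison concrete (my own made-up numbers, not the lab data). When the omitted control is correlated with the policy variable, the naive slope absorbs part of its effect:

# simulated data: X2 is correlated with the policy variable X1
set.seed(123)
X2 <- rnorm(500)
X1 <- 0.5*X2 + rnorm(500)
Y  <- 1 + 2*X1 + 3*X2 + rnorm(500)    # true B1 = 2, B2 = 3

coef(summary(lm(Y ~ X1)))["X1", 1:2]        # naive model: slope is biased, SE not reliable
coef(summary(lm(Y ~ X1 + X2)))["X1", 1:2]   # full model: slope close to the true B1 = 2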
Hope that helps explain where these exercises are going.
Hi Dr. Lecy,
I have a question from an earlier post.
Regressions with control variables only use the independent portions of X1 and Y (those uncorrelated with the controls) to generate the slopes you see in the table. So the final slope is calculated after "removing" the variance of the controls.
Can you please show which parts of the diagrams you are referring to when you say "independent portions of X1 and Y", "slopes", and "the variance of controls"?
Thank you, Archana
The independent portions share no covariance with the controls. So in the diagrams they are the lowercase letters after controls are added (or uppercase letters if they are not impacted by the controls):
When you walk through the math for omitted variable bias, the calculations for direct effects (independent portion) and indirect effects (dependent portion) show the same information, but using algebra instead of the Venn diagrams. It is more precise.
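If you want to check that algebra yourself, here is a sketch with simulated data (placeholder names). The naive slope equals the direct effect plus the indirect effect that travels through the control:

# simulated data: the naive slope decomposes into direct + indirect effects
set.seed(123)
X2 <- rnorm(500)
X1 <- 0.5*X2 + rnorm(500)
Y  <- 1 + 2*X1 + 3*X2 + rnorm(500)

B1    <- coef(lm(Y ~ X1 + X2))["X1"]   # direct effect of X1 (full model)
B2    <- coef(lm(Y ~ X1 + X2))["X2"]   # effect of the control
delta <- cov(X1, X2) / var(X1)         # how the control moves with X1

coef(lm(Y ~ X1))["X1"]   # naive slope
B1 + B2*delta            # direct + indirect: same number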
Hello,
Repeating just so I have an understanding. Can you confirm if I'm correct?
In the example below, teacher quality is uncorrelated with the policy variable. We know this because the residuals are smaller/removed, leaving a larger explained SS. Therefore the slope will not change and our standard error will become smaller. That being said, if we were asked which variable in this correlation chart we would expect to change the slope of the policy variable if REMOVED from the model, we would say teacher quality, as that would leave us with the control variable socioeconomic status, which is highly correlated with the model.
Hope I'm not giving too much of the lab away. But I need to make sure I understand this. haha. Am I following this right?
I think you have it backwards. If you add TQ this is how the slope of the policy variable changes:
What happens to the slope if you now remove TQ (change the direction of the arrow)?
Alternatively, this is what happens to the SE of the policy variable when adding TQ:
What would happen to the SE if we remove TQ from the model (change the direction of the arrow)?
Hmm so my statement of removing teacher quality is not correct?
But in this example, we see that having teacher quality in the model keeps the slope the same, as expected. Once we removed TQ and only kept SES, we see that the slope changed more dramatically.
For the questions about adding controls or omitted variables you will change only one variable at a time. It's too complicated to go from no controls to lots of controls, or from lots of controls to no controls. There would be a bunch of things happening at once.
So in the table you would compare model 1 to model 2:
Y = b0 + b1(X1)
Y = b0 + b1(X1) + b2(X2)
Or model 1 to model 4:
Y = b0 + b1(X1)
Y = b0 + b1(X1) + b3(X3)
You are comparing model 2 to model 4:
Y = b0 + b1(X1) + b2(X2)
Y = b0 + b1(X1) + b3(X3)
Since you are looking at two different things at once (adding X3 and dropping X2 or vice-versa) it's hard to tell why the changes are occurring.
The goal of the exercise is to be able to look at a correlation table and know exactly which variables will most improve your model. Or, based upon previous research that reports the direction and strength of a correlation for a variable you don't have in your study, to describe how your model would change if your study had collected data on that variable and you included it in your model.
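To tie it together, here is a rough sketch of what you should be able to predict from a correlation table alone (simulated data, placeholder names). A control correlated with the policy variable mostly changes the slope; a control that predicts Y but is uncorrelated with the policy variable mostly shrinks the standard error:

# simulated data, for illustration only
set.seed(123)
X1 <- rnorm(500)                # policy variable
X2 <- 0.6*X1 + rnorm(500)       # control correlated with X1
X3 <- rnorm(500)                # control uncorrelated with X1 but related to Y
Y  <- 1 + 2*X1 + 2*X2 + 2*X3 + rnorm(500)

coef(summary(lm(Y ~ X1)))["X1", 1:2]        # naive: slope and SE
coef(summary(lm(Y ~ X1 + X2)))["X1", 1:2]   # correlated control added: slope changes
coef(summary(lm(Y ~ X1 + X3)))["X1", 1:2]   # uncorrelated control added: slope ~same, SE shrinks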
Thanks for your assistance. I think I finally get it now. Took awhile. Thanks!
Hello,
The more I think about this, the more confused I get. I think the word "removed" from Lab 3 is throwing me off. ha!
There will be a greater impact on slope when a control variable has a lower standard error. Is that correct?