DS4PS / cpp-523-fall-2019

Course shell for CPP 523 Foundations of Program Evaluation I for Fall 2019.
http://ds4ps.org/cpp-523-fall-2019/

Final Exam #16

Open lecy opened 5 years ago

lecy commented 5 years ago

Feel free to ask about any review questions or practice exam questions that you don't understand.

Jigarci3 commented 5 years ago

Hello, I need clarification on one of the 7 sins of regression, multicollinearity: "The higher the multicollinearity, the smaller B will be, which means the larger the standard errors. When standard errors are large, the confidence intervals are bigger and it is less likely that the slope will be statistically significant."

What is B? Is it the unexplained variance of X?

Thanks!

lecy commented 5 years ago

B here is just the variance of X1 that remains after the part it shares with the control X2 is removed.


lecy commented 5 years ago

@Jigarci3

It's not the unexplained variance of X1 (we use the phrases "explained" and "unexplained" with the DV Y usually). It is the portion that is uncorrelated with X2, so the portion that will remain if both X1 and X2 are included in the regression together.
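One way to see what B is: regress X1 on X2 and keep the residuals. That leftover piece of X1 is uncorrelated with X2, and the Frisch-Waugh-Lovell theorem says the slope of Y on that leftover piece equals the X1 coefficient from the regression that includes both X1 and X2. Below is a minimal sketch with numpy on simulated data; the coefficients, sample size, and seed are made up for illustration and do not come from the course materials.

```python
import numpy as np

rng = np.random.default_rng(523)  # arbitrary seed for reproducibility
n = 1_000

# Hypothetical data: X1 is partly driven by X2, and Y depends on both.
x2 = rng.normal(size=n)
x1 = 0.7 * x2 + rng.normal(size=n)
y = 2.0 + 1.5 * x1 + 1.0 * x2 + rng.normal(size=n)

ones = np.ones(n)

def ols(X, y):
    """Return least-squares coefficients and residuals for y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef

# Full model: Y ~ 1 + X1 + X2
full_coef, _ = ols(np.column_stack([ones, x1, x2]), y)

# Partial the intercept and X2 out of X1; what is left is uncorrelated with X2.
_, x1_resid = ols(np.column_stack([ones, x2]), x1)
B = x1_resid.var()  # "B": variance of the part of X1 left after removing X2

# Slope of Y on the leftover part of X1 (x1_resid already has mean zero).
fwl_slope = (x1_resid @ y) / (x1_resid @ x1_resid)

print(f"b1 from full model:      {full_coef[1]:.4f}")
print(f"b1 from residualized X1: {fwl_slope:.4f}")
print(f"var(X1) = {x1.var():.3f}, B = {B:.3f}")
print(f"corr(x1_resid, x2) = {np.corrcoef(x1_resid, x2)[0, 1]:.2e}")
```

The two slope estimates match, and B is smaller than the raw variance of X1 because the portion shared with X2 has been stripped out.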

When X1 and X2 are highly correlated, B becomes very small, resulting in large standard errors. Multicollinearity is an extreme case of adding a correlated control variable.
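To see B shrink and the standard error grow together, the sketch below simulates the same model at several levels of correlation between X1 and X2 and computes the SE of the X1 slope as sqrt(A / (n * B)), where A is the residual variance. The data-generating process, sample size, and seed are hypothetical; the function name `slope_se` is made up for this example. It cross-checks the result against the standard OLS formula s^2 (X'X)^-1.

```python
import numpy as np

def slope_se(rho, n=500, seed=0):
    """Simulate Y ~ X1 + X2 with corr(X1, X2) near rho; return (B, SE of the X1 slope).

    Hypothetical data-generating process chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    x2 = rng.normal(size=n)
    # var(X1) stays at 1; a larger rho means more of X1 is shared with X2.
    x1 = rho * x2 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    y = 1.0 + 0.5 * x1 + 0.5 * x2 + rng.normal(size=n)

    X = np.column_stack([np.ones(n), x1, x2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (n - X.shape[1])  # "A": residual variance of the model

    # "B": variance of the part of X1 that is uncorrelated with X2
    Z = np.column_stack([np.ones(n), x2])
    gamma, *_ = np.linalg.lstsq(Z, x1, rcond=None)
    B = np.var(x1 - Z @ gamma)

    se = np.sqrt(s2 / (n * B))
    # Cross-check against the usual OLS formula s^2 (X'X)^-1
    se_check = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    assert np.isclose(se, se_check)
    return B, se

for rho in [0.0, 0.5, 0.9, 0.99]:
    B, se = slope_se(rho)
    print(f"corr ~ {rho:4.2f}   B = {B:.3f}   SE(b1) = {se:.4f}")
```

Because var(X1) is held at 1 in every run, the growth in the SE comes entirely from B shrinking, by roughly a factor of 1 / sqrt(1 - rho^2).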

If X2 represents an important competing hypothesis, then it is necessary to include it in the model, and if it erases the direct effects of X1, that helps us understand that X1 might not be the true causal vector. (Think about the example with drugs, environmental factors, and developmental issues in infants: drugs went away as the main explanation after controlling for environment.)

Multicollinearity is specifically a problem if we accidentally include several variables that are basically measures of the same thing: they then cancel each other out. If you use census data and include poverty and unemployment rates in the same model, you will find that neither is significant. That is because both measure very similar constructs, so they cancel each other out in the model, which can distort your understanding of the mechanisms. When independent variables are highly correlated, we need to make sure it is meaningful to include them both (if one is a competing hypothesis, for example) and that we are not just carelessly deleting a lot of useful variance.
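A small simulation can mimic the poverty/unemployment situation. Here one hypothetical "hardship" construct is measured twice with a little noise, and the two measures are named `poverty` and `unemployment` only as stand-ins; no census data is used, and all numbers are invented. The sketch compares the slope and SE for one measure alone with the slopes and SEs when both measures are in the model.

```python
import numpy as np

rng = np.random.default_rng(39)  # arbitrary seed
n = 200

# Hypothetical tract-level data: one underlying construct measured twice.
hardship = rng.normal(size=n)
poverty = hardship + 0.1 * rng.normal(size=n)
unemployment = hardship + 0.1 * rng.normal(size=n)
outcome = 2.0 + 1.0 * hardship + rng.normal(size=n)

def fit(columns, y):
    """OLS with an intercept; returns slopes and their standard errors (intercept dropped)."""
    X = np.column_stack([np.ones(len(y))] + columns)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return coef[1:], se[1:]

b_alone, se_alone = fit([poverty], outcome)
b_both, se_both = fit([poverty, unemployment], outcome)

print(f"corr(poverty, unemployment) = {np.corrcoef(poverty, unemployment)[0, 1]:.3f}")
print(f"poverty alone: b = {b_alone[0]:.3f}, t = {b_alone[0] / se_alone[0]:.1f}")
for name, b, se in zip(["poverty", "unemployment"], b_both, se_both):
    print(f"{name} (both in model): b = {b:.3f}, t = {b / se:.1f}")
```

With both measures included, the effect gets split between them and each standard error balloons, so the t-statistics drop sharply even though the underlying construct clearly matters.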

sunaynagoel commented 5 years ago

Professor @lecy, can you please activate the link for the final?

lecy commented 5 years ago

It should be active.

castower commented 5 years ago

Hello all,

I'm a little confused about question 10 on the Practice Exam:

"What is the largest level of confidence you can choose for the confidence interval around the slope estimate for the Experience coefficient before it crosses zero?"

To clarify, are all of the independent variables automatically measured to the highest level possible before they become insignificant?

Specifically, in the example, Experience does not have any asterisks to indicate that it is significant at the 99% level (.01 significance) or the 90% level (.10). Therefore, since Experience has a p-value of .3063, I am assuming that the highest level is somewhere around 69%, but I want to be sure this is the right reasoning.

Thanks, Courtney

castower commented 5 years ago

One more question: in the bonus questions, is it okay to assume that the pi symbol is equal to 3.14?

Thanks!

lecy commented 5 years ago

That's correct. The p-value tells you how large a confidence interval you can draw before the CI contains the null (crosses zero): the largest confidence level is 1 - p, so a p-value of .3063 gives about 69%.
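The 1 - p rule can be checked numerically. The sketch below uses a hypothetical slope and standard error (not the Experience numbers from the practice exam) and a normal approximation to the t distribution via Python's `statistics.NormalDist`, which is reasonable for large samples; regression software would use the t distribution. At confidence level 1 - p, the lower bound of the CI lands exactly on zero.

```python
from statistics import NormalDist

def max_confidence_level(b, se):
    """Largest confidence level whose CI around slope b still excludes zero.

    Normal approximation to the t distribution (assumes a large sample).
    """
    z = abs(b / se)
    p = 2 * (1 - NormalDist().cdf(z))  # two-sided p-value
    return 1 - p, p

def confidence_interval(b, se, level):
    """Two-sided normal-approximation CI for slope b at the given confidence level."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    return b - z_crit * se, b + z_crit * se

# Hypothetical slope and SE chosen so the two-sided p-value is about 0.31
b, se = 0.50, 0.49
level, p = max_confidence_level(b, se)
print(f"p = {p:.4f}, largest confidence level before the CI reaches zero = {level:.1%}")
for lvl in (0.50, level, 0.90):
    lo, hi = confidence_interval(b, se, lvl)
    print(f"{lvl:.1%} CI: ({lo:.3f}, {hi:.3f})")
```

A narrower interval (50%) stays above zero, the 1 - p interval touches zero, and a wider one (90%) crosses it, which is why no asterisks appear for a coefficient like this one.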


I see why you might get confused on the bonus, but the Greek symbol pi is just another symbol for a slope, like a1, b1, B1, etc. The question is asking you to solve for pi_1 (the slope in that regression model).

JasonSills commented 5 years ago

@lecy, just confirming the due date and time. An email says Thursday at 11:59, but the assignment and schedule list say Saturday at 11:59. I'm wondering whether I should submit tomorrow or whether I have a weekend day to complete it. Thanks!

lecy commented 5 years ago

@JasonSills Technically the semester ends Thursday, but grades are due Monday. I wanted to make sure there was at least one weekend day for the projects if people are working full-time.

So yes, I will accept the assignments up until Saturday 11:59pm without penalty.

JasonSills commented 4 years ago

@lecy, question on the practice exam: (5) Name two sins of the Seven Sins that will always increase the standard error of a regression slope. I would answer: 1) multicollinearity and 2) measurement error in the DV. I'm not sure 2) is correct because it's a subset of measurement error, and measurement error as one of the sins doesn't always increase the SE. Is there another sin I'm missing?

lecy commented 4 years ago

That's correct.

Omitted variable bias, for example, affects both the residual and var(X1), changing the equation from A/B to a/b. But we would have to know how fast A changes to a and B changes to b, because the standard errors can actually increase or decrease. So the effect is ambiguous.
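The ambiguity can be illustrated with two hypothetical confounders. In case 1 the omitted control drives Y strongly and X1 only weakly, so adding it shrinks A much more than B and the SE falls. In case 2 the omitted control is tightly tied to X1 but barely moves Y, so adding it shrinks B much more than A and the SE rises. The data-generating processes, seed, and the helper `se_b1` are invented for this sketch; SE is computed as sqrt(A / (n * B)) to follow the A/B notation above.

```python
import numpy as np

def se_b1(y, x1, controls=()):
    """SE of the X1 slope as sqrt(A / (n * B)).

    A = residual variance of the model; B = variance of the part of X1
    not explained by the controls.
    """
    n = len(y)
    ones = np.ones(n)
    X = np.column_stack([ones, x1, *controls])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    A = resid @ resid / (n - X.shape[1])
    Z = np.column_stack([ones, *controls])
    gamma, *_ = np.linalg.lstsq(Z, x1, rcond=None)
    B = np.var(x1 - Z @ gamma)
    return np.sqrt(A / (n * B))

rng = np.random.default_rng(95)  # arbitrary seed
n = 1_000

# Case 1 (hypothetical): omitted control drives Y strongly and X1 weakly.
z = rng.normal(size=n)
x1 = 0.3 * z + np.sqrt(0.91) * rng.normal(size=n)
y = x1 + 2.0 * z + rng.normal(size=n)
case1 = (se_b1(y, x1), se_b1(y, x1, [z]))

# Case 2 (hypothetical): omitted control is tightly tied to X1 but barely moves Y.
w = rng.normal(size=n)
x1b = 0.9 * w + np.sqrt(0.19) * rng.normal(size=n)
yb = x1b + 0.1 * w + rng.normal(size=n)
case2 = (se_b1(yb, x1b), se_b1(yb, x1b, [w]))

for label, (without, with_) in [("case 1", case1), ("case 2", case2)]:
    print(f"{label}: SE without control = {without:.4f}, with control = {with_:.4f}")
```

Both controls are confounders, yet including one lowers the SE and including the other raises it, so knowing only that a variable was omitted does not tell you which way the SE will move.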

Measurement error in the DV and multicollinearity both have unambiguous effects. They always increase the SEs.
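For measurement error in the DV, the mechanism is one-sided: random noise added to Y inflates the residual variance A while leaving B (the variance of X) untouched, so the SE can only go up. The sketch below assumes classical measurement error (random, independent of X and of the true Y) and uses simulated data with invented coefficients and seed; the helper `slope_and_se` is hypothetical.

```python
import numpy as np

def slope_and_se(y, x):
    """Simple regression of y on x: slope and its SE, sqrt(A / (n * B))."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    A = resid @ resid / (n - 2)  # residual variance
    B = np.var(x)                # variance of X (no controls here)
    return coef[1], np.sqrt(A / (n * B))

rng = np.random.default_rng(97)  # arbitrary seed
n = 1_000
x = rng.normal(size=n)
y_true = 1.0 + 0.5 * x + rng.normal(size=n)

ses = []
for noise_sd in (0.0, 0.5, 1.0, 2.0):
    # Hypothetical random measurement error added to the outcome only
    y_observed = y_true + noise_sd * rng.normal(size=n)
    b, se = slope_and_se(y_observed, x)
    ses.append(se)
    print(f"noise sd = {noise_sd:3.1f}: slope = {b:.3f}, SE = {se:.4f}")
```

The slope stays centered near the true 0.5 because the noise is unrelated to X, but the SE climbs with every increase in noise.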