Watts-College / cpp-523-fall-2021

https://watts-college.github.io/cpp-523-fall-2021/

Questions about Lab 3 #8

Open droach7 opened 2 years ago

droach7 commented 2 years ago

I was working with some fellow classmates on Lab 3 and we came across some differences in how we interpreted the questions. I thought I would post them here in the hope that other students or Dr. Schlinkert could clarify things for me and anyone else who might have been confused.

Re: Question 5-2- "Which would result in a larger standard error associated with caffeine if removed from the model?"

In reviewing the lecture notes from Week 2, we read that both uncorrelated and correlated control variables can result in smaller standard errors when added to the model (and therefore larger standard errors if removed).

NOTE: the notes said that SE typically increases when a correlated control variable is added to a model, but that this depends on the ratio of var(x), cov(x,y), and the residuals.
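
For reference, the trade-off in that note can be written with the standard textbook formula for the standard error of one slope in a multiple regression. The notation here is my own assumption and may not match the symbols used in the Week 2 notes: $b_1$ is the caffeine slope, $k$ is the number of predictors, $SS_{x_1} = \sum_i (x_{i1} - \bar{x}_1)^2$ is the variation in caffeine, and $R_1^2$ is the R-squared from regressing caffeine on the other predictors.

$$
\mathrm{SE}(b_1) = \sqrt{\frac{SSE / (n - k - 1)}{SS_{x_1}\,\left(1 - R_1^2\right)}}
$$

Adding a control can only shrink $SSE$ in the numerator (apart from the lost degree of freedom), which pushes the SE down. If the control is correlated with caffeine it also raises $R_1^2$, which shrinks the denominator and pushes the SE up. Which effect wins depends on the relative sizes of the two changes.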

We were unsure whether this question wants us to answer it:

  1. using this conceptual knowledge, by identifying which type of control variable stress index and gym time each are and their typical impact on a regression model, or
  2. by numerically evaluating the degree of overlap of stress index and its potential impact on SE compared to that of gym time. (If so, wouldn't we need to use the data table from this case study in the Unit 2 notes? Or is there a way to evaluate it based only on the correlation structure reported in the lab?)

Clarification would be greatly appreciated!

BrettMFoster commented 2 years ago

Is this question still open for discussion?

I'll give my take. By my logic, the control variable whose removal would increase the standard error the most is the one that is currently reducing the residual error the most. That would be the variable that is most correlated with the outcome but not strongly correlated with the policy variable.

From the notes: "If a control variable is completely uncorrelated with the policy variable and correlated with the outcome, it will not impact the slope but it will reduce the residual component of the deviations, thus moving data points closer to the regression line and improving model fit."
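
To make that quote concrete, here is a small simulation sketch in Python (standard library only). Everything in it is an assumption for illustration: the data are synthetic rather than the lab's data, the coefficients (0.8, 1.0, 2.0) are made up, and `ols_fit` and `caffeine_fit` are hypothetical helper names. It builds a stress variable that is correlated with caffeine and a gym variable that is not, then compares the caffeine SE in the full model with the SE after dropping each control.

```python
import math
import random


def ols_fit(y, columns):
    """Fit y ~ 1 + columns by OLS; return (coefficients, standard errors)."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    # Invert X'X by Gauss-Jordan elimination with partial pivoting
    aug = [xtx[a] + [1.0 if a == b else 0.0 for b in range(p)] for a in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        scale = aug[c][c]
        aug[c] = [v / scale for v in aug[c]]
        for r in range(p):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    inv = [row[p:] for row in aug]
    beta = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    # Residual variance uses n - p degrees of freedom (p counts the intercept)
    sse = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    sigma2 = sse / (n - p)
    se = [math.sqrt(sigma2 * inv[a][a]) for a in range(p)]
    return beta, se


random.seed(523)
n = 2000
caffeine = [random.gauss(0, 1) for _ in range(n)]
# Assumed: stress is correlated with caffeine (about 0.8) and with the outcome
stress = [0.8 * c + 0.6 * random.gauss(0, 1) for c in caffeine]
# Assumed: gym time is unrelated to caffeine but strongly related to the outcome
gym = [random.gauss(0, 1) for _ in range(n)]
outcome = [1.0 * c + 1.0 * s + 2.0 * g + random.gauss(0, 1)
           for c, s, g in zip(caffeine, stress, gym)]


def caffeine_fit(controls):
    """Return (slope, SE) for caffeine, which is always the first predictor."""
    beta, se = ols_fit(outcome, [caffeine] + controls)
    return beta[1], se[1]


b_full, se_full = caffeine_fit([stress, gym])
b_no_gym, se_no_gym = caffeine_fit([stress])
b_no_stress, se_no_stress = caffeine_fit([gym])

print(f"full model:     slope={b_full:.3f}  SE={se_full:.4f}")
print(f"gym removed:    slope={b_no_gym:.3f}  SE={se_no_gym:.4f}")
print(f"stress removed: slope={b_no_stress:.3f}  SE={se_no_stress:.4f}")
```

With these made-up coefficients, dropping the uncorrelated control (gym) should inflate the caffeine SE, while dropping the correlated control (stress) should shrink the SE but shift the slope, since caffeine then absorbs part of the stress effect. Other coefficient choices can change the picture for the correlated control, which matches the notes' caveat that the result depends on the ratio of the quantities involved.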

Hope this helps.