Watts-College / cpp-523-fall-2021

https://watts-college.github.io/cpp-523-fall-2021/

Lingering questions from Week 2 Review #6

dholford opened 2 years ago

dholford commented 2 years ago

Hello,

After doing the readings, completing the lab, and watching the weekly review video I had a few lingering questions:

  1. I know this was discussed a bit during the review session, but I'm still a little confused about how the confidence interval at a 99% confidence level could be smaller than the confidence interval at a 95% confidence level. The critical value at 95% is 1.96, but at 99% it's 2.58. Since I multiply the SE by the critical value, if I'm multiplying by a larger value, wouldn't I end up adding and subtracting a larger amount from my slope estimate, and ultimately end up with a broader range for my confidence interval? For example, with a slope estimate of 3 and an SE of 1:

95%: 3 + 1(1.96) = 4.96 and 3 - 1(1.96) = 1.04, so the CI is [1.04, 4.96]

99%: 3 + 1(2.58) = 5.58 and 3 - 1(2.58) = 0.42, so the CI is [0.42, 5.58]
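The interval arithmetic above can be reproduced with Python's standard library. This is a minimal sketch, assuming (as the numbers imply) a slope estimate of 3 and a standard error of 1, and using two-sided normal (z) critical values from `statistics.NormalDist`; the helper names are hypothetical:

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value: the (1 + confidence) / 2 quantile of N(0, 1)."""
    return NormalDist().inv_cdf((1 + confidence) / 2)

def confidence_interval(b1, se, confidence):
    """Return (lower, upper) = b1 minus/plus critical value * SE."""
    margin = z_critical(confidence) * se
    return b1 - margin, b1 + margin

# Assumed values read off the worked example: slope 3, SE 1.
for level in (0.95, 0.99):
    lower, upper = confidence_interval(3.0, 1.0, level)
    print(f"{level:.0%}: z = {z_critical(level):.3f}, CI = [{lower:.2f}, {upper:.2f}]")
```

The exact 99% z value is about 2.576, so the 99% interval comes out as roughly [0.42, 5.58] versus [1.04, 4.96] at 95%: a larger critical value applied to the same SE always gives a wider interval.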

  2. For the three methods for decreasing the SE, I'm a little confused about what increasing the variance of X looks like. In the review, the example was changing the dose from 100 mg to 200 mg. Can you just pick any new value for X? Isn't the variance of X determined by the data?
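To make the variance-of-X point concrete: in simple regression, SE(b1) = σ / √(Σ(xᵢ - x̄)²), where σ is the standard deviation of the residuals. With observational data the spread of X is whatever was observed, but in a designed study the researcher chooses the X values (such as doses), so spreading them out increases Σ(xᵢ - x̄)² and shrinks the SE. The sketch below assumes a fixed σ of 10 and two hypothetical dose designs; none of these numbers come from the review:

```python
import math

def slope_se(x_values, sigma):
    """Standard error of the OLS slope: sigma / sqrt(sum of squared deviations of X)."""
    x_bar = sum(x_values) / len(x_values)
    sxx = sum((x - x_bar) ** 2 for x in x_values)
    return sigma / math.sqrt(sxx)

SIGMA = 10.0  # hypothetical residual standard deviation, held fixed for both designs

# Hypothetical designs with the same number of observations (10 each).
narrow_doses = [90, 95, 100, 105, 110] * 2   # doses clustered near 100 mg
wide_doses = [0, 50, 100, 150, 200] * 2      # doses spread from 0 to 200 mg

print(f"narrow design SE: {slope_se(narrow_doses, SIGMA):.3f}")
print(f"wide design SE:   {slope_se(wide_doses, SIGMA):.3f}")
```

Both designs have 10 observations, but the doses in the wide design deviate from their mean 10 times as much, so its SE is 10 times smaller (about 0.045 versus 0.447).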

Thanks for any and all help!

BrettMFoster commented 2 years ago

While I haven't finished all of the material yet, I think I can help with the first question. I think steps are missing from the formula. So, some assumptions on my part: b1 = 3.00; SE = 1.00; t = 1.96? The issue, I think, is that the "t" statistic is being confused with the alpha level; the alpha level is the probability used when calculating t.

It's correct that 1.96 corresponds to an alpha level of 5% (.05). However, the t value is calculated as t = (.05 × DF), where DF stands for degrees of freedom. The DFs are missing from the numbers provided. The degrees of freedom are calculated as the sample size (n, the count of all X values) minus 2. Let's say n = 1000; then DF = 998.

Putting all of this together: t = (.05)(998) = 49.9. The confidence interval is 49.9 × SE (1.00), or ±49.9. With 99%, t = (.01)(998), or ±9.98. So the alpha of .01 does provide a more conservative number. Then map b1 ± CI, or 3.00 ± 49.9. As an aside, because of the amount of slope error, this result is probably not significant.
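For comparison, regression software takes the critical value from the t distribution: it is the (1 - α/2) quantile of a t distribution with n - 2 degrees of freedom, rather than a product of α and DF. The sketch below uses only the Python standard library (it numerically integrates the t density with Simpson's rule and inverts the CDF by bisection, a technique chosen here just to avoid a SciPy dependency) and assumes the same hypothetical n = 1000 (DF = 998), b1 = 3.00, and SE = 1.00. The function names are illustrative, not from the course materials:

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Student's t density, computed on the log scale so large df does not overflow."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 (the mass below zero) plus Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """Two-sided critical value: find t with t_cdf(t) = (1 + confidence) / 2 by bisection."""
    target = (1 + confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n = 1000              # hypothetical sample size from the example
df = n - 2            # simple regression estimates two coefficients (intercept, slope)
b1, se = 3.00, 1.00   # hypothetical slope and standard error

for level in (0.95, 0.99):
    t = t_critical(level, df)
    z = NormalDist().inv_cdf((1 + level) / 2)
    print(f"{level:.0%}: t = {t:.3f}, z = {z:.3f}, "
          f"CI = [{b1 - t * se:.2f}, {b1 + t * se:.2f}]")
```

With DF = 998 the t critical values come out at roughly 1.962 and 2.581, within a few thousandths of the z values 1.960 and 2.576, so the 99% interval is still the wider one. The gap between t and z only becomes noticeable at small DF (for DF = 10 the 95% value is about 2.228).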

dholford commented 2 years ago

Hey Brett,

This is super helpful, thanks! I think I got confused because in our lab the t score given is 1.96, which is also the z score at 95%, so I was conflating the t score with the z score. With the z score, what I'm saying would be the case: the 99% confidence level would give a larger confidence interval, as demonstrated in my initial question. But it was wrong of me to carry that logic over to the t score. Is that right?

Thanks, Dylan

BrettMFoster commented 2 years ago

I haven't read everything yet, so I might be making some faulty assumptions.

I don't think conflating t and z is the issue after all. In fact, I think I was making the same error in my post, because I had forgotten there was a t-statistic and a t-distribution until you mentioned z scores. The issue is the missing degrees of freedom in the calculation. Without them, your calculation isn't relative to a sample size, which is why it doesn't work. In your scenario, DF would be 0 (as if no one were in the sample), and even a negative n doesn't make sense, because a trend can't be based on just one observation.

Try this example using Excel. The steps will probably be clearer: https://people.stfx.ca/bliengme/exceltips/regressionslopeconfidence.htm
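The linked Excel walkthrough computes a slope's CI from raw data. A comparable sketch in Python, assuming a small made-up dataset (not from the course) and taking the t critical value as an input, for example from a t table or Excel's `T.INV.2T(0.05, df)`; the function name `slope_ci` is hypothetical:

```python
import math

def slope_ci(x, y, t_crit):
    """OLS slope, its standard error, degrees of freedom, and the t-based CI."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    df = n - 2                                     # two estimated coefficients
    ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(ssr / df) / math.sqrt(sxx)   # residual SD / sqrt(Sxx)
    margin = t_crit * se_b1
    return b1, se_b1, df, (b1 - margin, b1 + margin)

# Made-up data with n = 5, so df = 3; the 95% two-sided t value for df = 3 is about 3.182.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, se_b1, df, (lower, upper) = slope_ci(x, y, t_crit=3.182)
print(f"b1 = {b1:.2f}, SE = {se_b1:.4f}, df = {df}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

With only two points, df = n - 2 = 0 and the residual variance (and therefore the SE) is undefined, so this function raises `ZeroDivisionError`; a line through two points fits perfectly and leaves nothing to estimate uncertainty from.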

Hope this helps.

Schlinkert commented 2 years ago

Thank you for sharing this resource, Brett. Here is another example/resource for this question. I apologize if what I said in the review session was unclear. The 99% CI will give you more confidence, which will actually make the confidence interval bigger if you are working with the same data.

BrettMFoster commented 2 years ago

Thank you!

I think I understand now, and I see where Dylan's question was coming from.

I didn't understand the slope formula that was used in RStudio, and my assumptions above were not applicable to the slope calculation, though they would work on a small sample without a known standard deviation. However, slopes do have known values. The sample mean is the b1, the population is the one lonely measurement used, and the standard error is calculated from the b1. I think my calculation above should be amended to b1 ± Z(SE/√n), with Z = 1.96, SE = 1, and n = 1.

So the interval is b1 ± 1.96, or, with the more conservative alpha level, b1 ± 2.58. The larger CI is more conservative because it's less likely to reject the null hypothesis.
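The link between interval width and rejecting the null can be checked directly: a two-sided test of H0: slope = 0 rejects at a given level exactly when the matching CI excludes 0. A minimal sketch using z critical values, with a hypothetical slope of 3 and SE of 1.4 (not numbers from the thread); `rejects_zero` is an illustrative name:

```python
from statistics import NormalDist

def rejects_zero(b1, se, confidence):
    """True if the two-sided z-based CI b1 +/- z*SE excludes 0 (H0: slope = 0 is rejected)."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    lower, upper = b1 - z * se, b1 + z * se
    return not (lower <= 0 <= upper)

# Hypothetical estimate: slope 3 with SE 1.4.
for level in (0.95, 0.99):
    print(f"{level:.0%} CI rejects H0 (slope = 0): {rejects_zero(3.0, 1.4, level)}")
```

Here the 95% interval, about [0.26, 5.74], excludes 0, while the wider 99% interval, about [-0.61, 6.61], includes it, so the 99% level rejects less often.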

Brett