Relationship between CI and significance tests

lukasvermeer commented 2 years ago

On this page there is this strong statement of equivalence:

There is a direct relationship between the CI around an effect size and statistical significance of a null-hypothesis significance test. For example, if an effect is statistically significant (p < 0.05) in a two-sided independent t-test with an alpha of .05, the 95% CI for the mean difference between the two groups will not include zero.
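To make the quoted claim concrete, here is a minimal sketch that checks the correspondence numerically. It is an illustration under assumptions, not the textbook's own code: it uses a large-sample z-test (normal approximation) instead of the t-test so that it runs on the Python standard library alone, and the names `z_test_and_ci` and `agreement_rate` are made up for this example. The same algebra applies to the t-test when the CI and the p-value use the same standard error and the same t distribution.

```python
import random
from statistics import NormalDist, mean, variance


def z_test_and_ci(a, b, alpha=0.05):
    """Two-sided z-test and matching CI for a difference in means.

    Assumption: normal approximation with an unpooled standard error.
    """
    diff = mean(a) - mean(b)
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(diff / se)))
    ci = (diff - z_crit * se, diff + z_crit * se)
    return p_value, ci


def agreement_rate(n_datasets=2000, n=40, seed=1):
    """Fraction of simulated datasets where 'p < .05' and 'CI excludes 0' agree."""
    rng = random.Random(seed)
    agree = 0
    for _ in range(n_datasets):
        shift = rng.uniform(-1, 1)  # hypothetical true effect, varied per dataset
        a = [rng.gauss(shift, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        p, (lo, hi) = z_test_and_ci(a, b)
        significant = p < 0.05
        excludes_zero = lo > 0 or hi < 0
        agree += significant == excludes_zero
    return agree / n_datasets


print(agreement_rate())
```

The two checks agree because both reduce to the same comparison: |difference / SE| against the critical value. That is a property of building the interval and the test from the same ingredients, which is exactly the condition questioned below.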

I always thought that this approach was usually reasonable in practice, but not true by definition. As in: I thought it was theoretically possible to construct valid confidence intervals which could not be used in the way described here.

A bad example of this might be:

1. Compute a 5% (yes five, not a typo) confidence interval in the normal way.
2. Define a 95% confidence interval as the union of the two ranges (-infinity, lower_bound_5%] and [upper_bound_5%, +infinity).

Obviously one would never do that, but I think that would be a valid 95% confidence interval, because it "covers the true parameter 95% of the time" (which I think is the only requirement for a CI to be valid). Yet it cannot be used in the way described in the quote.
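The construction above can be checked with a small simulation. This is only a sketch under assumptions: a normal sampling distribution with a known standard error, sample means drawn directly from that distribution, and hypothetical helper names (`inner_interval`, `odd_set_contains`, `coverage`, `standard_p_value`). It estimates how often the two-range set contains the true mean, and then shows a case where the usual two-sided test is clearly significant while the set still contains zero.

```python
import random
from statistics import NormalDist


def inner_interval(xbar, se, level):
    """Ordinary central CI at the given level (known standard error)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return xbar - z * se, xbar + z * se


def odd_set_contains(value, xbar, se):
    """True if value lies outside the open 5% interval, i.e. in the two-range set."""
    lo, hi = inner_interval(xbar, se, 0.05)
    return value <= lo or value >= hi


def coverage(true_mean=0.3, se=0.1, n_sims=20000, seed=7):
    """Share of simulated sample means whose two-range set contains the true mean."""
    rng = random.Random(seed)
    hits = sum(
        odd_set_contains(true_mean, rng.gauss(true_mean, se), se)
        for _ in range(n_sims)
    )
    return hits / n_sims


def standard_p_value(xbar, se):
    """Two-sided z-test of H0: mean = 0."""
    return 2 * (1 - NormalDist().cdf(abs(xbar / se)))


print(coverage())                        # should be close to 0.95
print(standard_p_value(0.5, 0.1))        # far below 0.05
print(odd_set_contains(0.0, 0.5, 0.1))   # yet zero is inside the set
```

So the set has the nominal coverage, but "zero is excluded" no longer lines up with the usual test being significant. Strictly speaking, excluding zero from this set still defines some test with a 5% error rate under the null; it is just an odd one that rejects when the estimate is very close to zero.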

I would love to be corrected here. Are there any references that explain why the relationship between CI and statistical significance is indeed direct?

Lakens commented 2 years ago

Hi, Lukas! The statement is indeed only true if you compute a CI 'the normal way', as you describe it, like a CI around a mean difference. It is possible, in theory, to construct a different 95% CI that does not correspond to a statistical test. To keep the text readable, I never go into detail on hypothetical edge cases. I feel that level of detail is more defensible in a text that provides a formal treatment of these issues, while my textbook has a more applied focus. In general, any CI that you get out of statistical software packages will show this relationship. I say 'in general' because there are sometimes dozens of ways to compute a CI, some with slightly better coverage, and then the relationship does not formally hold (but the difference shows up at a decimal place that has little relevance in practice).

lukasvermeer commented 2 years ago

Clear. Fair. Thank you. Good to hear I was not completely wrong all those years. 😅

lukasvermeer commented 2 years ago

I think another, more technical consideration is that CIs are sometimes approximated (for example using Fieller's theorem) rather than computed exactly. Pairing an approximate CI with an exact p-value would then affect how precisely the two agree.
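As a rough illustration of an approximate CI paired with an exact p-value, here is a sketch that swaps in a simpler case than Fieller's theorem: a Wald (normal-approximation) interval for a single proportion next to an exact two-sided binomial test. The data (8 successes in 10 trials, null value 0.5) are a deliberately small, made-up example chosen so the gap is visible, and the function names are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist


def wald_ci(successes, n, level=0.95):
    """Approximate (Wald) CI for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def exact_binomial_p(successes, n, p0=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed one under H0: p = p0.
    """
    pmf = [comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(n + 1)]
    cutoff = pmf[successes] * (1 + 1e-7)  # small tolerance for float ties
    return sum(q for q in pmf if q <= cutoff)


lower, upper = wald_ci(8, 10)
print(lower, upper)              # the interval excludes 0.5
print(exact_binomial_p(8, 10))   # but the exact test gives p > 0.05
```

Here the approximate interval and the exact test disagree about 0.5. With larger samples the two usually move much closer together, in line with the point that the discrepancy tends to matter little in practice.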

Lakens / statistical_inferences

Relationship between CI and significance tests #29