fditraglia / econometrics.blog-comments

repo for utterances comments in econometrics.blog

post/thirty-isn-t-the-magic-number/ #6

utterances-bot opened this issue 3 months ago

utterances-bot commented 3 months ago

Thirty isn't the magic number | econometrics.blog

The simplest version of the central limit theorem (CLT) says that if \(X_1, \dots, X_n\) are iid random variables with mean \(\mu\) and finite variance \(\sigma^2\), \[ \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \rightarrow_d N(0,1), \] where \(\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i\).

https://www.econometrics.blog/post/thirty-isn-t-the-magic-number/
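A minimal Python sketch of the convergence in the CLT statement above, using only the standard library. The Exponential(1) population (mean and standard deviation both 1), the sample sizes, the seed, and the helper names `standardized_mean`, `kolmogorov_distance`, and `simulate` are illustrative assumptions, not anything from the post. It simulates the standardized sample mean many times and reports the largest gap between its empirical CDF and the N(0,1) CDF.

```python
import math
import random
from statistics import NormalDist


def standardized_mean(sample, mu, sigma):
    """Return (xbar - mu) / (sigma / sqrt(n)) for one sample."""
    n = len(sample)
    xbar = sum(sample) / n
    return (xbar - mu) / (sigma / math.sqrt(n))


def kolmogorov_distance(draws):
    """Largest gap between the empirical CDF of `draws` and the N(0,1) CDF."""
    phi = NormalDist().cdf
    xs = sorted(draws)
    m = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        # The empirical CDF jumps from i/m to (i+1)/m at x; check both sides.
        gap = max(gap, abs(phi(x) - i / m), abs(phi(x) - (i + 1) / m))
    return gap


def simulate(n, reps=5000, seed=1):
    """Kolmogorov distance for the standardized mean of n Exponential(1) draws."""
    rng = random.Random(seed)
    # Assumed population: Exponential(1), which has mean 1 and sd 1.
    draws = [
        standardized_mean([rng.expovariate(1.0) for _ in range(n)], 1.0, 1.0)
        for _ in range(reps)
    ]
    return kolmogorov_distance(draws)


if __name__ == "__main__":
    for n in (5, 30, 500):
        print(f"n = {n}: max CDF gap = {simulate(n):.3f}")
```

The gap shrinks as n grows, but how fast it shrinks depends on the population, which is the point the thread below picks up.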

NeveTong commented 3 months ago

In Section 4.11 of the book "Statistics for Engineers and Scientists" by William Cyrus Navidi, there is a discussion about determining when a sample size is sufficiently large, depending on the shape of the distribution.

fditraglia commented 3 months ago

Hi NeveTong:

Thanks for your comment and for pointing out this reference! I like Navidi's Figure 4.24 showing how skewness can affect the quality of the normal approximation. At the same time, I'm puzzled by the claim "Empirical evidence suggests that for most populations, a sample size of 30 or more is large enough for the normal approximation to be adequate." Are we supposed to infer that fewer than 30 would be inadequate? The simulations above show that this isn't the case. But, more to the point, "most populations" is a little vague. Whether n = 30 is enough depends crucially on the kinds of populations you're studying. If you're a researcher working with income distributions, "most populations" from your perspective will be highly skewed. So, again, while I really like the figure, I'd still argue that the n = 30 advice comes up short.
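A hedged sketch of the point about skewness, in Python with only the standard library. Everything here is an illustrative assumption rather than the post's own simulation: Uniform(0, 1) stands in for a symmetric population, Lognormal(0, 1) for a right-skewed, income-like one, sigma is treated as known so that only the normal approximation is being tested, and `tail_misses` is a hypothetical helper name. For each case it estimates how often the nominal 95% interval xbar +/- 1.96 sigma/sqrt(n) misses mu on each side; if the approximation were exact, each share would be about 0.025.

```python
import math
import random
from statistics import NormalDist


def tail_misses(draw, mu, sigma, n, reps=20000, level=0.95, seed=2):
    """Return (share with xbar < mu - h, share with xbar > mu + h),
    where h = z * sigma / sqrt(n) and sigma is assumed known."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    h = z * sigma / math.sqrt(n)
    below = above = 0
    for _ in range(reps):
        xbar = sum(draw(rng) for _ in range(n)) / n
        if xbar < mu - h:
            below += 1
        elif xbar > mu + h:
            above += 1
    return below / reps, above / reps


if __name__ == "__main__":
    cases = {
        # Symmetric, light-tailed population with a small sample.
        "Uniform(0, 1), n = 5": (lambda rng: rng.random(), 0.5, math.sqrt(1 / 12), 5),
        # Right-skewed, income-like population at the n = 30 rule of thumb.
        "Lognormal(0, 1), n = 30": (
            lambda rng: rng.lognormvariate(0.0, 1.0),
            math.exp(0.5),                      # mean of Lognormal(0, 1)
            math.sqrt((math.e - 1) * math.e),   # sd of Lognormal(0, 1)
            30,
        ),
    }
    for label, args in cases.items():
        below, above = tail_misses(*args)
        print(f"{label}: below {below:.3f}, above {above:.3f}, target 0.025 each")
```

Looking at the two tails separately, rather than only at overall coverage, matters here: skewness pushes the two miss rates in opposite directions, so they can partly cancel in the two-sided total.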

If you want to go further with the idea of how skewness matters for the quality of the normal approximation, you might want to check out the Berry-Esseen Theorem. When you know the variance and the third absolute central moment \(E|X - \mu|^3\) of a distribution, this theorem bounds the largest gap between the CDF of the standardized sample mean and the standard normal CDF, so you can work out how large a sample is guaranteed to be enough for the normal approximation to work well. Unfortunately, using this theorem requires knowing that third moment, which we usually don't, and which is also hard to estimate precisely. But it's still interesting and helpful to think about. I should probably do a post on it someday!
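To make this concrete, here is a minimal Python sketch. It assumes the iid form of the theorem, sup_x |P(Z_n <= x) - Phi(x)| <= C rho / (sigma^3 sqrt(n)), where Z_n is the standardized sample mean and rho = E|X - mu|^3, and it takes C = 0.4748, a published upper bound on the constant in the iid case. Treat that value, the helper names `berry_esseen_bound` and `berry_esseen_n`, and the Exponential(1) example (for which sigma = 1 and rho = 12/e - 2, about 2.41) as assumptions of the sketch. Solving the bound for n gives n >= (C rho / (sigma^3 eps))^2.

```python
import math

# Assumed constant: a published upper bound on C for iid summands.
C_IID = 0.4748


def berry_esseen_bound(n, rho, sigma, C=C_IID):
    """Upper bound on sup_x |P(Z_n <= x) - Phi(x)| for the standardized mean
    of n iid draws with sd `sigma` and third absolute central moment `rho`."""
    return C * rho / (sigma ** 3 * math.sqrt(n))


def berry_esseen_n(eps, rho, sigma, C=C_IID):
    """Smallest n whose Berry-Esseen bound is at most `eps`."""
    n = max(1, math.ceil((C * rho / (sigma ** 3 * eps)) ** 2))
    # Nudge n to correct any floating-point rounding at the boundary.
    while berry_esseen_bound(n, rho, sigma, C) > eps:
        n += 1
    while n > 1 and berry_esseen_bound(n - 1, rho, sigma, C) <= eps:
        n -= 1
    return n


if __name__ == "__main__":
    rho_exp = 12 / math.e - 2  # E|X - 1|^3 for Exponential(1); sigma = 1
    for eps in (0.05, 0.01):
        print(f"eps = {eps}: n = {berry_esseen_n(eps, rho_exp, 1.0)}")
```

Under these assumptions the sketch needs n = 526 to guarantee a gap of at most 0.05 for Exponential(1) data. Because the theorem gives only an upper bound, the actual gap at a given n can be much smaller.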