AntoineSoetewey / statsandr

A blog on statistics and R aiming at helping academics and professionals working with data to grasp important concepts in statistics and to apply them in R. See www.statsandr.com

blog/do-my-data-follow-a-normal-distribution-a-note-on-the-most-widely-used-distribution-and-how-to-test-for-normality-in-r/ #19

Closed utterances-bot closed 3 years ago

utterances-bot commented 3 years ago

Do my data follow a normal distribution? A note on the most widely used distribution and how to test for normality in R - Stats and R

This article explains in detail what the normal or Gaussian distribution is, its importance in statistics, and how to test whether your data are normally distributed

https://statsandr.com/blog/do-my-data-follow-a-normal-distribution-a-note-on-the-most-widely-used-distribution-and-how-to-test-for-normality-in-r/

AntoineSoetewey commented 3 years ago

Comment written by mezzzomix on January 31, 2020 12:08:24:

Hello Antoine and thank you for the post. I think you made a great effort to describe the normal distribution to your readers. However, I strongly disagree with the part towards the end:

"From the output, we see that the p-value > 0.05 implying that the data are not significantly different from a normal distribution. In other words, we can assume the normality."

Not rejecting the null hypothesis does not mean that the alternative hypothesis is true. In order to really "prove" normality, you would have to do an equivalence test and make certain assumptions. The only thing you can show with the SW or KS test is that your data indeed does NOT follow a normal distribution.

Best, Ivan

Comment written by Antoine Soetewey on January 31, 2020 13:26:22:

Dear Ivan,

Thanks a lot for your remark. You are completely right: not rejecting the null hypothesis does not mean that the null hypothesis is true (I guess you meant null hypothesis and not alternative hypothesis in your comment).

I was a bit too quick when writing that part. I edited the article according to your comment. Thanks again!

Regards, 
Antoine
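
To illustrate the point made in this exchange, here is a minimal R sketch. It is not taken from the article; the seed, the sample size and the choice of the exponential distribution are assumptions made purely for illustration. It draws many small samples from a clearly non-normal distribution and counts how often the Shapiro-Wilk test fails to reject normality:

```r
set.seed(42)  # arbitrary seed, chosen only for reproducibility

n_sim <- 1000  # number of simulated samples (assumption)
n_obs <- 10    # observations per sample (assumption: a small sample)

# Each sample comes from an exponential distribution, which is NOT normal
p_values <- replicate(n_sim, shapiro.test(rexp(n_obs))$p.value)

# Share of samples for which normality is not rejected at the 5% level
share_not_rejected <- mean(p_values > 0.05)
share_not_rejected
```

A share well above zero means that a p-value > 0.05 is routinely obtained for data that are not normal, which is why a non-significant result cannot be read as proof of normality.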

AntoineSoetewey commented 3 years ago

Comment written by Duleep Samuel on February 01, 2020 01:32:14:

Lovely write-up. Thanks

Comment written by Antoine Soetewey on February 01, 2020 06:02:18:

Glad you liked it Samuel!

AntoineSoetewey commented 3 years ago

Comment written by SFdisqus on February 04, 2020 18:43:38:

Excellent post, Antoine! Mainly because it's so clearly written, with easy step-by-step R examples. I really learned a lot from your post... Merci / Dank Je / Thanks!

PS: It would be really nice if you could write a post on how to determine the distribution of a specific dataset. Is the data distribution normal? Poisson? Gamma? Etc. I never found an easy, practical and clear way to do that in R...

Antoine, maybe you can try your magic touch on this topic... :-)

Comment written by Antoine Soetewey on February 04, 2020 19:25:48:

Glad you liked it!

I take note of your request about the other distributions.

Comment written by Antoine Soetewey on May 14, 2020 06:42:23:

Hello,

For your information, I just published an article that may be of interest to you. In this section, I show how to test whether your data follows a binomial distribution. This example can easily be adapted to other distributions such as the Poisson, etc.

Feel free to let me know if you have any questions!
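
For readers without access to the linked section, one common way to run such a check is a chi-square goodness-of-fit test. The sketch below is an illustration only, not necessarily the method used in the article; the counts, the number of trials and the success probability are all hypothetical:

```r
# Hypothetical data: number of successes out of 5 trials, recorded for 200 units
observed <- c(`0` = 8, `1` = 30, `2` = 65, `3` = 60, `4` = 29, `5` = 8)

size <- 5    # trials per unit (assumed known)
prob <- 0.5  # success probability under the null hypothesis (assumed, not estimated)

# Probabilities of 0, 1, ..., 5 successes under Binomial(size, prob)
expected_prob <- dbinom(0:size, size = size, prob = prob)

# Chi-square goodness-of-fit test of the observed counts against those probabilities
gof <- chisq.test(x = observed, p = expected_prob)

gof$expected  # expected counts under the null hypothesis
gof$p.value   # a small p-value indicates a poor fit to the binomial
```

If the success probability is estimated from the data instead of fixed in advance, the degrees of freedom reported by `chisq.test()` are one too many and the p-value should be recomputed accordingly. For a Poisson null, `dpois()` can replace `dbinom()`, with the upper tail grouped into a final category so that the probabilities sum to 1.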

AntoineSoetewey commented 3 years ago

Comment written by vijayarajamanickam on December 17, 2020 12:17:05:

Dear Antoine, it is really an informative post... Thank you so much.

It would be really helpful if you wrote a post on log transformation of data or any other transformation, because some of my data (350 individuals) do not follow the normal distribution.

kind regards
vijay

Comment written by Antoine Soetewey on December 17, 2020 22:04:02:

Dear Vijay,

Thanks for the suggestion, it's now on my to-do list (but don't expect it soon because I have a lot of work at the moment).

A sample of 350 is large so increasing the sample size will probably not make your data more normal. What I can already suggest if your data is not following a normal distribution is to try to:

  • apply a transformation such as the logarithm, square root, Box-Cox or Yeo-Johnson
  • remove outliers
  • use non-parametric tests
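
The first option in the list can be sketched in base R as follows. This is only an illustration on simulated data: the log-normal sample, its parameters and the seed are assumptions, not the commenter's data.

```r
set.seed(123)  # arbitrary seed, for reproducibility only

# Hypothetical right-skewed variable with 350 observations (simulated, log-normal)
x <- rlnorm(350, meanlog = 0, sdlog = 1)

# Shapiro-Wilk p-values before and after transformation
p_raw  <- shapiro.test(x)$p.value        # raw data
p_sqrt <- shapiro.test(sqrt(x))$p.value  # square-root transformation (requires x >= 0)
p_log  <- shapiro.test(log(x))$p.value   # log transformation (requires x > 0)

c(raw = p_raw, sqrt = p_sqrt, log = p_log)
```

Because these simulated data are log-normal by construction, the logarithm is the natural transformation here. With real data, it is worth comparing several candidates and inspecting a Q-Q plot (`qqnorm()` and `qqline()`) rather than relying on the test alone.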

Also, keep in mind that for some analyses (such as independent and dependent samples t-tests, ANOVA and regressions), deviations from normality are not always an issue for validity. As long as the sample size exceeds 30, Stevens (2016) showed that there is not usually too much of an impact on validity from non-normal data.

Hope this helps.

Regards, Antoine
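
The remark about sample size can be explored with a small simulation. The sketch below is an assumption-laden illustration (exponential data, samples of 50, a one-sample t-test, arbitrary seed), not a reproduction of the cited result. It estimates how often a t-test wrongly rejects a true null hypothesis when the data are skewed:

```r
set.seed(2020)  # arbitrary seed, for reproducibility only

n_sim <- 2000  # number of simulated samples (assumption)
n_obs <- 50    # sample size, above the threshold of 30 mentioned above

# Exponential data with rate 1 have a true mean of 1, so the null hypothesis
# mu = 1 is true and every rejection is a false positive
rejections <- replicate(
  n_sim,
  t.test(rexp(n_obs, rate = 1), mu = 1)$p.value < 0.05
)

# Empirical type I error rate, to be compared with the nominal 5%
type1_rate <- mean(rejections)
type1_rate
```

The closer this rate is to 0.05, the less the departure from normality matters for the test at that sample size. Rerunning the simulation with a smaller `n_obs` shows how the picture changes for small samples.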

Comment written by vijayarajamanickam on December 18, 2020 10:31:53:

Thank you so much for your quick response. It is really helpful.

regards
Vijay