AntoineSoetewey / statsandr

A blog on statistics and R aiming at helping academics and professionals working with data to grasp important concepts in statistics and to apply them in R. See www.statsandr.com
http://statsandr.com/

blog/wilcoxon-test-in-r-how-to-compare-2-groups-under-the-non-normality-assumption/ #36

Closed utterances-bot closed 3 years ago

utterances-bot commented 3 years ago

Wilcoxon test in R: how to compare 2 groups under the non-normality assumption - Stats and R

Learn how to do the Wilcoxon test (non-parametric version of the Student's t-test) in R, used to compare 2 groups when the normality assumption is violated

https://statsandr.com/blog/wilcoxon-test-in-r-how-to-compare-2-groups-under-the-non-normality-assumption/

AntoineSoetewey commented 3 years ago

Comment written by Gerald I Cheves on June 08, 2020 01:09:13:

What's the difference between the Shapiro-Wilk normality test and the Kolmogorov test for normality?

AntoineSoetewey commented 3 years ago

Comment written by Antoine Soetewey on June 08, 2020 04:09:03:

Good question Gerald.

This article discusses the different normality tests.

In brief, the Kolmogorov-Smirnov and Shapiro-Wilk tests have the same hypotheses (H0: the data come from a normal distribution; H1: the data do not come from a normal distribution), but the Shapiro-Wilk test is less sensitive to extreme values and more powerful than the Kolmogorov-Smirnov test.
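As a minimal sketch of how the two tests can be run in R, the snippet below applies both to the same simulated sample. The data (a right-skewed exponential sample) and the variable names are assumptions made for illustration; they do not come from the article.

```r
# Simulated right-skewed sample (illustrative assumption, not data from the article)
set.seed(42)
x <- rexp(200, rate = 1)

# Shapiro-Wilk test: H0 = the data come from a normal distribution
sw <- shapiro.test(x)

# Kolmogorov-Smirnov test against a normal distribution whose mean and sd
# are estimated from the sample itself. Estimating the parameters from the
# same data makes the reported p-value only approximate (conservative).
ks <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))

# Compare the two p-values
c(shapiro_wilk = sw$p.value, kolmogorov_smirnov = ks$p.value)
```

Both calls return an object of class `htest`, so the p-value is available as `$p.value` in each case.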

Hope this helps.

Best,
Antoine

xairigu commented 3 years ago

Hi, why are you not testing for equality of variances, and what difference would it make whether you test for it or not? Does it change what is being tested with the Wilcoxon test?

AntoineSoetewey commented 3 years ago

Dear Xaira,

This is a good question and it is often raised.

Here are 3 good articles discussing the concept of equal variances in Wilcoxon test: 1, 2 & 3.

See for instance in 1: "If the two distributions have a different shape, the Mann-Whitney U test is used to determine whether there are differences in the distributions of your two groups. However, if the two distributions are the same shape, the Mann-Whitney U test is used to determine whether there are differences in the medians of your two groups."

To rephrase it, if you only want to compare the two groups you do not have to test the equality of variances. However, if your goal is to compare medians of the two groups then you will need to make sure that the two distributions have the same shape.

So testing for equality of variances will change your interpretation. In this article I don't compare medians, I only compare the groups. This is the reason I don't test for equality of variances. I have added a note regarding this assumption in this section, so thanks for your question.

For your information, the same applies when using the Kruskal-Wallis test to compare 3 groups or more (see this footnote in my article about ANOVA): if you only want to compare the groups you do not need homoscedasticity, but if you want to compare the medians this assumption must be met.

Hope this helps.

Regards, Antoine

Cannaxuan commented 3 years ago

Since you want to compare the groups by determining whether there are differences in the distributions of the two groups, how should alternative = "less" or "greater" be interpreted?

AntoineSoetewey commented 3 years ago

Hello,

Thanks for your question.

Indeed, in the first place I would like to test whether there are differences in the distributions of the two groups, so I don't specify any alternative and run the default two-sided test (H0: the two groups are similar; H1: the two groups are different).

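However, one may be interested in going further (based on preliminary research or on the research question, for instance) by testing whether one group performs better or worse than the other. In this case, the alternative should be specified: use alternative = "less" to test whether group 1 (x) is smaller than group 2 (y), and alternative = "greater" to test whether group 1 (x) is greater than group 2 (y).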

Hope this makes sense; let me know if not.

Regards, Antoine

Cannaxuan commented 3 years ago

The R document cites "if both x and y are given and paired is FALSE, a Wilcoxon rank sum test (equivalent to the Mann-Whitney test) is carried out. In this case, the null hypothesis is that the distributions of x and y differ by a location shift of mu and the alternative is that they differ by some other location shift (and the one-sided alternative "greater" is that x is shifted to the right of y)."

AntoineSoetewey commented 3 years ago

Thanks for the reference.

alternative "greater" is that x is shifted to the right of y:

if I understand correctly, it seems to me that x is larger than y, which means (in our case) that group 1 (x) performs better than group 2 (y).
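To make the direction of the alternative concrete, here is a small sketch with made-up scores in which every value of x lies below every value of y. The data and object names are hypothetical and only serve the illustration.

```r
# Hypothetical scores (illustration only): all of x is below all of y
x <- c(1.1, 2.3, 3.2, 4.8, 5.5)   # group 1
y <- c(6.1, 7.4, 8.9, 9.3, 10.2)  # group 2

# Default: two-sided alternative (x and y differ in either direction)
res_two_sided <- wilcox.test(x, y)

# One-sided: x is shifted to the left of y (x tends to be smaller)
res_less <- wilcox.test(x, y, alternative = "less")

# One-sided: x is shifted to the right of y (x tends to be larger)
res_greater <- wilcox.test(x, y, alternative = "greater")

# Collect the three p-values side by side
c(two_sided = res_two_sided$p.value,
  less      = res_less$p.value,
  greater   = res_greater$p.value)
```

Because x is entirely below y here, the "less" alternative yields a small p-value while the "greater" alternative yields a large one, which is consistent with reading "greater" as "x is larger than y".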

Unless you have another interpretation of the documentation? I'd be happy to discuss it.

Regards, Antoine

Cannaxuan commented 3 years ago

Thanks for your quick reply. I am still a little confused: is "differences in the distribution of the two groups" equivalent to "differences in the mean (mu) of the two groups"? If it is, then I am clear. If not: according to the R documentation of wilcox.test, not specifying alternative means alternative = "two.sided" (the default). Is this a test of the distributions or of the means? Below is the link to the R documentation of wilcox.test for your reference. https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/wilcox.test

AntoineSoetewey commented 3 years ago

Thanks for your quick reply. I am still a little confused: is "differences in the distribution of the two groups" equivalent to "differences in the mean (mu) of the two groups"?

As far as I understand, the Wilcoxon test is not comparing the means (mu). This is the reason that in this article I wrote:

The Student's t-test, in contrast, compares the means (H0: the means of the two groups are equal; H1: the means of the two groups differ).

That being said, both tests allow you to compare two groups (each in a different way, if I may say).

If it is, then I am clear. If not: according to the R documentation of wilcox.test, not specifying alternative means alternative = "two.sided" (the default). Is this a test of the distributions or of the means?

If you don't specify any alternative, it is indeed a two-sided test, so you are testing whether the two groups differ, in either direction.

But in any case (i.e., with or without specifying an alternative), with the Wilcoxon test you are not using the means, you are rather comparing the distributions of the two groups, so I would not call it a mean test.
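A small sketch may help to see that the two tests look at different things. The numbers below are invented for illustration: group x has mostly low values plus one extreme value, so its mean is larger than that of y while most of its values rank lower.

```r
# Invented data (illustration only)
x <- c(1, 2, 3, 4, 100)  # group 1: mostly low values plus one extreme value
y <- c(5, 6, 7, 8, 9)    # group 2

tt <- t.test(x, y)       # Welch t-test: based on the sample means
wt <- wilcox.test(x, y)  # Wilcoxon rank sum test: based on the ranks

tt$estimate              # sample means of x and y
wt$statistic             # W, computed from the ranks of x in the pooled sample
c(median_x = median(x), median_y = median(y))
```

The extreme value pulls the mean of x above the mean of y, but it only counts as "the largest rank" for the Wilcoxon test, so the two tests summarise the same data quite differently.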

Hope this helps. I am not completely sure I understand your question so my apologies if I am not answering it.

Regards, Antoine