AntoineSoetewey / statsandr

A blog on statistics and R that aims to help academics and professionals working with data grasp important concepts in statistics and apply them in R. See www.statsandr.com
http://statsandr.com/

blog/anova-in-r/ #30

Closed by utterances-bot 3 years ago

utterances-bot commented 3 years ago

ANOVA in R - Stats and R

Learn how to perform an ANalysis Of VAriance (ANOVA) in R to compare three or more groups. See also how to interpret the results and test the assumptions.

https://statsandr.com/blog/anova-in-r/

technocrat commented 3 years ago

Thank you. This was an extremely helpful walk-through.

AntoineSoetewey commented 3 years ago

Glad you found it useful!

AntoineSoetewey commented 3 years ago

Comment written by Lukasz on October 13, 2020 07:32:41:

What do you think of using other post-hoc tests, like Fisher's LSD or Scheffe's? Do you use them? When?

Comment written by Antoine Soetewey on October 13, 2020 09:10:57:

Dear Lukasz,

I personally do not use Fisher's LSD or Scheffe's test on a regular basis, but not because they are not good.

As far as I know, Scheffe's and Fisher's LSD tests also require the assumption of homogeneity of variances, like the Tukey HSD and Dunnett's tests. Furthermore, Fisher's LSD is one of the least conservative and Scheffe's one of the most conservative post-hoc tests. It thus makes sense to perform these two post-hoc tests in parallel and then compare the results (which correspond to the two extremes). It also makes sense to use only one of them if you want your conclusions to be very conservative, or not at all, depending on the context of the analysis.

In practice, however, I tend to prefer the Tukey HSD test (or Dunnett's test if there is a reference group) because it is more conservative than Fisher's LSD but less conservative than Scheffe's test, a middle ground between the two. And last but not least, it saves me from having to choose between the two, which some people may see as an arbitrary choice.
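
For reference, here is a minimal sketch of how both of these tests can be run in R. The data frame dat and its columns value and group are hypothetical names, not taken from the article:

# Minimal sketch: Tukey HSD and Dunnett's test on a one-way ANOVA
# (hypothetical data frame `dat` with numeric `value` and factor `group`;
# Dunnett's test compares each group to the first factor level)
res_aov <- aov(value ~ group, data = dat)
TukeyHSD(res_aov)  # all pairwise comparisons, adjusted p-values
library(multcomp)
summary(glht(res_aov, linfct = mcp(group = "Dunnett")))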

For a detailed presentation of the different post-hoc tests, see Winer, Michels, and Brown (1991).

Hope this helps.

Regards,
Antoine

AntoineSoetewey commented 3 years ago

Comment written by Lukasz on October 13, 2020 09:35:18:

Thank you. My first choice is LSD exactly because of its "liberalism", but I'm not sure if it's a good "strategy". That's why I asked for yours. Thanks again, I'll check the reference you provided!

AntoineSoetewey commented 3 years ago

Comment written by Antoine Soetewey on October 13, 2020 09:52:06:

I cannot really tell you whether your strategy is "good" or not because it also depends on the context of your analysis.

However, what I can tell you is the following:

Regards,
Antoine

AntoineSoetewey commented 3 years ago

Comment written by Mattan S. Ben-Shachar on October 13, 2020 11:55:19:

Great post!

It should be noted that in 2-way ANOVAs (and above) things get trickier with the various Type I/II/III sums of squares, for which I recommend using {afex}, with {emmeans} for any contrasts. Looking forward to a follow-up post! (Also, effect sizes for ANOVA can be estimated with {effectsize}.)
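
As a rough sketch of how these packages fit together (the data frame dat and its columns id, value, A and B are hypothetical, not from the post):

# Sketch: two-way ANOVA with Type III sums of squares via {afex},
# contrasts via {emmeans} and effect sizes via {effectsize}
# (hypothetical data frame `dat` with columns `id`, `value`, `A`, `B`)
library(afex)
library(emmeans)
library(effectsize)
fit <- aov_ez(id = "id", dv = "value", data = dat, between = c("A", "B"))
emmeans(fit, pairwise ~ A)  # estimated marginal means and pairwise contrasts
eta_squared(fit)            # effect size for each term of the model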

AntoineSoetewey commented 3 years ago

Comment written by Antoine Soetewey on October 13, 2020 13:42:17:

Dear Mattan,

Thanks for your comment! I'll make sure to check out the packages you mentioned if I write another post on two-way ANOVA.

Regards,
Antoine

AntoineSoetewey commented 3 years ago

Comment written by ajcullum on November 30, 2020 18:28:04:

Thanks for this nice tutorial, Antoine. I appreciate the fact that you're including detailed statistical explanations along with the comments. One concern I did have was with your test for normality. ANOVA assumes that each group is normally distributed, but you're combining all your residuals into one pooled data set before checking normality. I think a danger there is that, potentially at least, you could have something like one left-skewed population and one right-skewed population that when combined produced a roughly normal set of pooled residuals. I run my normality tests by group to look at the pattern within each group separately.

Thanks again for the great resource!

Alistair

AntoineSoetewey commented 3 years ago

Comment written by Antoine Soetewey on November 30, 2020 20:10:53:

Dear Alistair,

Thank you for your question.

You're right in saying that ANOVA assumes that each group is normally distributed. However, saying "The distribution of Y within each group is normally distributed" is the same as saying "The residuals are normally distributed". In this sense, it's not a mistake that I check normality on all residuals.

Remember that residuals are the distances between the actual values of Y and the mean value of Y for a specific value of X, so the grouping variable is already taken into account in the computation of the residuals.

So in summary, in ANOVA you actually have two options for testing normality:

  1. Checking normality separately for each group on the "raw" data (Y values)
  2. Checking normality on the residuals (but not per group)

In practice, you will see that it is often easier to just use the residuals and check them all together, especially if you have many groups or few observations per group.
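
For instance, a minimal sketch of the second option, assuming a model fitted as res_aov <- aov(value ~ group, data = dat) (illustrative names):

# Sketch: normality check on the pooled residuals of an ANOVA model
shapiro.test(residuals(res_aov))   # formal test
qqnorm(residuals(res_aov))         # visual check
qqline(residuals(res_aov))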

If you are still not convinced, remember that an ANOVA is a special case of a linear model. If your independent variable were continuous (instead of categorical), the only option you would have left is to check normality on the residuals, which is precisely what is done for testing normality in linear regression models.

See more in this article.

(For your information, I have added this clarification at the end of this section.)

Hope this helps.

Regards,
Antoine

AntoineSoetewey commented 3 years ago

Comment written by ajcullum on December 01, 2020 21:57:09:

Antoine-

Thanks for the detailed reply. I read it and the linked article carefully, and then ran some simulations to check myself. Having done all that, I still believe my point is valid. I do want to provide an example that illustrates that, but first I wanted to note something.

You wrote: saying "The distribution of Y within each group is normally distributed" is the same as saying "The residuals are normally distributed". This phrasing implies that those two things are always the same, but I'm arguing that this is a case where "X implies Y" doesn't have to mean "Y implies X". So while it is always true that normally-distributed groups will result in normally-distributed pooled residuals, I'm arguing that normally-distributed pooled residuals don't necessarily mean there were normally-distributed groups.

Here's an example in R where group A is right-skewed and group B is left-skewed. Run separately, each group shows a clearly skewed histogram and a very low p-value for the Shapiro-Wilk test. But if you run those on the pooled data, the result looks like things are normally distributed.

# Create the data set. Samples A and B have mirror-image distributions with mean zero.
# A is right skewed, and B is simply -A (negative A) and thus left-skewed.
data <- structure(list(group = c(rep("A", 30), rep("B", 30)),
    value = c(0.45, 0.74, 0.19, -0.76, 0.23, -0.82, -0.44, -0.37,
    -0.24, -0.62, 0.52, 1.7, -0.37, 1.11, -0.41, -0.15, -0.12, -0.59,
    -0.64, -0.35, -0.48, -0.32, 0.91, 0.1, -0.28, 0.16, -0.48, 0.23,
    -0.21, 1.31, -0.45, -0.74, -0.19, 0.76, -0.23, 0.82, 0.44, 0.37,
    0.24, 0.62, -0.52, -1.7, 0.37, -1.11, 0.41, 0.15, 0.12, 0.59,
    0.64, 0.35, 0.48, 0.32, -0.91, -0.1, 0.28, -0.16, 0.48, -0.23,
    0.21, -1.31)), row.names = c(NA, -60L), class = "data.frame")
# Run the ANOVA to generate residuals
res_aov <- aov(value ~ group, data = data)
# Histogram and Shapiro-Wilk test by group
by(res_aov$residuals, res_aov$model$group, hist)
by(res_aov$residuals, res_aov$model$group, shapiro.test)
# Histogram and Shapiro-Wilk test for normality on pooled residuals
hist(res_aov$residuals)
shapiro.test(res_aov$residuals)

The (abridged) Shapiro-Wilk output looks like this:

By group:
res_aov$model$group: A
W = 0.90103, p-value = 0.008903
res_aov$model$group: B
W = 0.90103, p-value = 0.008903

Pooled:
W = 0.99165, p-value = 0.9556

Now, I'm not saying this is a likely scenario. Most of the time it would probably be reasonable to assume the groups all have the same-shaped distribution, meaning you could pool the residuals for a normality test. As your linked article suggests, that might be the best way to go if sample sizes within groups are small. And under some circumstances (like regression) it may not be reasonable to do anything other than look at the residuals as a pooled set (although binning and/or a visual inspection of the residual plot would be options). But there's a difference between making those kinds of practical decisions and stating that normally distributed pooled residuals necessarily mean the individual groups are also normally distributed.

(I'd also note that the article you linked to says "If there really are many values of Y for each value of X (each group), and there really are only a few groups (say, four or fewer), go ahead and check normality separately for each group" which is the case for the penguin data set.)

I hope I'm not offending you by continuing this discussion! I'm certainly happy to consider a rebuttal!

Cheers,
Alistair

AntoineSoetewey commented 3 years ago

Comment written by Antoine Soetewey on December 02, 2020 19:37:01:

Dear Alistair,

I always learn a lot by confronting different points of view so you are definitely not offending me, discussions are even encouraged!

After some research, my opinion is the following: the inconsistency between normality by group and normality of the residuals comes from a violation of the assumption of independence of observations. And unless I am mistaken, I do not think distributions can be:

  1. left skewed and right skewed at the same time,
  2. such that they give normally distributed residuals,
  3. while still being independent.

(Also, as a side note, it is known that the normality assumption can be relaxed for sufficiently large random samples. This is possible thanks to the central limit theorem, which says that for sufficiently large samples (usually n >= 30), the distribution of the sample means will be approximately normal regardless of the underlying distributions.)

But I would be more than happy to hear what you think about that, so any thoughts on this matter?

Regards,
Antoine

AntoineSoetewey commented 3 years ago

Comment written by Cavan Bonner on December 05, 2020 03:16:48:

I really want to be able to calculate Cohen's d effect sizes from my Tukey's HSD post-hoc comparisons, but I have not been able to find code that calculates effect sizes directly from the HSD results. Am I missing something about the nature of the HSD results, or are effect sizes just typically not reported? It seems like they should be, since Tukey's HSD is comparing means.

AntoineSoetewey commented 3 years ago

Comment written by Antoine Soetewey on December 05, 2020 08:41:05:

Dear Cavan,

From what I know, effect sizes are usually not reported with Tukey HSD results.

See how to compute effect sizes from test statistics and effect sizes in ANOVA for more information.

You could also look for functions that compute Cohen's effect sizes directly from your data. See for instance the cohensD() function from the {lsr} package.
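
For example, a minimal sketch with {lsr}, where dat, value and group are hypothetical names (cohensD() with a formula expects exactly two groups, hence the subset):

# Sketch: Cohen's d for one pair of groups compared in the post-hoc test
library(lsr)
cohensD(value ~ group, data = subset(dat, group %in% c("A", "B")))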

Hope this helps.

Regards,
Antoine

AntoineSoetewey commented 3 years ago

Comment written by Anonymous on November 11, 2020 01:56:38:

In the post-hoc test, the Adelie - Chinstrap p-value is more than the ANOVA adjusted p-value. Does this mean that we cannot reject the null hypothesis? I am a bit confused about why you mentioned the p-value of 0.05, and not the adjusted p-value. Thank you for your reply in advance.

AntoineSoetewey commented 3 years ago

Comment written by Antoine Soetewey on November 11, 2020 10:12:29:

Not sure what p-values you are referring to, but the p-values in the output of the Tukey HSD or Dunnett's test given by summary(post_test) are the adjusted p-values (and not the non-adjusted p-values).

Since those adjusted p-values are below our significance level of 0.05, we reject the null hypothesis.
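
As an illustrative sketch of the kind of call referred to here (assuming a fit res_aov <- aov(value ~ group, data = dat), with hypothetical names):

# Sketch: adjusted p-values for Tukey HSD comparisons via {multcomp}
library(multcomp)
post_test <- glht(res_aov, linfct = mcp(group = "Tukey"))
summary(post_test)  # the reported p-values are adjusted for multiplicity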

I have made it clearer in the article that I use the adjusted p-values to decide on whether the null hypothesis is rejected or not.

Hope this helps.

Regards, Antoine

Niki-20 commented 3 years ago

Dear Antoine,

Thank you for the article. It is super well written and very easy to follow. It is the article that has given me the most help.

I have a question regarding the case where you'd like to use the HSD post-hoc test for uneven sample sizes between treatment groups, after having done an ANOVA test. I have understood that the best post-hoc test to perform in the case of uneven sample sizes is the Tukey-Kramer test, which can be done in R by setting "unbalanced=TRUE" in HSD.test() from the package {agricolae}. After having performed the Tukey-Kramer test, I'd like to plot the results for a better visualization of the significant differences between groups. If you plot the HSD.test, R by default gives you a plot with different colours by groups of letters (e.g. one colour each for the "a", "ab", "abc", "bc", "c" groups). This latter point is somewhat confusing, because the colours do not really reflect which groups are significantly different from each other. The groups "a", "ab" and "abc" share the letter "a", and therefore are not significantly different, but in the plot they are shown in different colours. How can I adjust the plot so that different colours are applied only to the groups that are indeed significantly different from each other?

Another question I found interesting to ask you is: in case your dataset does not follow a normal distribution and you have to perform a Kruskal-Wallis test, which post-hoc test would you recommend? I've heard the Wilcoxon test is the best one. Also, in case we doubt the homogeneity of our dataset (unequal variances) and have to perform a Welch test, which post-hoc test would you recommend? The Games-Howell test?

Thank you in advance for reading this, and the response you'll give me! I appreciate it very much.

Nicole.

AntoineSoetewey commented 3 years ago

Dear Nicole,

Thank you for the article. It is super well written and very easy to follow. It is the article that has given me the most help.

Thanks for your feedback!

I have a question regarding the case where you'd like to use the HSD post-hoc test for uneven sample sizes between treatment groups, after having done an ANOVA test. I have understood that the best post-hoc test to perform in the case of uneven sample sizes is the Tukey-Kramer test, which can be done in R by setting "unbalanced=TRUE" in HSD.test() from the package {agricolae}.

In case of unequal sample sizes (but equal variances), you can indeed use the Tukey-Kramer test. See this thread or this article.
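
A minimal sketch of that call, assuming a fit res_aov <- aov(value ~ group, data = dat) with hypothetical names:

# Sketch: Tukey-Kramer test via {agricolae} for unequal sample sizes
library(agricolae)
tk <- HSD.test(res_aov, trt = "group", unbalanced = TRUE, console = TRUE)
plot(tk)  # default plot with the letter-based grouping discussed below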

After having performed the Tukey-Kramer test, I'd like to plot the results for a better visualization of the significant differences between groups. If you plot the HSD.test, R by default gives you a plot with different colours by groups of letters (e.g. one colour each for the "a", "ab", "abc", "bc", "c" groups). This latter point is somewhat confusing, because the colours do not really reflect which groups are significantly different from each other. The groups "a", "ab" and "abc" share the letter "a", and therefore are not significantly different, but in the plot they are shown in different colours. How can I adjust the plot so that different colours are applied only to the groups that are indeed significantly different from each other?

I totally agree with you: I find the letters and colours somewhat confusing. This is the reason I prefer to display the p-values instead (see an example here). Unfortunately, with plot(HSD.test()) it does not seem possible to use different colours only for groups that are significantly different, unless you edit the code of the function yourself.

There is also the ggbetweenstats() function from the {ggstatsplot} package, which allows you to display the p-values for all comparisons. However, unless I am mistaken, I don't see any possibility to perform a Tukey-Kramer test with it. So in conclusion, I don't have an easy solution for you regarding this point.

Another question I found interesting to ask you is: in case your dataset does not follow a normal distribution and you have to perform a Kruskal-Wallis test, which post-hoc test would you recommend? I've heard the Wilcoxon test is the best one.

I personally recommend performing Dunn's test after a significant Kruskal-Wallis test. You could use the Wilcoxon test, but you'll need to adjust the p-values to account for the issue of multiplicity. See more in this article.
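
A minimal sketch of that workflow (dat, value and group are illustrative names; dunnTest() here comes from the {FSA} package, one of several implementations):

# Sketch: Kruskal-Wallis test followed by Dunn's test with adjusted p-values
kruskal.test(value ~ group, data = dat)
library(FSA)
dunnTest(value ~ group, data = dat, method = "holm")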

Also, in case we doubt the homogeneity of our dataset (unequal variances) and have to perform a Welch test, which post-hoc test would you recommend? The Games-Howell test?

You're right: if you use a Welch ANOVA (due to unequal variances), you can use the Games-Howell test as a post-hoc test (source).
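
A minimal sketch, again with illustrative names (games_howell_test() is taken from the {rstatix} package, one possible implementation):

# Sketch: Welch ANOVA followed by the Games-Howell post-hoc test
oneway.test(value ~ group, data = dat, var.equal = FALSE)  # Welch ANOVA
library(rstatix)
games_howell_test(dat, value ~ group)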

Hope this helps.

Regards, Antoine

nina-nauwelaerts commented 3 years ago

Which non-parametric test can be used as an alternative to Dunnett's test?

AntoineSoetewey commented 3 years ago

Which non-parametric test can be used as an alternative to Dunnett's test?

Based on the paper by Steel (1959), I would use the Steel test. In his paper, Steel wrote (p. 561): "The proposed test is a non-parametric analogue of Dunnett's procedure".

Another option, when there is a small number of comparisons, is to manually perform Wilcoxon tests with an adjusted significance level alpha (if you use the Bonferroni correction, use alpha divided by the number of comparisons).
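
For the second option, a minimal sketch in base R (dat, value and group are illustrative names; pairwise.wilcox.test() adjusts the p-values rather than alpha, which is equivalent for a Bonferroni decision):

# Sketch: pairwise Wilcoxon tests with a Bonferroni correction
pairwise.wilcox.test(dat$value, dat$group, p.adjust.method = "bonferroni")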

Hope this helps.

Regards, Antoine

jamorillo commented 3 years ago

Dear Antoine: Your blog is absolutely great, thanks for sharing all this knowledge!

I am a microbiologist with a basic level of R and statistics. I have a doubt about the interpretation of ANOVA that is "killing me". You said in this post that "only if the null hypothesis (ANOVA) is rejected" should we go for post-hoc tests (if I understood correctly). This is what I always thought. But recently, I see that in many cases other colleagues are performing post-hoc tests after a non-significant global test (ANOVA). They assume that a significant ANOVA is only obligatory for specific post-hoc tests (like the protected Fisher's LSD), but not for many others. See for example (!):

https://stats.stackexchange.com/questions/9751/do-we-need-a-global-test-before-post-hoc-tests

So, might the "trick" be to consider specific contrasts as "planned comparisons" instead of "post-hoc" tests? Then those comparisons would be performed ignoring the global ANOVA result. It is really shocking for me because I have the impression (taking into account that I am not a mathematician) that there are different opinions among statisticians about this point. THANKS!

Jose

PS: of course, sometimes we have situations where the global ANOVA does not reject the null hypothesis but specific contrasts are statistically significant.

AntoineSoetewey commented 3 years ago

Thanks for your feedback and your question Jose.

You are not the first one to come up with this question, and I understand that it is confusing (to be honest, it is for me too!).

I believe the last paragraph from the answer on StackExchange summarizes the discussion quite well:

ANOVA tests the overall null hypothesis that all the data come from groups that have identical means. If that is your experimental question -- does the data provide convincing evidence that the means are not all identical -- then ANOVA is exactly what you want. More often, your experimental questions are more focused and answered by multiple comparison tests (post tests). In these cases, you can safely ignore the overall ANOVA results and jump right to the post test results.

What I tend to do is the following:

However, I would make sure to choose what I am going to do before seeing any results. It is, in my opinion, too easy to perform post-hoc tests because results of the ANOVA are not what was expected or desired. In French we have a saying: "Qui cherche trouve", translated literally, "Who seeks finds". What I mean is that if you perform many tests (and probably more than what you are supposed to do), at some point you will find a significant result.

Last but not least, in introductory classes I usually teach the most prudent/conservative process. And it seems to me that it is more prudent to perform post-hoc tests only after significant ANOVA results than to give the choice between (i) post-hoc tests only after a significant ANOVA and (ii) post-hoc tests directly. This is because some people could be inclined to change their research question so that post-hoc tests are legitimate even with non-significant ANOVA results. And a small portion of these people will find statistically significant comparisons although the ANOVA is non-significant (due to the higher power of some post-hoc tests compared to an ANOVA).

This is only my opinion (which can of course be wrong) and I would be happy to discuss it.

Hope this helps.

(For your information, I have added a footnote regarding this.)

Regards, Antoine

jamorillo commented 3 years ago

Hi Antoine, thanks a lot for your detailed and fast response. Then it is clear to me that this is not only about mathematics (what is "allowed" or not) but also about rational thinking, and a bit of expert decision-making based on specific knowledge of whatever you are studying. Very interesting. I also found this paper, check Fig. 7. They make a clear distinction between post-hoc tests (a posteriori tests) and planned comparisons (a priori tests) and the way to proceed in both cases.

https://peerj.com/articles/10387/

For those specific planned comparisons (a predefined limited number among all possible pairwise combinations) I am trying to use the {emmeans} package in R. But they clearly state that these can be "independent of ANOVA". Interesting. Cheers! Jose
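
A minimal sketch of such a planned comparison with {emmeans}, assuming a fit res_aov <- aov(value ~ group, data = dat) where group has three levels A, B and C (all names illustrative):

# Sketch: an a priori contrast with {emmeans}, independent of the global ANOVA
library(emmeans)
emm <- emmeans(res_aov, ~ group)
contrast(emm, method = list("A vs B" = c(1, -1, 0)))  # custom planned contrast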

AntoineSoetewey commented 3 years ago

Hi Antoine, thanks a lot for your detailed and fast response. Then it is clear to me that this is not only about mathematics (what is "allowed" or not) but also about rational thinking, and a bit of expert decision-making based on specific knowledge of whatever you are studying. Very interesting.

Exactly, which makes statistics even more fascinating!

I also found this paper, check Fig. 7. They make a clear distinction between post-hoc tests (a posteriori tests) and planned comparisons (a priori tests) and the way to proceed in both cases.

https://peerj.com/articles/10387/

For those specific planned comparisons (a predefined limited number among all possible pairwise combinations) I am trying to use the {emmeans} package in R. But they clearly state that these can be "independent of ANOVA". Interesting. Cheers! Jose

Thanks for the paper, very interesting!

Good luck in your analyses.

Regards, Antoine