Closed: tmchartrand closed this issue 4 years ago
Actually, now that I think about it, even median_diff doesn't necessarily match the Mann-Whitney conclusion. The median of the differences would match MW (as would Cliff's delta), but median_diff calculates the difference of the medians (I'm not sure what corresponding test could be used there). I guess this sort of decision is always complex and should be left to the user, but the tutorial example should definitely be adjusted at least, in my opinion!
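To make the distinction concrete, here is a toy sketch (hypothetical data, not from DABEST) showing that the difference of medians and the median of the pairwise differences (the Hodges-Lehmann-style estimate tied to MW) are genuinely different quantities:

```python
import numpy as np

# Hypothetical toy samples chosen so the two summaries disagree numerically.
a = np.array([1, 2, 3, 10, 11])
b = np.array([4, 5, 6, 7, 8])

# What median_diff reports: difference of the group medians.
diff_of_medians = np.median(b) - np.median(a)

# What aligns with Mann-Whitney: median of all pairwise differences b[i] - a[j].
median_of_diffs = np.median(np.subtract.outer(b, a))

print(diff_of_medians, median_of_diffs)  # 3.0 vs 2.0
```

For suitably skewed data the two can even disagree in sign, which is why pairing one with a test built around the other can mislead.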
Hi @tmchartrand ,
You are correct in pointing out the incongruity. The overarching intention of estimation plots is to de-emphasise the dichotomous, all-or-nothing nature of hypothesis tests, which current usage of P values exacerbates.
In the webapp at estimationstats.com, we do state below the results that
the P value(s) reported are the likelihood(s) of observing the effect size(s), if the null hypothesis of zero difference is true; they are included here to satisfy a common requirement of scientific journals.
which hopefully serves to inform the reader on what a P value really is...
We will update the tutorial to bring the intent and thinking in line with an estimation framework that de-emphasises P values; thanks for pointing this out!
Thanks for the reply! I'm not sure my meaning got across fully, though. It's not so much the tension between p-values and estimation plots in general that I was trying to bring up. As I see it, they can provide complementary views of the same question, provided the effect size measures the same properties of the data as the test statistic does. This is the case when a mean difference effect size plot is paired with a t-test, but not when it is paired with a MW test.
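The "complementary views" point can be illustrated with a short sketch (synthetic data, not from the tutorial): the t statistic is the mean difference divided by its standard error, so the two always agree in direction.

```python
import numpy as np
from scipy import stats

# Synthetic two-group data (illustrative only).
rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, 40)
b = rng.normal(0.5, 1.0, 40)

# The mean-difference effect size and the t-test measure the same quantity:
# the t statistic is just the mean difference scaled by its standard error,
# so they necessarily point the same way.
mean_diff = b.mean() - a.mean()
t = stats.ttest_ind(b, a)

print(mean_diff, t.statistic, t.pvalue)
```

No such guarantee exists between the mean difference and the Mann-Whitney U statistic, which ranks the data and ignores the magnitudes the mean depends on.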
Right, thanks for elaborating.
We print the Mann-Whitney by default because it is widely used in the biomedical literature as a non-parametric "version" or counterpart to the two-group t-test. You are right in pointing out the discrepancy between mean differences and the hypothesis tested by the Mann-Whitney. The other robust non-parametric test I am aware of is the Kolmogorov-Smirnov test, but in my anecdotal experience it is uncommon to see it deployed for a two-group comparison.
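For reference, both tests mentioned above are available in scipy; a minimal sketch (with made-up exponential samples, purely illustrative) of how each would be called on a two-group comparison:

```python
import numpy as np
from scipy import stats

# Illustrative skewed samples; any two 1-D arrays would do.
rng = np.random.default_rng(0)
a = rng.exponential(1.0, 50)
b = rng.exponential(1.5, 50)

# Rank-based Mann-Whitney U test.
u = stats.mannwhitneyu(a, b, alternative="two-sided")

# Kolmogorov-Smirnov test: sensitive to any difference between the
# two empirical distributions, not just a location shift.
ks = stats.ks_2samp(a, b)

print(u.pvalue, ks.pvalue)
```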
In any case, a short (foot)note in the tutorial should suffice to correct this. 👍🏼
We are now using permutation tests as the default statistical test in place of the Mann-Whitney. See PR #96; feel free to upgrade to v0.3.0.
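A permutation test has the advantage that it can be built directly around the reported effect size. The sketch below is a minimal illustration of the idea for the mean difference, not DABEST's actual implementation (the function name and defaults are made up):

```python
import numpy as np

def perm_test_mean_diff(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for the difference of means.

    Hypothetical sketch: shuffle the pooled data, re-split into groups of
    the original sizes, and count how often the shuffled mean difference
    is at least as extreme as the observed one.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = b.mean() - a.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = pooled[len(a):].mean() - pooled[:len(a)].mean()
        if abs(diff) >= abs(observed):
            count += 1
    return count / n_perm
```

Because the test statistic here *is* the mean difference, the p-value and the effect size plot answer the same question, resolving the mismatch raised in this issue.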
It seems to me that there's a conceptual issue with the default behavior shown in the tutorial: the documentation there states "By default, DABEST will report the two-sided p-value of the most conservative test that is appropriate for the effect size.", which in the example means that the mean_diff effect size reports a Mann-Whitney test p-value. However, the Mann-Whitney test is fundamentally unrelated to the mean difference in general; it corresponds to the median difference instead. It's possible that a positive mean difference and a negative median difference could both be valid conclusions for the right data! Maybe the defaults could be restricted to pairing mean-based effect sizes with mean-based tests, and median-based effect sizes with median-based tests?
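The "positive mean difference and negative median difference" scenario is easy to construct; a minimal sketch with hypothetical data:

```python
import numpy as np

# Hypothetical data: group a is skewed by one large outlier.
a = np.array([0, 0, 0, 0, 10])
b = np.array([1, 1, 1, 1, 1])

mean_diff = b.mean() - a.mean()            # -1.0: the mean favors a
median_diff = np.median(b) - np.median(a)  # +1.0: the median favors b

print(mean_diff, median_diff)
```

The two effect sizes point in opposite directions, so a p-value from a test keyed to one of them says nothing about the other.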