ACCLAB / DABEST-python

Data Analysis with Bootstrapped ESTimation
https://acclab.github.io/DABEST-python/
Apache License 2.0

pairing mean_diff with Mann-Whitney test #92

Closed tmchartrand closed 4 years ago

tmchartrand commented 4 years ago

It seems to me that there's a conceptual issue with the default behavior shown in the tutorial: the documentation there states "By default, DABEST will report the two-sided p-value of the most conservative test that is appropriate for the effect size.", which in the example means that the mean_diff effect size reports a Mann-Whitney test p-value. However, the Mann-Whitney test is fundamentally unrelated to the mean difference in general, corresponding to the median difference instead. It's possible that a positive mean difference and negative median difference could both be valid conclusions for the right data! Maybe the defaults could restrict to pairing mean- and median-based effect sizes and tests?
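To make the point concrete, here is a hypothetical sketch (data invented for illustration, not from DABEST): a heavy right tail in the test group pulls its mean above the control group's while its median stays below, so the mean difference and median difference point in opposite directions.

```python
import numpy as np

# Hypothetical data: one large outlier in `test` raises its mean
# above `control`, but the bulk of `test` sits below `control`.
control = np.array([1.0, 1.0, 1.0, 1.0])
test = np.array([0.0, 0.0, 0.0, 10.0])

mean_diff = test.mean() - control.mean()            # +1.5 (positive)
median_diff = np.median(test) - np.median(control)  # -1.0 (negative)
```

A Mann-Whitney test on such data would track the rank-based (median-like) direction, not the reported mean difference.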

tmchartrand commented 4 years ago

Actually now that I think about it, even median_diff doesn't necessarily match the Mann-Whitney conclusion. Median of differences would match MW (as does Cliff's delta), but median_diff is calculating difference of medians (not sure what corresponding test could be used there). I guess this sort of decision is always complex and should be left to the user, but the tutorial example should definitely be adjusted at least, in my opinion!
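The distinction can be shown numerically. Assuming two unpaired groups (invented data), the median of all pairwise differences (the Hodges-Lehmann estimator, the location shift the Mann-Whitney test is sensitive to) need not equal the difference of the two group medians:

```python
import numpy as np

# Hypothetical data where the two "median" effect sizes disagree.
control = np.array([0.0, 0.0, 0.0, 9.0])
test = np.array([1.0, 2.0, 3.0, 4.0])

# Difference of medians (what median_diff computes).
diff_of_medians = np.median(test) - np.median(control)  # 2.5

# Median of all pairwise differences test_j - control_i,
# computed via broadcasting (Hodges-Lehmann estimator).
hodges_lehmann = np.median(test[:, None] - control[None, :])  # 2.0
```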

josesho commented 4 years ago

Hi @tmchartrand ,

You are correct in pointing out the incongruency. The overarching intention of estimation plots is to de-emphasise the dichotomous all-or-nothing nature of hypothesis tests, which current usage of P values exacerbates.

In the webapp at estimationstats.com, we do state below the results that

the P value(s) reported are the likelihood(s) of observing the effect size(s), if the null hypothesis of zero difference is true; they are included here to satisfy a common requirement of scientific journals.

which hopefully serves to inform the reader on what a P value really is...

We will update the tutorial to bring the intent and thinking in line with an estimation framework that de-emphasises P values; thanks for pointing this out!

tmchartrand commented 4 years ago

Thanks for the reply! I'm not sure if my meaning got across fully though. It's not so much the tension between p-values and estimation plots in general that I was trying to bring up. As I see it, they can provide complementary views of the same question, provided the effect size measures the same properties of the data as the test statistic does - this is the case when a mean difference effect size plot is paired with a t-test, but not when it is paired with a MW test.

josesho commented 4 years ago

Right, thanks for elaborating.

We print the Mann-Whitney by default because it is widely used in biomedical literature as a non-parametric "version" or counterpart to the two-group t-test. You are right in pointing out the discrepancy between mean differences and the hypothesis tested by the Mann-Whitney. The other robust non-parametric test I am aware of is the Kolmogorov-Smirnov test, but in my anecdotal experience, it is uncommon to see it deployed for a two-group comparison.

In any case, a short (foot)note in the tutorial should suffice to correct this. 👍🏼

josesho commented 4 years ago

We are now using permutation tests as the default statistical test in place of Mann-Whitney. See PR #96; feel free to upgrade to v0.3.0.
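A permutation test directly addresses the mismatch raised in this thread, because its test statistic can be the same effect size that is plotted. This is a minimal sketch of the idea, not DABEST's exact implementation (function name and parameters are illustrative): shuffle the group labels repeatedly and count how often a shuffled mean difference is at least as extreme as the observed one.

```python
import numpy as np

def permutation_p(control, test, n_perm=5000, seed=0):
    """Two-sided permutation p-value for the mean difference (sketch)."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([control, test])
    n = len(control)
    observed = test.mean() - control.mean()
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        # Recompute the same effect size on relabelled groups.
        diff = perm[n:].mean() - perm[:n].mean()
        if abs(diff) >= abs(observed):
            count += 1
    return count / n_perm
```

Because the statistic permuted here is the mean difference itself, the p-value and the plotted effect size answer the same question, unlike the earlier mean_diff / Mann-Whitney pairing.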