AnaFVicente commented 5 years ago

I used the data you created at https://acclab.github.io/DABEST-python-docs/tutorial.html to make a t-test plot. If I choose the median difference, the histogram corresponding to the 95% CI (grey histogram) is not aligned with the interval (black line).

That's the code:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import dabest
    from scipy.stats import norm  # Used in generation of populations.

    np.random.seed(9999)  # Fix the seed so the results are replicable.
    Ns = 20  # The number of samples taken from each population

    # Create samples
    c1 = norm.rvs(loc=3, scale=0.4, size=Ns)
    c2 = norm.rvs(loc=3.5, scale=0.75, size=Ns)
    c3 = norm.rvs(loc=3.25, scale=0.4, size=Ns)

    t1 = norm.rvs(loc=3.5, scale=0.5, size=Ns)
    t2 = norm.rvs(loc=2.5, scale=0.6, size=Ns)
    t3 = norm.rvs(loc=3, scale=0.75, size=Ns)
    t4 = norm.rvs(loc=3.5, scale=0.75, size=Ns)
    t5 = norm.rvs(loc=3.25, scale=0.4, size=Ns)
    t6 = norm.rvs(loc=3.25, scale=0.4, size=Ns)

    # Add a `gender` column for coloring the data.
    # np.repeat needs an integer count, hence the floor division.
    females = np.repeat('Female', Ns // 2).tolist()
    males = np.repeat('Male', Ns // 2).tolist()
    gender = females + males

    # Add an `id` column for paired data plotting.
    id_col = pd.Series(range(1, Ns + 1))

    # Combine samples and gender into a DataFrame.
    df = pd.DataFrame({'Control 1': c1, 'Test 1': t1,
                       'Control 2': c2, 'Test 2': t2,
                       'Control 3': c3, 'Test 3': t3,
                       'Test 4': t4, 'Test 5': t5, 'Test 6': t6,
                       'Gender': gender, 'ID': id_col})

    two_groups_paired = dabest.load(df, idx=("Test 6", "Test 5"),
                                    paired=True, id_col="ID")

    plt.figure()
    two_groups_paired.median_diff.plot()
    plt.show()

AnaFVicente commented 5 years ago

Thanks in advance,

Ana

josesho commented 5 years ago

Hi @AnaFVicente , thanks for flagging this up!

I think this is a case of the linearity of means (the mean of the paired differences always equals the difference of the means), vs. the inherent weirdness of medians?

Using t5 and t6 from above:

    >>> np.mean(t5) - np.mean(t6)
    -0.023344683345558614
    >>> np.mean(t5 - t6)
    -0.023344683345558614  # Same result as above.

but

    >>> np.median(t5) - np.median(t6)
    -0.22693218492666745
    >>> np.median(t5 - t6)
    0.0528625540482075  # Not the same result...
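The same effect can be reproduced with a few hand-picked numbers. This is only a minimal sketch: the arrays `a` and `b` are illustrative values, not taken from the dataset above, and are chosen so that the median of the paired differences and the difference of the medians even disagree in sign.

```python
import numpy as np

# Hand-picked paired measurements (illustrative values only).
a = np.array([1.0, 2.0, 10.0])
b = np.array([0.0, 5.0, 6.0])

# The mean is linear, so the order of the operations does not matter.
mean_of_diffs = np.mean(a - b)
diff_of_means = np.mean(a) - np.mean(b)
print(mean_of_diffs, diff_of_means)

# The median is not linear, so these two quantities can differ, even in sign.
median_of_diffs = np.median(a - b)              # median of [1, -3, 4]
diff_of_medians = np.median(a) - np.median(b)   # 2 - 5
print(median_of_diffs, diff_of_medians)
```

Here the paired differences are `[1, -3, 4]`, so their median is positive, while the difference of the two group medians is negative.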

Right now, I think the best option is for us to remove the mean lines for Gardner-Altman paired median plots....

Again, thanks for bringing this to our attention.

AnaFVicente commented 5 years ago

Thanks for your response. If I understood correctly, the median of the differences (black line) is calculated with a different method than the distribution of the differences (grey histogram). That's why there's a shift between both representations. Is there a way to calculate both parameters by using the same method, either np.median(t5) - np.median(t6) or np.median(t5 - t6), so I get nice plots?

Ana

josesho commented 5 years ago

The problem isn't that they are calculated in a different way. It seems to be much deeper than that. The paired median difference of t5 and t6 is positive, even though the median of t5 is lower than the median of t6....

After thinking about it for a while, there might not be a good way to depict paired median difference with the Gardner-Altman estimation plot. You might have to use the Cumming estimation plot to do so.

Simply use

 two_groups_paired.median_diff.plot(float_contrast=False)

to plot the sampling error histogram below the paired slopegraph.

AnaFVicente commented 5 years ago

Thanks for your reply. However, if I plot the Cumming estimation plot I still have the same problem: the histogram corresponding to the 95% CI distribution (grey histogram) is not aligned with the interval (black line).

AnaFVicente commented 5 years ago

Would you suggest just removing the grey histogram corresponding to the 95% CI distribution?

josesho commented 5 years ago

The error curve actually is aligned with the 95% CI; bootstraps derived from medians often have non-normal distributions. If you find the error curve distracting, you could remove it in a vector graphics program, but I'd advise including it as it highlights:

- the non-normality of the median difference
- the graded nature of the confidence interval.
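To make the point about non-normal bootstrap distributions concrete, here is a rough sketch that resamples the median of paired differences. Everything in it is an assumption for illustration: the synthetic data, the seed, the number of resamples, and the plain percentile interval. DABEST's own interval computation may differ.

```python
import numpy as np

rng = np.random.default_rng(12345)  # arbitrary seed, for repeatability only

# Synthetic paired data (assumed; not the t5/t6 values from this thread).
n = 20
before = rng.normal(loc=3.25, scale=0.4, size=n)
after = rng.normal(loc=3.25, scale=0.4, size=n)
diffs = after - before

# Resample the paired differences with replacement; take the median each time.
n_boot = 5000
idx = rng.integers(0, n, size=(n_boot, n))
boot_medians = np.median(diffs[idx], axis=1)

# Plain percentile interval (an assumption; other bootstrap intervals exist).
ci_low, ci_high = np.percentile(boot_medians, [2.5, 97.5])

# Share of bootstrap medians that fall inside the interval.
inside = np.mean((boot_medians >= ci_low) & (boot_medians <= ci_high))

# With n = 20, each bootstrap median is the average of two observed differences,
# so there are at most 20 * 21 / 2 = 210 distinct values: a lumpy histogram.
n_unique = np.unique(boot_medians).size

print(ci_low, ci_high, inside, n_unique)
```

In this sketch roughly 95% of the bootstrap medians lie inside the interval by construction, yet the histogram can look skewed or multi-peaked because the medians can only take a limited set of values.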

Hope this helps!

josesho commented 5 years ago

Also, we are looking into how to properly compute and display paired median differences, taking into account all we have discussed above. Thanks for flagging this up to us!

AnaFVicente commented 5 years ago

Thanks a lot for your help!

AnaFVicente commented 5 years ago

Just one last question. I don't understand how the curve can be correctly aligned if it represents the distribution of the median differences while the black line represents the 95% CI. If I understand correctly, most of the curve should be inside the black line, and only 5% of the data could be outside. Thanks in advance.

ACCLAB / DABEST-python

Paired t-test plot 95% CI shifted #46
