Closed AnaFVicente closed 5 years ago
Thanks in advance,
Ana
Hi @AnaFVicente , thanks for flagging this up!
I think this is a case of the linearity of the mean (the mean of the differences always equals the difference of the means) vs. the inherent weirdness of medians?
Using t5 and t6 from above:
>>> np.mean(t5) - np.mean(t6)
-0.023344683345558614
>>> np.mean(t5 - t6)
-0.023344683345558614 # Same result as above.
but
>>> np.median(t5) - np.median(t6)
-0.22693218492666745
>>> np.median(t5 - t6)
0.0528625540482075 # Not the same result...
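The same effect can be reproduced with a tiny, hand-checkable example. This is only a sketch: the arrays a and b below are made-up paired values, not the t5 and t6 samples from this thread. It shows that the mean of the paired differences matches the difference of the means, while the two median calculations can disagree and even have opposite signs.

```python
import numpy as np

# Hypothetical paired observations (NOT the t5/t6 samples from this thread).
a = np.array([1.0, 2.0, 10.0])
b = np.array([0.0, 5.0, 3.0])

# The mean is linear: both routes give the same number.
mean_of_diffs = np.mean(a - b)
diff_of_means = np.mean(a) - np.mean(b)

# The median is not linear: the two routes can differ, even in sign.
median_of_diffs = np.median(a - b)              # median of [1, -3, 7]
diff_of_medians = np.median(a) - np.median(b)   # 2 - 3

print(np.isclose(mean_of_diffs, diff_of_means))  # True
print(median_of_diffs, diff_of_medians)          # 1.0 -1.0
```

Here the paired median difference is positive even though the median of a is lower than the median of b, which is the same kind of sign disagreement seen with t5 and t6.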
Right now, I think the best option is for us to remove the mean lines for Gardner-Altman paired median plots....
Again, thanks for bringing this to our attention.
Thanks for your response. If I understood correctly, the median of the differences (black line) is calculated with a different method than the distribution of the differences (grey histogram). That's why there's a shift between the two representations. Is there a way to calculate both using the same method, either np.median(t5) - np.median(t6) or np.median(t5 - t6), so that I get nice plots?
Ana
The problem isn't that they are calculated in a different way. It seems to be much deeper than that. The paired median difference of t5 and t6 is positive, even though the median of t5 is lower than the median of t6.
After thinking about it for a while, there might not be a good way to depict paired median difference with the Gardner-Altman estimation plot. You might have to use the Cumming estimation plot to do so.
Simply use
two_groups_paired.median_diff.plot(float_contrast=False)
to plot the sampling error histogram below the paired slopegraph.
Thanks for your reply. However, if I plot the Cumming estimation plot I still have the same problem: the histogram corresponding to the 95% CI distribution (grey histogram) is not aligned with the interval (black line).
Would you suggest just removing the grey histogram corresponding to the 95% CI distribution?
The error curve actually is aligned with the 95% CI; bootstraps derived from medians often have non-normal distributions. If you find the error curve distracting, you could remove it in a vector graphics program, but I'd advise including it as it highlights:
Hope this helps!
Also, we are looking into how to properly compute and display paired median differences, taking into account all we have discussed above. Thanks for flagging this up to us!
Thanks a lot for your help!
Just one last question. I don't understand how the curve can be correctly aligned if it represents the distribution of the median differences while the black line represents the 95% CI. If I understand correctly, most of the curve should be inside the black line, and only 5% of the data could be outside. Thanks in advance.
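The relationship between a bootstrap distribution and its 95% interval can be checked numerically. The following is a minimal sketch under stated assumptions: the samples x and y are hypothetical stand-ins, and the interval is a plain percentile interval, which is a simplification and not necessarily the method DABEST uses internally. It resamples the paired differences, takes the median of each resample, and measures what fraction of the bootstrap medians falls inside the interval.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical paired samples standing in for two test groups.
x = rng.normal(loc=3.25, scale=0.4, size=20)
y = rng.normal(loc=3.25, scale=0.4, size=20)
diffs = y - x  # one difference per pair

# Resample the paired differences with replacement; take the median each time.
n_boot = 5000
idx = rng.integers(0, len(diffs), size=(n_boot, len(diffs)))
boot_medians = np.median(diffs[idx], axis=1)

# Plain percentile interval (an assumption of this sketch, not DABEST's method).
ci_low, ci_high = np.percentile(boot_medians, [2.5, 97.5])

# Fraction of the bootstrap distribution that lies inside the interval.
inside = np.mean((boot_medians >= ci_low) & (boot_medians <= ci_high))
print(f"95% CI: [{ci_low:.3f}, {ci_high:.3f}]")
print(f"Fraction of bootstrap medians inside the CI: {inside:.3f}")
```

By construction, roughly 95% of the bootstrap medians fall inside such an interval. Because the median of a small resample can only take a limited set of values, the bootstrap distribution is often lumpy or skewed, so the interval need not be symmetric around the observed median difference.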
I used the data you created at https://acclab.github.io/DABEST-python-docs/tutorial.html to make a t-test plot. If I choose the median difference, the histogram corresponding to the 95% CI (grey histogram) is not aligned with the interval (black line).
That's the code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import dabest
from scipy.stats import norm  # Used in generation of populations.

np.random.seed(9999)  # Fix the seed so the results are replicable.
Ns = 20  # The number of samples taken from each population.

# Create samples.
c1 = norm.rvs(loc=3, scale=0.4, size=Ns)
c2 = norm.rvs(loc=3.5, scale=0.75, size=Ns)
c3 = norm.rvs(loc=3.25, scale=0.4, size=Ns)

t1 = norm.rvs(loc=3.5, scale=0.5, size=Ns)
t2 = norm.rvs(loc=2.5, scale=0.6, size=Ns)
t3 = norm.rvs(loc=3, scale=0.75, size=Ns)
t4 = norm.rvs(loc=3.5, scale=0.75, size=Ns)
t5 = norm.rvs(loc=3.25, scale=0.4, size=Ns)
t6 = norm.rvs(loc=3.25, scale=0.4, size=Ns)

# Add a gender column for coloring the data.
# Integer division: np.repeat needs an integer repeat count.
females = np.repeat('Female', Ns // 2).tolist()
males = np.repeat('Male', Ns // 2).tolist()
gender = females + males

# Add an id column for paired data plotting.
id_col = pd.Series(range(1, Ns + 1))

# Combine samples and gender into a DataFrame.
df = pd.DataFrame({'Control 1': c1, 'Test 1': t1,
                   'Control 2': c2, 'Test 2': t2,
                   'Control 3': c3, 'Test 3': t3,
                   'Test 4': t4, 'Test 5': t5, 'Test 6': t6,
                   'Gender': gender, 'ID': id_col})

two_groups_paired = dabest.load(df, idx=("Test 6", "Test 5"), paired=True, id_col="ID")

plt.figure()
two_groups_paired.median_diff.plot()
plt.show()