p-value and Cohen's d regarding independent sample t-tests

0todd0000 / spm1d

One-Dimensional Statistical Parametric Mapping in Python

GNU General Public License v3.0


Hi, I have 4 questions regarding the independent samples t-test in spm1d:

1. After I ran an spm1d analysis in MATLAB, the result shows a significant difference in one cluster, but it does not give a start frame and end frame like those reported in the literature [Ankle kinematics, center of pressure progression, and lower extremity muscle activity during a side-cutting task in participants with and without chronic ankle instability]. If I want the start frame and end frame of the region with a significant difference, how can I get them in MATLAB?
2. The paper above reports the d-value of the SPM results as a numerical value, not as a curve like the one you described in #54 and #76. Did the authors report the d-value improperly?
3. After reading the #76 discussion: is the Cohen's d curve obtained by calculating Cohen's d at each point separately? For example, is the 34th point on the Cohen's d curve calculated from the means, standard deviations and pooled SD of the 34th data point across all subjects? Also, how can I calculate Cohen's d for an independent samples t-test in MATLAB?
4. I understand from discussion #54 that you suggest not reporting d-values, because d-values from an SPM analysis differ from the d-values we usually get from discrete data and may be larger than the actual ones, so reporting them is not that meaningful. But if the reviewers insist that we report d-values, isn't it a bit strange to report them as a curve plot?

Since I don't have a deep enough understanding of the SPM approach, I may have been a bit wordy with these questions, and I would appreciate your answers. Please let me know if anything is unclear, thanks!

Thank you for these questions!

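The endpoints of the first cluster (counting from left to right) can be retrieved in MATLAB as follows:

spm       = spm1d.stats.ttest2(yA, yB);  %two-sample t-test; yA, yB are (subjects x nodes) arrays
spmi      = spm.inference(0.05);         %inference at alpha = 0.05
endpoints = spmi.clusters{1}.endpoints;  %start and end of the first cluster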
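
For intuition about what cluster endpoints represent, here is a minimal base-MATLAB sketch (no spm1d calls) that finds the first and last suprathreshold node of each cluster. The curve `t` and the threshold `tstar` are made-up values for illustration only. spm1d computes the critical threshold itself, and the endpoints it reports may differ from these node-level indices, for example if they are interpolated to the threshold crossing or use a different index origin.

```matlab
% Hypothetical test statistic curve (10 nodes) and critical threshold
t     = [0.5 1.2 3.1 3.8 3.3 1.0 0.2 3.5 3.6 0.9];
tstar = 3.0;

above = abs(t) > tstar;          % suprathreshold nodes (sign ignored)
edges = diff([0 above 0]);       % +1 where a run starts, -1 just after it ends
i0    = find(edges == 1);        % first node of each run (1-based)
i1    = find(edges == -1) - 1;   % last node of each run (1-based)

for k = 1:numel(i0)
    fprintf('Cluster %d: nodes %d to %d\n', k, i0(k), i1(k));
end
```

Because the sign is ignored here, adjacent positive and negative excursions would be merged into one run; handle each sign separately if that matters for your data.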

Since the paper reports the result as d = 1.03 - 1.05, I think this means that the d-value ranges between 1.03 and 1.05 within the significant region, i.e., it summarizes the d curve over that cluster.

Yes, Cohen's d is calculated separately at each point. Just as means and SDs can be calculated at an arbitrary point along the 1D domain, so too can functions of the mean and SD, such as the t-value and Cohen's d. Cohen's d for an independent samples test (i.e., a two-sample test) can be calculated using the code in the cited issue: #76. A MATLAB translation is:

Q  = 101; %number of continuum nodes
JA = 8;   %sample size, Group A
JB = 8;   %sample size, Group B
yA = randn( JA, Q );  %random sample, GroupA
yB = randn( JB, Q );  %random sample, GroupB

mA = mean( yA, 1 ); %mean, Group A
mB = mean( yB, 1 ); %mean, Group B
sA = std( yA, [], 1 ); %st.dev., Group A
sB = std( yB, [], 1 ); %st.dev., Group B

s  = sqrt(   ( (JA-1)*sA.^2 + (JB-1)*sB.^2 ) / (JA+JB-2)   );  %pooled st.dev. (JA+JB-2 degrees of freedom)

d  = ( mA - mB ) ./ s;  %Cohen's d
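
If a single range such as "d = 1.03 - 1.05" is needed for a paper, one option is to summarize the d curve over the significant cluster. The sketch below is self-contained and uses the usual two-sample pooled SD with JA+JB-2 degrees of freedom. The cluster limits `i0` and `i1` are hypothetical node indices chosen for illustration; in practice they would come from your own inference results, rounded to whole nodes.

```matlab
Q  = 101;  JA = 8;  JB = 8;            % nodes and sample sizes (illustrative)
yA = randn(JA, Q) + 1;                 % random sample, Group A (shifted by 1)
yB = randn(JB, Q);                     % random sample, Group B

% Pointwise Cohen's d using the pooled standard deviation
s  = sqrt(((JA-1)*var(yA, 0, 1) + (JB-1)*var(yB, 0, 1)) / (JA+JB-2));
d  = (mean(yA, 1) - mean(yB, 1)) ./ s;

% Hypothetical cluster limits (1-based node indices), NOT computed by spm1d here
i0 = 30;
i1 = 45;

dc = d(i0:i1);                         % d-values inside the cluster
fprintf('d = %.2f - %.2f\n', min(dc), max(dc));
```

Reporting the minimum and maximum is only one possible summary; the full d curve can still be plotted alongside it.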

I think it's fine to report d-values, provided you do not use standard interpretations. Consider a threshold d-value like d=1.20, which is often interpreted as indicative of a "very large" effect. The problem is that, when there is no true effect, a random 1D process is much more likely to produce a d-value of 1.20 somewhere along its domain than a random 0D process is to produce it at a single point. So the d-value itself is fine; only the interpretation is potentially problematic. I therefore suggest reporting the d-values if requested, and then citing the literature (for example: this paper) to show that standard, 0D interpretations of statistical quantities like t-values and d-values cannot be applied directly to 1D processes.
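
The point about 0D versus 1D interpretations can be checked with a small null simulation. This is a rough sketch under assumed settings (8 subjects per group, 101 nodes, a 10-node moving average to make the curves smooth, 500 iterations), none of which come from the discussion above. It counts how often |d| exceeds 1.20 at one fixed node (the 0D case) and how often it exceeds 1.20 anywhere along the curve (the 1D case), when there is no true effect.

```matlab
nIter = 500;  JA = 8;  JB = 8;  Q = 101;   % assumed simulation settings
w     = 10;                                % moving-average width (assumed smoothness)
dcrit = 1.20;                              % d threshold under discussion

hit0D = false(nIter, 1);
hit1D = false(nIter, 1);
for k = 1:nIter
    % Smooth null data: both groups come from the same distribution
    yA = filter(ones(1, w)/w, 1, randn(JA, Q), [], 2);
    yB = filter(ones(1, w)/w, 1, randn(JB, Q), [], 2);
    s  = sqrt(((JA-1)*var(yA, 0, 1) + (JB-1)*var(yB, 0, 1)) / (JA+JB-2));
    d  = (mean(yA, 1) - mean(yB, 1)) ./ s;
    hit0D(k) = abs(d(end)) > dcrit;        % one fixed node (0D view)
    hit1D(k) = max(abs(d)) > dcrit;        % anywhere along the curve (1D view)
end
p0D = mean(hit0D);
p1D = mean(hit1D);
fprintf('P(|d| > %.2f): single node = %.3f, whole curve = %.3f\n', dcrit, p0D, p1D);
```

The whole-curve proportion can never be smaller than the single-node proportion, and how much larger it is depends on the smoothness and length of the curves.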

p-value and Cohen's d regarding independent sample t-tests #197