0todd0000 / spm1d

One-Dimensional Statistical Parametric Mapping in Python
GNU General Public License v3.0

SPM and effect sizes (with example) #54

Closed 0todd0000 closed 7 years ago

0todd0000 commented 8 years ago

(Below is a summary of correspondence regarding effect size that recently took place elsewhere; please feel free to comment. Thank you for raising this issue, Clint!)

Q. In preparing data for discussion with clinicians, the "absolute values" were presented, followed by the SPM statistics and then the effect sizes (essentially (mean1 - mean2) / mean(std1, std2), i.e., a Cohen's-d-like measure), as depicted in the figure below. Is this approach acceptable?

A. For conceptualizing experimental results, effect sizes are fantastic. In simple experiments, effect sizes are more or less the same thing as the test statistic, as is apparent in the attached figure. Compared to effect sizes, the main benefits of using test statistics are: (1) they remain useful for arbitrarily complex experiments involving arbitrarily complex dependent variables, and (2) they are more closely associated with probability, so they are generally more useful for making statistical inferences. I'd say: try using both effect sizes and test statistics in a number of different situations and studies, and decide which is most valuable for a particular application.

Todd

[Figure: gastroc; group means, SPM statistics, and effect size continua as described above]
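
For readers who want to verify the "more or less the same thing" point numerically, here is a minimal sketch (not from the original thread) using hypothetical random 1D data. With equal group sizes n, a two-sample t statistic is just Cohen's d scaled by sqrt(n/2); note that the pooled SD used here differs slightly from the mean-of-SDs formula quoted in the question.

import numpy as np

rng = np.random.default_rng(0)
n, nNodes = 10, 101                        # hypothetical: 10 subjects per group, 101 time nodes
yA = rng.normal(0.0, 1.0, (n, nNodes))     # group A responses (subjects x nodes)
yB = rng.normal(0.5, 1.0, (n, nNodes))     # group B responses, mean shifted by 0.5

mA, mB = yA.mean(axis=0), yB.mean(axis=0)
sp = np.sqrt(0.5 * (yA.var(axis=0, ddof=1) + yB.var(axis=0, ddof=1)))  # pooled SD (equal n)
d  = (mA - mB) / sp                        # Cohen's d continuum
t  = d * np.sqrt(n / 2)                    # the corresponding two-sample t continuum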

PatricioUQ commented 3 years ago

Hi Todd,

I would like to retake the conversation in issues #48 and #56 regarding how to report effect sizes in SPM results. I have a non-trivial design (2D statistics): a two-way ANOVA with repeated measures on both factors, with paired t-tests as post hocs. One of the reviewers requested that the partial eta-squared statistic be included in the results, so I assume they want to see effect sizes. After reading the conversations in issues #48 and #54, I understand that t statistics can be used as a way to show effect size. Can this be used with my F and t 2D statistical maps? I'm naive in statistical philosophy, so this use of t/F values is new to me, and I would not feel comfortable defending it to anyone.

Thank you for your help,

Patricio
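
(For reference, a design like the one described above would typically be run in spm1d along the following lines. This is a minimal sketch with hypothetical random data, not the actual analysis from the study; adapt the factor structure to your own design.)

import numpy as np
import spm1d

# hypothetical balanced design: 8 subjects x 2 levels of A x 2 levels of B, 101 nodes
rng = np.random.default_rng(0)
nSubj, nNodes = 8, 101
Y, A, B, SUBJ = [], [], [], []
for s in range(nSubj):
    for a in range(2):
        for b in range(2):
            Y.append(rng.normal(a * 0.5, 1.0, nNodes))   # effect of factor A only
            A.append(a);  B.append(b);  SUBJ.append(s)
Y, A, B, SUBJ = np.array(Y), np.array(A), np.array(B), np.array(SUBJ)

# two-way repeated-measures ANOVA (main effects and interaction):
FF  = spm1d.stats.anova2rm(Y, A, B, SUBJ)
FFi = FF.inference(0.05)
print(FFi)

# post hoc paired t-test between the two levels of A (collapsing over B):
YA0 = 0.5 * (Y[(A == 0) & (B == 0)] + Y[(A == 0) & (B == 1)])
YA1 = 0.5 * (Y[(A == 1) & (B == 0)] + Y[(A == 1) & (B == 1)])
ti  = spm1d.stats.ttest_paired(YA1, YA0).inference(0.05, two_tailed=True)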

0todd0000 commented 3 years ago

Hello Patricio,

Yes, you can use test statistic values, including F values, as approximations of effect sizes.

Note that eta-squared is (SS_effect / SS_total) and that F is (MS_effect / MS_error). These are practically the same thing; the only difference is that the MS terms are the SS terms scaled by their degrees of freedom (and hence by sample size), as is required for probabilistic inference.
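
To make the connection explicit, here is a minimal sketch (my own illustration, not part of spm1d's API) of the standard identity linking F to partial eta-squared; it follows directly from partial eta-squared = SS_effect / (SS_effect + SS_error) and F = (SS_effect / df_effect) / (SS_error / df_error).

def f_to_partial_eta_squared(f, df_effect, df_error):
    # rearranging the two identities above gives:
    #   partial eta-squared = (F * df_effect) / (F * df_effect + df_error)
    # f may be a scalar or a NumPy array (e.g., an F continuum)
    return (f * df_effect) / (f * df_effect + df_error)

# example: f_to_partial_eta_squared(5.0, 2, 27) is about 0.27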

The key theoretical problem with effect sizes is the flip side of this: because they are not scaled by sample size, they cannot be mapped directly onto probabilities, so they are of limited use for formal inference.

So I'd recommend against explicitly reporting effect sizes unless they are an explicit part of the a priori analysis.

To aid your response to the reviewers, please see the figure below, along with the script I used to generate it. The script is a hack of ./spm1d/examples/stats1d/anova1.py. Note that the F statistic and eta-squared continua are qualitatively identical.

Todd

[Figure: SPM{F} inference results (left) and the eta-squared continuum (right), generated by the script below]

import warnings
import numpy as np
from matplotlib import pyplot as plt
import spm1d

def aov(model, contrasts, f_terms, nFactors=1):
    '''
    This is a hacked change to spm1d.stats.anova.ui.aov that also returns SS and MS terms
    '''
    _spm     = spm1d.stats._spm
    _spmlist = spm1d.stats._spmlist

    effects = np.asarray( np.dot(model.QT, model.Y) )
    if model.dim==0:
        effects = effects.flatten()
    SS      = np.dot(contrasts.C, effects**2)
    DF      = np.asarray(contrasts.C.sum(axis=1), dtype=int)
    F       = []
    SSout   = []
    MSout   = []
    for term0,term1 in f_terms:
        i       = contrasts.term_labels.index(term0)
        ss0,df0 = SS[i], DF[i]

        ms0     = ss0 / df0
        if term1 == 'Error':
            ss1,df1,ms1  = model._SSE, model._dfE, model._MSE
        else:
            i       = contrasts.term_labels.index(term1)
            ss1,df1 = SS[i], DF[i]
            ms1     = ss1 / df1
        f           = ms0 / ms1
        if model.dim == 0:
            F.append( _spm.SPM0D_F(f, (df0,df1), (ss0,ss1), (ms0,ms1), model.eij, model.QT) )
        else:
            if model.roi is not None:
                f   = np.ma.masked_array(f, np.logical_not(model.roi))
            F.append( _spm.SPM_F(f, (df0,df1), model.fwhm, model.resels, model.X, model._beta, model.eij, model.QT, roi=model.roi) )

        SSout.append(ss0)
        MSout.append(ms0)

    SSout += [model._SSE]
    MSout += [model._MSE]
    return _spmlist.SPMFList( F, nFactors=nFactors ), np.asarray(SSout), np.asarray(MSout)

def anova1(Y, A=None, equal_var=False, roi=None):
    '''
    This is a hacked change to spm1d.stats.anova.ui.anova1 that also returns SS and MS terms
    '''
    _datachecks = spm1d.stats._datachecks
    _reml       = spm1d.stats._reml
    designs     = spm1d.stats.anova.designs
    models      = spm1d.stats.anova.models

    if isinstance(Y, (list,tuple)):
        _datachecks.check('anova1list', Y)
        A   = np.hstack([[i]*y.shape[0] for i,y in enumerate(Y)])
        Y   = np.hstack(Y) if Y[0].ndim==1 else np.vstack(Y)
    else:
        _datachecks.check('anova1', Y, A)
    design  = designs.ANOVA1(A)
    model   = models.LinearModel(Y, design.X, roi=roi)
    model.fit()
    F,SS,MS   = aov(model, design.contrasts, design.f_terms, nFactors=1)
    F         = F[0]
    if not equal_var:
        warnings.warn('\nWARNING:  Non-sphericity corrections for one-way ANOVA are currently approximate and have not been verified.\n', UserWarning, stacklevel=2)
        Y,X,r = model.Y, model.X, model.eij
        Q,C   = design.A.get_Q(), design.contrasts.C.T
        F.df  = _reml.estimate_df_anova1(Y, X, r, Q, C)
    return F,SS,MS

#(0) Load dataset:
dataset      = spm1d.data.uv1d.anova1.SpeedGRFcategorical()
# dataset      = spm1d.data.uv1d.anova1.Weather()
Y,A          = dataset.get_data()

#(1) Run ANOVA:
alpha        = 0.0005
F,SS,MS      = anova1(Y, A, equal_var=False)
Fi           = F.inference(alpha, interp=True, circular=False)
print( Fi )

#(2) Plot results:
plt.close('all')
fig,ax = plt.subplots(1, 2, figsize=(10,4))
Fi.plot(ax=ax[0])

etasq = SS[0] / (SS[0] + SS[1])   # eta-squared = SS_effect / SS_total; here SS_total = SS_effect + SS_error
ax[1].plot( etasq )
ax[1].set_ylabel(r'$\eta^2$', size=16)
plt.show()
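
As an aside (not part of the original script): for this one-way design the eta-squared continuum can also be recovered directly from the F continuum via the identity noted earlier, without the hacked functions above. The attributes Fi.z (the statistic continuum) and Fi.df (the effect and error degrees of freedom) are assumed from the spm1d inference API, and the non-sphericity-corrected df make this an approximation here.

df_effect, df_error = Fi.df
etasq_from_F = (Fi.z * df_effect) / (Fi.z * df_effect + df_error)
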
PatricioUQ commented 3 years ago

Thanks Todd for your thoughtful answer.

I just want to quote one of your previous answers:

“As an aside, I fully appreciate the problem of how to describe the results. My main advice, based on the Neuroimaging literature, is that SPM results transcend the convention of text descriptions to a certain extent because the graphs depict results that text descriptions can never fully describe. In the Neuroimaging literature effect sizes and most other statistics are reported at the continuum level, in that case as a 3D continuum within the brain, and these are described in the text qualitatively, often augmented with cluster-specific p values.”

What do you mean by the phrase in bold? Can you give me a hypothetical example?

Cheers

Patricio

0todd0000 commented 3 years ago

This means that the SPM results themselves are the quantitative results, and that researchers usually describe those results qualitatively in a manuscript's text.

As an example, consider the SPM{F} results above. One could attempt to describe the SPM{F} quantitatively (e.g., the maximum F value, the proportion of time that the F value exceeds the critical threshold, etc.). However, the SPM{F} results speak for themselves. It is more valuable, in my opinion, to use the text to describe these results qualitatively (e.g., "the F statistic exceeded the threshold over most of stance phase"), and to potentially augment those qualitative descriptions with p values (i.e., cluster-level p values).
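
For concreteness, here is a minimal sketch (not from the original comment) of how such cluster-level p values might be pulled from an spm1d inference object like the Fi computed above; the attribute names clusters, endpoints, and P are assumed from spm1d's cluster objects, so please verify against your installed version.

for i, cluster in enumerate(Fi.clusters):
    q0, q1 = cluster.endpoints    # cluster extent along the continuum (e.g., % stance)
    print(f'Cluster {i}: {q0:.1f}-{q1:.1f}, p = {cluster.P:.5f}')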

PatricioUQ commented 3 years ago

Thank you, Todd, that's really clarifying.

I would like to leave this reference here for anyone with a similar question; it may be helpful on the topic of effect sizes and 2D data:

https://jamanetwork.com/journals/jamapsychiatry/fullarticle/2597706

Cheers,

Patricio