0todd0000 / spm1d

One-Dimensional Statistical Parametric Mapping in Python
GNU General Public License v3.0

Anova1rm - number of observations #231

Closed JonSJ1849 closed 1 year ago

JonSJ1849 commented 2 years ago

Dear Todd (cc. everyone interested)

I have a few questions pertaining to the number of observations included in a one-way repeated measures ANOVA using SPM.

With only one observation per subject per condition, the one-way repeated measures ANOVA issues a warning that the residuals are null and that data smoothness will therefore be estimated from pseudo-residuals. If you supply several observations, the SPM one-way ANOVA does not issue this warning.

Firstly, what is the effective consequence of using these pseudo-residuals? How are these pseudo-residuals calculated?

Secondly, how does the SPM analysis handle the additional observations per subject per condition? As I understand, it is different from averaging your trials together, but if you have several observations for each subject, does that somehow smooth your data, so that your FWHM will be higher than each individual observation?

Thirdly, if you provide additional observations to each subject on each condition, this will boost your statistical power, but these additional observations are not independent from each other, so outside the context of SPM, it has been suggested that this provides a false elevation of your statistical power, but I am wondering if the SPM analysis somehow accounts for this issue?

0todd0000 commented 1 year ago

Very relevant and not-so-straightforward issues! I'll respond below...



How are these pseudo-residuals calculated?

When there is only one observation per ANOVA cell, there are effectively no residuals. In that case spm1d calculates pseudo-residuals within conditions (e.g. across subjects) in a design-dependent manner.
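To make "within conditions, across subjects" concrete, here is a minimal sketch that assumes the simplest possible reading: each subject's curve minus the across-subject mean curve of its condition. The function name `condition_residuals` and the toy data are hypothetical, and this is not necessarily what spm1d does internally, since the actual calculation is design-dependent.

```python
def condition_residuals(Y, A):
    """Illustrative pseudo-residuals (an assumption, not spm1d's code).

    Y : list of curves, one per observation (each a list of floats)
    A : condition label for each curve
    Returns the residual curves grouped by condition label,
    not in the original row order.
    """
    residuals = []
    for label in sorted(set(A)):
        rows = [y for y, a in zip(Y, A) if a == label]
        # Mean curve of this condition, computed across subjects node by node
        mean = [sum(col) / len(rows) for col in zip(*rows)]
        # Each subject's deviation from the condition mean
        residuals.extend([[v - m for v, m in zip(row, mean)] for row in rows])
    return residuals


# Toy data: 2 subjects x 2 conditions, 3 nodes per curve
Y = [[1.0, 2.0, 3.0], [2.0, 3.0, 5.0], [4.0, 4.0, 4.0], [6.0, 8.0, 6.0]]
A = [0, 0, 1, 1]
R = condition_residuals(Y, A)
```

By construction these residuals sum to zero at every node within each condition, so they carry only the between-subject scatter around the condition mean.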



what is the effective consequence of using these pseudo-residuals?

It is a bit easier to answer a slightly different question: "Does using multiple observations have consequences?" The answer to this question is "no", unless (a) the data are grossly non-normal, and/or (b) the smoothness / frequency content is highly variable across observations. Regarding (a): if the data are grossly non-normal, then parametric methods should probably not be used, which renders problem (b) moot because nonparametric procedures can handle cases of inconsistent smoothness. Regarding (b) itself: this could introduce some numerical instability, but two factors mitigate the problem: (i) most real datasets have rather consistent smoothness characteristics, and (ii) any numerical variability introduced is unlikely to greatly affect critical thresholds, so the expected consequences are negligible.

To return to your question: using pseudo-residuals affects only the smoothness estimates, which pertain only to the potential (b) problem above. As (b) is a considerably less likely problem than (a), I expect that there are negligible consequences of using pseudo-residuals in routine analyses.
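For readers wondering how residuals feed into a smoothness estimate at all, here is a rough sketch of a common gradient-based FWHM estimator. It is an illustration under stated assumptions: the helper names are hypothetical, the implementation uses only the standard library, and spm1d's own routine may differ in detail.

```python
import math


def gradient(row):
    """Finite differences along the continuum (one-sided at the two ends)."""
    n = len(row)
    g = [0.0] * n
    g[0] = row[1] - row[0]
    g[-1] = row[-1] - row[-2]
    for q in range(1, n - 1):
        g[q] = (row[q + 1] - row[q - 1]) / 2.0
    return g


def estimate_fwhm(residuals):
    """Gradient-based FWHM estimate from a list of residual curves."""
    grads = [gradient(r) for r in residuals]
    resels_per_node = []
    for q in range(len(residuals[0])):
        ssq = sum(r[q] ** 2 for r in residuals)
        if ssq == 0:
            continue  # skip nodes where all residuals vanish
        # Gradient variance normalized by residual variance at this node
        v = sum(g[q] ** 2 for g in grads) / ssq
        resels_per_node.append(math.sqrt(v / (4 * math.log(2))))
    return 1.0 / (sum(resels_per_node) / len(resels_per_node))


def toy_residuals(freq, n_nodes=101):
    """Two synthetic residual curves (sine and cosine) at a given frequency."""
    x = [2 * math.pi * freq * q / (n_nodes - 1) for q in range(n_nodes)]
    return [[math.sin(v) for v in x], [math.cos(v) for v in x]]


fwhm_smooth = estimate_fwhm(toy_residuals(freq=1))
fwhm_rough = estimate_fwhm(toy_residuals(freq=4))
```

Smoother residuals give a larger FWHM. The estimate depends only on the ratio of gradient variance to residual variance, which is why swapping true residuals for pseudo-residuals would matter only if the two differed markedly in frequency content.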



Secondly, how does the SPM analysis handle the additional observations per subject per condition?

SPM uses linear models, so additional observations are handled exactly as they are handled in standard RM-ANOVA: they are effectively irrelevant to ANOVA ratios. However, note that spm1d currently implements only simple ANOVA models, where there is no explicit modeling of the variability associated with the extra variables. Provided the residuals are normally distributed, this approach is effectively equivalent to a full model that contains all variance terms.



As I understand, it is different from averaging your trials together, but if you have several observations for each subject, does that somehow smooth your data, so that your FWHM will be higher than each individual observation?

It is the same as averaging trials together. Data are not smoothed by adding more observations; instead, the population smoothness estimate may get slightly better. This is analogous to improved population mean and variance estimates as sample size gets larger.
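As a sketch of what "the same as averaging" means, the snippet below collapses the trials in each subject-by-condition cell to their mean and then computes a textbook one-way RM-ANOVA F ratio at a single continuum node. The function name `rm_anova_f` and the numbers are made up for illustration, and this is the standard textbook calculation rather than spm1d's code.

```python
def rm_anova_f(cells):
    """Textbook one-way RM-ANOVA F ratio at one node.

    cells[s][c] is the list of trial values for subject s in condition c.
    Trials are first collapsed to cell means.
    """
    means = [[sum(t) / len(t) for t in subj] for subj in cells]
    n, k = len(means), len(means[0])  # subjects, conditions
    grand = sum(sum(row) for row in means) / (n * k)
    cond = [sum(means[s][c] for s in range(n)) / n for c in range(k)]
    subj = [sum(row) / k for row in means]
    # Between-condition sum of squares
    ss_cond = n * sum((m - grand) ** 2 for m in cond)
    # Error term: subject-by-condition interaction
    ss_err = sum(
        (means[s][c] - cond[c] - subj[s] + grand) ** 2
        for s in range(n)
        for c in range(k)
    )
    df_cond, df_err = k - 1, (k - 1) * (n - 1)
    return (ss_cond / df_cond) / (ss_err / df_err)


# Three subjects, two conditions, two trials per cell
trials = [
    [[1.0, 3.0], [4.0, 6.0]],
    [[2.0, 4.0], [5.0, 5.0]],
    [[3.0, 5.0], [7.0, 9.0]],
]
# The same cells with each pair of trials already averaged
averaged = [[[2.0], [5.0]], [[3.0], [5.0]], [[4.0], [8.0]]]

f_trials = rm_anova_f(trials)
f_averaged = rm_anova_f(averaged)
```

Both calls return the same F value here, because only the cell means enter the between-condition and error sums of squares.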



Thirdly, if you provide additional observations to each subject on each condition, this will boost your statistical power, but these additional observations are not independent from each other, so outside the context of SPM, it has been suggested that this provides a false elevation of your statistical power, but I am wondering if the SPM analysis somehow accounts for this issue?

This is incorrect. Adding more observations does not increase power because the modeled RM-ANOVA effects pertain to between-condition and between-group variance and not to between-observation variance. Likewise, the degrees of freedom of the ANOVA model are unaffected by the inclusion of additional observations. The only way to increase power is to add more subjects.
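The degrees-of-freedom point can be written out explicitly. The sketch below uses the textbook one-way RM-ANOVA degrees of freedom with sphericity assumed; the helper name `rm_anova1_df` is hypothetical, and any corrections spm1d applies are not shown.

```python
def rm_anova1_df(n_subjects, n_conditions):
    """Textbook one-way RM-ANOVA degrees of freedom (sphericity assumed).

    There is deliberately no trials-per-cell argument: under the model
    described above, extra observations do not enter the calculation.
    """
    df_effect = n_conditions - 1
    df_error = (n_conditions - 1) * (n_subjects - 1)
    return df_effect, df_error


df_10_subjects = rm_anova1_df(n_subjects=10, n_conditions=3)  # (2, 18)
df_20_subjects = rm_anova1_df(n_subjects=20, n_conditions=3)  # (2, 38)
```

Doubling the number of subjects raises the error degrees of freedom from 18 to 38, whereas no number of extra trials per cell changes either value.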