SPM's poor localizing power

The difference may be meaningful, but that meaning is irrelevant to both the null hypothesis and the meaning of the statistical results.

To understand why, it is important to understand (1) the null hypothesis, and (2) the fact that hypothesis tests test ONLY the null hypothesis.

In the case of a two-sample t-test, the null hypothesis is that the two group population means are equal. This hypothesis is illustrated in the figure below. For each of the 20 illustrated cases:

The two groups are indicated by color
Thin lines depict each of 8 observations per group
Thick lines depict group means
The true group population means are equivalent; both are the null trajectory
There is smooth, Gaussian noise around the means, approximately as smooth as is seen in biomechanics datasets.
Due to random sampling, the sample means are not equivalent to the population means, so the differences between the sample means are non-zero

[Figure: 20 simulated two-sample cases under the null hypothesis]

These 20 cases depict how random sampling can produce various, and in some cases relatively large, effects even when the null hypothesis is true. For example, the effect in Case 15 at around time=50% is quite large, but a relatively large effect in the opposite direction appears in Case 13 at time=0%.
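For readers who want to reproduce this kind of behavior, here is a minimal sketch of how such null datasets could be simulated. It is not the code used to generate the figure: the `smooth_noise` helper, the kernel width `sigma=8`, the 101 time points, and the random seed are all illustrative assumptions.

```python
import numpy as np

def smooth_noise(rng, shape, n_points=101, sigma=8.0):
    """Unit-variance smooth Gaussian noise: white noise filtered by a Gaussian kernel.

    `sigma` (kernel width in time points) is an illustrative assumption.
    """
    half = int(4 * sigma)
    x = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= np.sqrt(np.sum(kernel ** 2))  # smoothed values keep variance 1
    # Column i of `smoother` holds the kernel centered on output point i.
    smoother = np.zeros((n_points + 2 * half, n_points))
    for i in range(n_points):
        smoother[i:i + kernel.size, i] = kernel
    white = rng.standard_normal((*shape, n_points + 2 * half))
    return white @ smoother

rng = np.random.default_rng(0)
n_cases, n_per_group = 20, 8

# Both groups share the same population mean (the zero trajectory), so H0 is true.
group_a = smooth_noise(rng, (n_cases, n_per_group))
group_b = smooth_noise(rng, (n_cases, n_per_group))

# Sample mean difference per case: non-zero purely because of random sampling.
mean_diff = group_a.mean(axis=1) - group_b.mean(axis=1)  # shape (20, 101)
peak_time = np.abs(mean_diff).argmax(axis=1)

for case in range(n_cases):
    t = peak_time[case]
    print(f"Case {case + 1:2d}: largest mean difference "
          f"{mean_diff[case, t]:+.2f} at time={t}%")
```

The location of the largest difference changes from case to case, which is the point: under the null hypothesis no time point is special.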

Should you focus on time=50% for Case 15? No, because there is in fact no effect, by definition. I've controlled these datasets to have precisely equivalent population means, so there is no population effect. The apparent effects that emerge are simply artifacts of random sampling.

However, this null hypothesis perspective presumes that there is indeed no effect. Your question rightly points out that real population effects do exist. If one instead presumes that a true population effect exists, then large observed effects may indeed be reflective of true population effects.

So which perspective is correct? The null hypothesis perspective, where no population differences actually exist, or the alternative perspective, where a true population difference actually exists? Both are correct, because both can be true, but only the null hypothesis perspective is consistent with hypothesis testing. These two perspectives are generally referred to as predictive and exploratory perspectives, respectively. The exploratory perspective by far dominates the biomechanics literature. It presumes that there is an effect, and often uses hypothesis testing to try to discover effects. While this is indeed a valid use of hypothesis testing, it is important to realize that rejecting a null hypothesis does not directly imply that a real effect exists. As illustrated in the figure above, random sampling itself can produce large effects.

Hypothesis testing's primary goal is to protect us from concluding that an effect exists when in fact there is no population effect (i.e., Type I error). SPM achieves this by setting an effect threshold (e.g. a critical t-value), so that we make Type I errors with a frequency of just $\alpha$ when the null hypothesis is true. By setting $\alpha=0.05$ we will, by definition, make Type I errors with a frequency of 5%, or in 1 of 20 experiments on average. Of the 20 cases depicted in the figure above, we may indeed conclude "significance" in Case 15, but this would be a Type I error because there is no true population effect.
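The role of the threshold can be illustrated with a small simulation. The sketch below is an assumption-laden stand-in, not the procedure SPM itself uses: it estimates a whole-trajectory critical value as the 95th percentile of the maximum |t| across simulated null experiments, then checks how often fresh null experiments exceed it. The helpers `smooth_noise`, `t_curves`, and `null_max_abs_t`, the smoothness `sigma=8`, and the number of simulated experiments are all hypothetical choices.

```python
import numpy as np

def smooth_noise(rng, shape, n_points=101, sigma=8.0):
    """Unit-variance smooth Gaussian noise (sigma is an illustrative assumption)."""
    half = int(4 * sigma)
    x = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= np.sqrt(np.sum(kernel ** 2))
    smoother = np.zeros((n_points + 2 * half, n_points))
    for i in range(n_points):
        smoother[i:i + kernel.size, i] = kernel
    return rng.standard_normal((*shape, n_points + 2 * half)) @ smoother

def t_curves(a, b):
    """Pointwise pooled-variance two-sample t statistic; observations on axis -2."""
    na, nb = a.shape[-2], b.shape[-2]
    pooled = ((na - 1) * a.var(axis=-2, ddof=1)
              + (nb - 1) * b.var(axis=-2, ddof=1)) / (na + nb - 2)
    return (a.mean(axis=-2) - b.mean(axis=-2)) / np.sqrt(pooled * (1 / na + 1 / nb))

def null_max_abs_t(rng, n_experiments, n_per_group=8):
    """Maximum |t| over the whole trajectory for each simulated null experiment."""
    a = smooth_noise(rng, (n_experiments, n_per_group))
    b = smooth_noise(rng, (n_experiments, n_per_group))
    return np.abs(t_curves(a, b)).max(axis=-1)

alpha = 0.05
rng = np.random.default_rng(0)

# Whole-trajectory threshold, estimated by simulation (an assumption of this sketch).
t_crit_trajectory = np.quantile(null_max_abs_t(rng, 2000), 1 - alpha)

# Ordinary two-tailed critical t for alpha=0.05 with 8 + 8 - 2 = 14 degrees of freedom.
T_CRIT_POINTWISE = 2.145

# Fresh null experiments: how often does each threshold yield a false positive?
max_t = null_max_abs_t(rng, 2000)
fpr_pointwise = np.mean(max_t > T_CRIT_POINTWISE)
fpr_trajectory = np.mean(max_t > t_crit_trajectory)

print(f"whole-trajectory critical t: {t_crit_trajectory:.2f}")
print(f"false positive rate, pointwise threshold:        {fpr_pointwise:.3f}")
print(f"false positive rate, whole-trajectory threshold: {fpr_trajectory:.3f}")
```

Under these assumptions the whole-trajectory threshold should keep the false positive rate near $\alpha$, whereas applying the ordinary pointwise critical value to every time point produces false positives far more often.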

SPM does not care where the artifact effect is; it protects against artifact effects (Type I errors) only at the whole-trajectory level. This is why SPM has poor localizing power: it does not protect against Type I error at any specific point, only across the whole time domain.

Consequently, when the null hypothesis is "no effect", SPM results do not pertain directly to real effects. Instead they pertain directly ONLY to the case where the null hypothesis of "no effect" is true. In order to achieve greater localizing power (e.g. regarding effects at time=50%) one must limit the scope of the null hypothesis to a more local one (e.g. time=40-60%). In all hypothesis testing procedures (not just SPM) there is a universal tradeoff between hypothesis scope (e.g. temporal extent) and power (e.g. temporal localizability). The only way to improve localizability is to reduce the scope.
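The scope tradeoff can be sketched numerically. The example below is a simulation-based illustration under assumed settings (smoothness `sigma=8`, 8 observations per group, 2000 simulated null experiments, hypothetical helper names), not SPM's own calculation. It compares the critical threshold needed to control Type I error over the whole trajectory with the one needed over time=40-60% only.

```python
import numpy as np

def smooth_noise(rng, shape, n_points=101, sigma=8.0):
    """Unit-variance smooth Gaussian noise (sigma is an illustrative assumption)."""
    half = int(4 * sigma)
    x = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= np.sqrt(np.sum(kernel ** 2))
    smoother = np.zeros((n_points + 2 * half, n_points))
    for i in range(n_points):
        smoother[i:i + kernel.size, i] = kernel
    return rng.standard_normal((*shape, n_points + 2 * half)) @ smoother

def t_curves(a, b):
    """Pointwise pooled-variance two-sample t statistic; observations on axis -2."""
    na, nb = a.shape[-2], b.shape[-2]
    pooled = ((na - 1) * a.var(axis=-2, ddof=1)
              + (nb - 1) * b.var(axis=-2, ddof=1)) / (na + nb - 2)
    return (a.mean(axis=-2) - b.mean(axis=-2)) / np.sqrt(pooled * (1 / na + 1 / nb))

alpha = 0.05
rng = np.random.default_rng(0)
n_experiments, n_per_group = 2000, 8

# Simulated null experiments: both groups drawn from the same population.
a = smooth_noise(rng, (n_experiments, n_per_group))
b = smooth_noise(rng, (n_experiments, n_per_group))
abs_t = np.abs(t_curves(a, b))  # shape (n_experiments, 101)

# Scope 1: the whole trajectory (time = 0-100%).
t_crit_full = np.quantile(abs_t.max(axis=1), 1 - alpha)

# Scope 2: a narrower null hypothesis covering time = 40-60% only.
t_crit_window = np.quantile(abs_t[:, 40:61].max(axis=1), 1 - alpha)

print(f"critical |t|, whole trajectory: {t_crit_full:.2f}")
print(f"critical |t|, time=40-60%:      {t_crit_window:.2f}")
```

Because the maximum over a window can never exceed the maximum over the whole trajectory, the narrower scope needs a lower threshold, so an effect of a given size inside the window is easier to detect. The window must be chosen before looking at the data, otherwise the Type I error control no longer holds.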

To answer your two questions:

Q. Isn't that difference specifically meaningful?

A. It is not meaningful from the perspective of the null hypothesis.

Q. Shouldn't I focus more attention on that region to understand why there’s a difference there?

A. Possibly, but this presumes that you have discovered a true population effect. It may indeed be a true population effect, but it may also be an artifact of random sampling. Since one can never know whether it is a true effect or not, interpretive caution is advised. The only way to become more certain that it is a true effect is to conduct additional experiments using new random samples, preferably narrowing the scope of the null hypothesis to those regions where one hypothesizes that a true effect exists.

0todd0000 / spm1d

SPM's poor localizing power #225