0todd0000 / spm1d

One-Dimensional Statistical Parametric Mapping in Python
GNU General Public License v3.0

SPM's poor localizing power #225

Closed 0todd0000 closed 1 year ago

0todd0000 commented 1 year ago

(This is paraphrased from an email conversation.)

The following is an excerpt from Honert & Pataky (2021) (Discussion, Paragraph 3):

The analysis performed here illustrates how time-continuous analyses have poor localizing power. For SPM, this means that the significance threshold applies to the entire trajectory (see Fig. 5), and it is irrelevant where the threshold is crossed. For example, if an analysis shows that the threshold is crossed at 50% of the gait cycle (such as in Fig. 5, left), it does not imply that there is a true effect at 50%. Instead, it implies only that completely random trajectories can generate a similar result 5% (i.e. the α of the SPM test) of the time.

If a big difference is found between two groups at a certain point in the gait cycle, isn't that difference specifically meaningful, and shouldn't I focus more attention on that region to understand why there’s a difference there? The word choice of "irrelevant" in the excerpt above makes me think that my understanding is incorrect, but I’m having a hard time conceptualizing why.

0todd0000 commented 1 year ago

The difference may be meaningful, but that meaning is irrelevant to both the null hypothesis and the interpretation of the statistical results.

To understand why, it is important to understand (1) the null hypothesis, and (2) the fact that hypothesis tests test ONLY the null hypothesis.

In the case of a two-sample t-test, the null hypothesis is that the group population means are equal. This hypothesis is illustrated in the figure below: in each of the 20 illustrated cases, the two groups have exactly equal population means.

[Figure: 20 simulated cases in which the null hypothesis is true]

These 20 cases depict how random sampling can produce various, and in some cases relatively large, effects when the null hypothesis is true. For example, the effect in Case 15 at around time=50% is quite large, but a relatively large effect in the opposite direction appears in Case 13 at time=0%.

Should you focus on time=50% for Case 15? No, because there is in fact no effect, by definition. I've controlled these datasets to have precisely equivalent population means, so there is no population effect. The apparent effects that emerge are simply artifacts of random sampling.
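For anyone who wants to reproduce this kind of figure, here is a minimal sketch in plain NumPy. It is not the code used for the figure above; the sample size (10 per group), the 101 time nodes, and the smoothness (`sigma`) are all assumptions chosen for illustration. Both groups are drawn from the same zero-mean population, so every apparent effect it reports is a sampling artifact.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cases, n_per_group, n_nodes, sigma = 20, 10, 101, 8.0  # assumed values

# Gaussian smoothing matrix: turns white noise into smooth, unit-variance
# trajectories (a stand-in for smooth biomechanical time series).
half = int(4 * sigma)
x = np.arange(-half, half + 1)
kernel = np.exp(-0.5 * (x / sigma) ** 2)
kernel /= np.sqrt((kernel ** 2).sum())
K = np.zeros((n_nodes + 2 * half, n_nodes))
for i in range(n_nodes):
    K[i:i + kernel.size, i] = kernel


def sample_group():
    """Return (n_cases, n_per_group, n_nodes) trajectories with population mean 0."""
    white = rng.standard_normal((n_cases, n_per_group, K.shape[0]))
    return white @ K


def tstat2(a, b):
    """Pointwise two-sample t statistic (equal group sizes, pooled variance)."""
    pooled_var = (a.var(axis=1, ddof=1) + b.var(axis=1, ddof=1)) / 2
    return (a.mean(axis=1) - b.mean(axis=1)) / np.sqrt(pooled_var * 2 / a.shape[1])


# Both groups come from the SAME population, so the null hypothesis is true.
t = tstat2(sample_group(), sample_group())          # shape (20, 101)
peak_node = np.abs(t).argmax(axis=1)                # where the largest effect sits
peak_t = t[np.arange(n_cases), peak_node]

for case, (node, value) in enumerate(zip(peak_node, peak_t), start=1):
    print(f"Case {case:2d}: largest |t| = {abs(value):.2f} at time = {node}%")
```

The locations of the largest effects jump around from case to case, which is the point: under the null hypothesis, where the peak lands carries no information.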

However, this null hypothesis perspective presumes that there is indeed no effect. Your question rightly points out that real population effects do exist. If one instead presumes that a true population effect exists, then large observed effects may indeed be reflective of true population effects.

So which perspective is correct? The null hypothesis perspective, where no population differences actually exist, or the alternative perspective, where a true population difference actually exists? Both are correct, because both can be true, but only the null hypothesis perspective is consistent with hypothesis testing. These two perspectives are generally referred to as predictive and exploratory perspectives, respectively. The exploratory perspective by far dominates the biomechanics literature. It presumes that there is an effect, and often uses hypothesis testing to try to discover effects. While this is indeed a valid use of hypothesis testing, it is important to realize that rejecting a null hypothesis does not directly imply that a real effect exists. As illustrated in the figure above, random sampling itself can produce large effects.

Hypothesis testing's primary goal is to protect us from concluding that an effect exists when in fact there is no population effect (i.e., Type I error). SPM achieves this by setting an effect threshold (e.g. a critical t-value), so that we make Type I errors with a frequency of just $\alpha$. By setting $\alpha=0.05$ we will, by definition, make Type I errors with a frequency of 5% when the null hypothesis is true, or in 1 of 20 experiments on average. Of the 20 cases depicted in the figure above, we may indeed conclude "significance" in Case 15, but this would be a Type I error because there is no true population effect.
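The 5% claim can be checked numerically. The sketch below is a rough illustration, not spm1d's implementation: it estimates the whole-trajectory threshold by Monte Carlo (the 95th percentile of the maximum |t| across null experiments) as a stand-in for the random field theory threshold, and it compares that against the ordinary pointwise critical value for 18 degrees of freedom. The sample size, node count, and smoothness are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_exp, n_per_group, n_nodes, sigma, alpha = 2000, 10, 101, 8.0, 0.05  # assumed

# Gaussian smoothing matrix for smooth, unit-variance random trajectories.
half = int(4 * sigma)
x = np.arange(-half, half + 1)
kernel = np.exp(-0.5 * (x / sigma) ** 2)
kernel /= np.sqrt((kernel ** 2).sum())
K = np.zeros((n_nodes + 2 * half, n_nodes))
for i in range(n_nodes):
    K[i:i + kernel.size, i] = kernel


def null_max_abs_t(n_experiments):
    """Max |t| over the whole trajectory for experiments where H0 is true."""
    shape = (n_experiments, n_per_group, K.shape[0])
    a = rng.standard_normal(shape) @ K
    b = rng.standard_normal(shape) @ K
    pooled_var = (a.var(axis=1, ddof=1) + b.var(axis=1, ddof=1)) / 2
    t = (a.mean(axis=1) - b.mean(axis=1)) / np.sqrt(pooled_var * 2 / n_per_group)
    return np.abs(t).max(axis=1)


# Pointwise two-tailed critical t for alpha = 0.05 and 18 degrees of freedom.
t_pointwise = 2.101

# Whole-trajectory threshold, calibrated on one set of null experiments...
t_trajectory = np.quantile(null_max_abs_t(n_exp), 1 - alpha)

# ...and evaluated on a fresh, independent set of null experiments.
max_t = null_max_abs_t(n_exp)
rate_pointwise = (max_t > t_pointwise).mean()
rate_trajectory = (max_t > t_trajectory).mean()

print(f"whole-trajectory threshold: t = {t_trajectory:.2f}")
print(f"false positive rate, pointwise threshold:        {rate_pointwise:.3f}")
print(f"false positive rate, whole-trajectory threshold: {rate_trajectory:.3f}")
```

With the pointwise threshold, some point on the trajectory crosses it far more often than 5% of the time. The whole-trajectory threshold brings the rate back to roughly $\alpha$, but only for the statement "the threshold was crossed somewhere".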

SPM does not care where the artifact effect is; it protects against artifact effects (Type I errors) only at the whole-trajectory level. This is why SPM has poor localizing power: it is not protecting against Type I error at any specific point, only across the whole time domain.

Consequently, when the null hypothesis is "no effect", SPM results do not pertain directly to real effects. Instead, they pertain directly ONLY to the case where the null hypothesis of "no effect" is true. In order to achieve greater localizing power (e.g. regarding effects at time=50%) one must limit the scope of the null hypothesis to a more local one (e.g. time=40-60%). In all hypothesis testing procedures (not just SPM) there is a universal tradeoff between hypothesis scope (e.g. temporal extent) and power (e.g. temporal localizability). The only way to improve localizability is to reduce the scope.
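The scope/power tradeoff can also be sketched numerically. The example below again uses Monte Carlo thresholds in plain NumPy rather than spm1d itself, and everything about the "true effect" (a Gaussian bump of 1.2 standard deviations centred at time=50%) is hypothetical. It compares a null hypothesis covering the whole trajectory with one restricted, in advance, to time=40-60%.

```python
import numpy as np

rng = np.random.default_rng(2)
n_exp, n_per_group, n_nodes, sigma, alpha = 2000, 10, 101, 8.0, 0.05  # assumed
window = slice(40, 61)  # narrower hypothesis scope: time = 40-60%

# Gaussian smoothing matrix for smooth, unit-variance random trajectories.
half = int(4 * sigma)
x = np.arange(-half, half + 1)
kernel = np.exp(-0.5 * (x / sigma) ** 2)
kernel /= np.sqrt((kernel ** 2).sum())
K = np.zeros((n_nodes + 2 * half, n_nodes))
for i in range(n_nodes):
    K[i:i + kernel.size, i] = kernel


def abs_t(effect):
    """|t| trajectories, shape (n_exp, n_nodes); `effect` is added to group A's mean."""
    shape = (n_exp, n_per_group, K.shape[0])
    a = rng.standard_normal(shape) @ K + effect
    b = rng.standard_normal(shape) @ K
    pooled_var = (a.var(axis=1, ddof=1) + b.var(axis=1, ddof=1)) / 2
    t = (a.mean(axis=1) - b.mean(axis=1)) / np.sqrt(pooled_var * 2 / n_per_group)
    return np.abs(t)


# Critical thresholds under the null hypothesis, for each scope.
null_t = abs_t(0.0)
crit_full = np.quantile(null_t.max(axis=1), 1 - alpha)
crit_window = np.quantile(null_t[:, window].max(axis=1), 1 - alpha)

# Hypothetical true population effect centred at time = 50%.
time_pct = np.arange(n_nodes)
bump = 1.2 * np.exp(-0.5 * ((time_pct - 50) / 5.0) ** 2)
effect_t = abs_t(bump)
power_full = (effect_t.max(axis=1) > crit_full).mean()
power_window = (effect_t[:, window].max(axis=1) > crit_window).mean()

print(f"critical |t|, scope 0-100%: {crit_full:.2f}   power: {power_full:.2f}")
print(f"critical |t|, scope 40-60%: {crit_window:.2f}   power: {power_window:.2f}")
```

Narrowing the scope lowers the critical threshold and raises the chance of detecting the effect, but only if the window was chosen before looking at the data. Picking the window after seeing where the peak is would invalidate the threshold.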



To answer your two questions:

Q. Isn't that difference specifically meaningful?

A. It is not meaningful from the perspective of the null hypothesis.


Q. Shouldn't I focus more attention on that region to understand why there’s a difference there?

A. Possibly, but this presumes that you have discovered a true population effect. It may indeed be a true population effect, but it may also be an artifact of random sampling. Since one can never know from a single experiment which of the two it is, interpretive caution is advised. The only way to become more certain that it is a true effect is to conduct additional experiments using new random samples, preferably narrowing the scope of the null hypothesis to those areas where one hypothesizes a true effect exists.