The effects of data alignment on SPM results

Hello Todd,

I just read your latest paper "Timing of gait events affects whole trajectory analyses: A statistical parametric mapping sensitivity analysis of lower limb biomechanics" and have a couple of follow-up questions. Very interesting read!

First, if I understand correctly, the take-home message of the paper is that because of differences in timing and/or magnitude it is not possible to use spm to determine which localized portions of the gait cycle are different between groups/conditions but rather it can be used to determine if there are differences between the "whole trajectories" between groups/conditions. Is that a correct interpretation?

Second, in the article there was some mention of curve registration or warping as a way to mitigate the effects of timing variability. I'm curious if a reasonable approach would be to follow up an initial analysis of non-warped data with an analysis of the same data that have been warped using landmark registration to help determine if any differences found are more likely to be due to timing vs. magnitude?

Third, I noticed that there is now the ability to analyze data as a circular field. Would using this approach (circular field analysis) have any influence on the results presented in this paper?

In short, one of the appealing aspects of an SPM approach to analyzing gait data was the ability to identify differences at localized areas of the gait cycle. In light of these recent results, I am trying to better understand what best practices might be moving forward.

Thank you very much for your time,

Eric

Hello Eric,

Thank you very much for these questions! Results interpretation is a very important but, in my view, under-discussed topic. I'll take some time to respond relatively thoroughly, perhaps tangentially in places.

Note: I've added "A" and "B" labels to your comments for subsequent reference...

First, if I understand correctly, the take-home message of the paper is that [A] because of differences in timing and/or magnitude it is not possible to use spm to determine which localized portions of the gait cycle are different between groups/conditions but rather [B] it can be used to determine if there are differences between the "whole trajectories" between groups/conditions. Is that a correct interpretation?

These interpretations are indeed correct, but there are some additional factors to consider that would introduce a touch of nuance. The easiest place to start is the null hypothesis, which pertains to the whole trajectory in SPM, and for a two-sample test is H0: mu_A(t) - mu_B(t) = 0, where t is time, and mu_A and mu_B are population means. Here are some things to consider for general interpretations of SPM results:

1. Population-level inference. The null hypothesis pertains directly to only the population means mu_A and mu_B. Therefore hypothesis testing results do not pertain directly to the observed sample means.
2. H0 is assumed to be true. For hypotheses of the form H0: mu_A(t) - mu_B(t) = c, where c=0, the hypothesis is that there is no difference between mu_A and mu_B, and the results pertain directly only to the case of no true effect. In other words, if H0 is true, what is the probability that a random process would produce the observed effect?
3. Trajectory-level tests. When the null hypothesis pertains directly to the whole trajectory (as in the H0 case above), and not to a specific portion of the trajectory, hypothesis testing results also pertain directly only to the whole trajectory, and not to a specific portion.

Points 1 & 2 are common to all classical hypothesis tests. Point 3 is unique to trajectory-level analyses.
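To make the trajectory-level null hypothesis concrete, here is a minimal sketch; it is not spm1d's implementation. It computes a two-sample t statistic at every time node and derives one critical threshold for the whole trajectory from the permutation distribution of the maximum |t|. spm1d's parametric procedures obtain the threshold from random field theory instead, so the permutation step is a swapped-in stand-in. The synthetic data, the helper names (make_curve, t_trajectory, max_t_threshold) and the simulated effect near time=25% are all assumptions for illustration.

```python
import math
import random


def mean_and_var(values):
    """Sample mean and unbiased sample variance of one node's values."""
    n = len(values)
    m = sum(values) / n
    return m, sum((v - m) ** 2 for v in values) / (n - 1)


def t_trajectory(group_a, group_b):
    """Pooled-variance two-sample t statistic at every time node."""
    n_a, n_b = len(group_a), len(group_b)
    t_values = []
    for node_a, node_b in zip(zip(*group_a), zip(*group_b)):
        mean_a, var_a = mean_and_var(node_a)
        mean_b, var_b = mean_and_var(node_b)
        pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
        t_values.append((mean_a - mean_b) / math.sqrt(pooled * (1 / n_a + 1 / n_b)))
    return t_values


def max_t_threshold(group_a, group_b, alpha=0.05, n_perm=200, seed=0):
    """Critical |t| taken from the permutation distribution of the trajectory maximum."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the curves
        t_perm = t_trajectory(pooled[:n_a], pooled[n_a:])
        maxima.append(max(abs(t) for t in t_perm))
    maxima.sort()
    return maxima[min(n_perm - 1, int((1 - alpha) * n_perm))]


def make_curve(rng, bump=0.0, n_nodes=101):
    """Hypothetical trajectory: sine wave + random offset + noise + optional bump near 25%."""
    offset = rng.gauss(0, 0.3)
    return [math.sin(2 * math.pi * q / (n_nodes - 1)) + offset + rng.gauss(0, 0.1)
            + bump * math.exp(-((q - 25) / 5.0) ** 2)
            for q in range(n_nodes)]


rng = random.Random(1)
group_a = [make_curve(rng) for _ in range(10)]
group_b = [make_curve(rng, bump=2.0) for _ in range(10)]

t_obs = t_trajectory(group_a, group_b)
t_crit = max_t_threshold(group_a, group_b)
reject_h0 = max(abs(t) for t in t_obs) > t_crit
print("critical |t|:", round(t_crit, 2), "reject H0:", reject_h0)
```

The decision rests on a single threshold applied to the maximum over the whole trajectory, which is why the test result speaks directly to the trajectory as a whole. Where the threshold crossing happens is descriptive information rather than a separately tested claim.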

Thus your interpretations are correct, but only insofar as they emerge from Points 1-3. Consider a variety of true effect cases:

[Case 1] A true effect exists at time=25%, and SPM finds a significant effect at time=25%, and only this effect. In this case your [B] interpretation is correct. The [A] interpretation is less accurate because SPM has, in this case, identified a true effect.
[Case 2] A true effect exists at time=25% but this time, due to random sampling peculiarities, SPM results find significant effects at both time=25% and at another time, e.g. time=75%. In this case, your [B] interpretation is still correct, and your [A] interpretation is more correct. In other words, the trajectory-level result is correct but one of the local results is not.
[Case 3] A true effect exists at time=25%, but SPM finds a significant result at only time=75%. In this case [B] is accurate, but only by accident, and [A] is accurate.
[Case 4] No true effect exists, and SPM finds no significant effects. In this case [B] is accurate, but [A] is less accurate.
[Case 5] No true effect exists, but SPM finds a significant effect. In this case [B] is not very accurate, and [A] is more accurate.

From this compilation of cases it can be seen that [B] is generally more often correct (4 of 5 cases).

All together this means that the interpretations need to be a bit nuanced, because they are limited both by what H0 actually means, and by the fact that we never know what the true effect actually is. In particular, if H0 does not explicitly and directly pertain to a specific part of the trajectory, then neither do hypothesis testing results.

The points above pertain to general SPM analyses, and not to the cited paper specifically. But from these points I think the cited paper's messages become clearer: most generally, data processing parameters can affect results. More specifically, cycle segmentation through landmark identification tends to affect conclusions regarding local regions more strongly than conclusions regarding the overall trajectory. One implication is that, as a rule of thumb, it is good practice to conduct sensitivity analyses by systematically varying the data processing parameters over a feasible range of values.
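As a rough sketch of what such a sensitivity analysis might look like for cycle segmentation, the example below varies a force threshold used to detect the start and end of stance, then linearly normalizes each segmented cycle to 101 nodes. The synthetic force signal, the threshold values and the helper names are hypothetical; in a real analysis the full SPM procedure would be rerun for each parameter value and the conclusions compared.

```python
import math


def first_index_above(signal, threshold):
    """Index of the first sample at or above the threshold (assumed event rule)."""
    for i, value in enumerate(signal):
        if value >= threshold:
            return i
    raise ValueError("signal never reaches the threshold")


def last_index_above(signal, threshold):
    """Index of the last sample at or above the threshold (assumed event rule)."""
    for i in range(len(signal) - 1, -1, -1):
        if signal[i] >= threshold:
            return i
    raise ValueError("signal never reaches the threshold")


def resample_linear(values, n_nodes=101):
    """Linearly interpolate values onto n_nodes equally spaced nodes (0-100%)."""
    last = len(values) - 1
    resampled = []
    for q in range(n_nodes):
        position = q * last / (n_nodes - 1)
        lo = int(position)
        hi = min(lo + 1, last)
        frac = position - lo
        resampled.append((1 - frac) * values[lo] + frac * values[hi])
    return resampled


# Synthetic vertical force (N): flight, half-sine stance phase, flight.
force = [1000 * math.sin(math.pi * (i - 50) / 200) if 50 <= i <= 250 else 0.0
         for i in range(300)]

thresholds = [10.0, 20.0, 50.0]  # hypothetical feasible range (N)
onsets, early_stance = [], []
for threshold in thresholds:
    start = first_index_above(force, threshold)
    end = last_index_above(force, threshold)
    cycle = resample_linear(force[start:end + 1])
    onsets.append(start)
    early_stance.append(cycle[10])  # force at 10% of the normalized cycle

for threshold, onset, value in zip(thresholds, onsets, early_stance):
    print(threshold, onset, round(value, 1))
```

Even in this idealized signal, the detected onset and the normalized value at 10% of the cycle shift with the threshold, which is the kind of local change a sensitivity analysis is meant to expose.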

So yes, I agree with both of your points [A] and [B], but I think they need to be a bit more nuanced in order to be general interpretations.

Second, in the article there was some mention of curve registration or warping as a way to mitigate the effects of timing variability. I'm curious if a reasonable approach would be to follow up an initial analysis of non-warped data with an analysis of the same data that have been warped using landmark registration to help determine if any differences found are more likely to be due to timing vs. magnitude?

Yes, I think this is a very good idea. In general this approach can be referred to as a "morphology + amplitude" approach. When the data are linearly registered, as is usually done in the literature, trajectory-level results embody both morphological (i.e., timing) and amplitude effects. Nonlinear registration, through landmark registration for example, tends to minimize morphological effects, and analyses of nonlinearly registered data tend to embody predominantly amplitude effects. If the a priori null hypothesis pertains only to amplitude effects, then nonlinear registration is probably a better choice. On the other hand, if the a priori null hypothesis does not pertain exclusively to amplitude effects, then it is probably a good idea to conduct both linear and nonlinear registration so that morphological and amplitude effects can be more clearly isolated.
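For readers unfamiliar with landmark registration, the following is a minimal sketch under simplifying assumptions: a single landmark (the peak of each curve), a piecewise-linear time warp, and synthetic bump-shaped curves. The function names and the target node are hypothetical, and this is not necessarily the registration procedure used for the data discussed here.

```python
import math


def interpolate(values, position):
    """Linear interpolation of values at a fractional node position."""
    lo = int(position)
    hi = min(lo + 1, len(values) - 1)
    frac = position - lo
    return (1 - frac) * values[lo] + frac * values[hi]


def register_to_landmark(values, landmark, target):
    """Piecewise-linear time warp that moves node `landmark` to node `target`.

    Both nodes must lie strictly inside the trajectory; the endpoints stay fixed.
    """
    last = len(values) - 1
    registered = []
    for s in range(len(values)):
        if s <= target:
            t = s * landmark / target
        else:
            t = landmark + (s - target) * (last - landmark) / (last - target)
        registered.append(interpolate(values, t))
    return registered


def peak_node(values):
    """Node index of the maximum value (the assumed landmark)."""
    return max(range(len(values)), key=values.__getitem__)


def bump(center, n_nodes=101):
    """Hypothetical curve: unit-amplitude Gaussian bump centred at `center`."""
    return [math.exp(-((q - center) / 10.0) ** 2) for q in range(n_nodes)]


curves = [bump(40), bump(60)]  # same amplitude, different timing
target = 50                    # common landmark location after registration
registered = [register_to_landmark(c, peak_node(c), target) for c in curves]

peaks_before = [peak_node(c) for c in curves]      # [40, 60]
peaks_after = [peak_node(c) for c in registered]   # [50, 50]
print(peaks_before, peaks_after)
```

After registration the two curves differ much less in timing, so any remaining difference between groups would mostly reflect amplitude. Running the same test on both the linearly and the nonlinearly registered data then gives a rough indication of which kind of effect dominates.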

The figures below contain data from Dorn et al. (2012) as reported in Pataky et al. (2015, Appendix G). Linearly and nonlinearly registered data are shown in Fig.1a and Fig.1b, respectively. Corresponding SPM results are shown in Fig.2. In this case nonlinear registration had little effect on the results, which suggests that the results are predominantly due to amplitude effects.

Dorn, TW, Schache, AG, Pandy, MG (2012). Muscular strategy shift in human running: dependence of running speed on hip and ankle muscle performance. Journal of Experimental Biology 215, 1944–1956. https://doi.org/10.1242/jeb.064527, Data: https://simtk.org/home/runningspeeds

Pataky, TC, Vanrenterghem, J, Robinson, MA (2015). Zero- vs. one-dimensional, parametric vs. non-parametric, and confidence interval vs. hypothesis testing procedures in one-dimensional biomechanical trajectory analysis. Journal of Biomechanics 48(7), 1277–1285. https://doi.org/10.1016/j.jbiomech.2015.02.051

fig_dorn_registered_curves

Figure 1: Anterior/posterior (AP) ground reaction forces during running and sprinting at different speeds. There are two observations for each speed. Data from Dorn et al. (2012).

fig_dorn_registered

Figure 2: Correlation between AP ground reaction forces and running speed.

Third, I noticed that there is now the ability to analyze data as a circular field. Would using this approach (circular field analysis) have any influence on the results presented in this paper?

No, I wouldn't expect the results to change. Circular field analysis simply wraps the final point (time=100%) to the starting point (time=0%). This affects only cluster definitions (i.e., a cluster that touches both time=100% and time=0% would be regarded as a single cluster), but not the critical threshold. The results in this paper pertain to only threshold crossings, so the results wouldn't be affected.
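A small sketch may help show why only cluster definitions change. It finds suprathreshold clusters in a test statistic trajectory and, when the field is treated as circular, merges a cluster touching time=100% with one touching time=0%. The threshold value, the trajectory z and the function name are hypothetical, and this is not spm1d's internal code.

```python
def suprathreshold_clusters(z, threshold, circular=False):
    """Return (start, end) node pairs of contiguous runs where z > threshold."""
    clusters = []
    start = None
    for i, value in enumerate(z):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            clusters.append((start, i - 1))
            start = None
    if start is not None:
        clusters.append((start, len(z) - 1))
    # Circular field: runs touching the last and the first node form one cluster.
    if (circular and len(clusters) > 1
            and clusters[0][0] == 0 and clusters[-1][1] == len(z) - 1):
        wrapped = (clusters[-1][0], clusters[0][1])  # start > end marks the wrap
        clusters = [wrapped] + clusters[1:-1]
    return clusters


# Hypothetical test statistic trajectory with three suprathreshold regions.
z = [0.0] * 101
for q in list(range(0, 6)) + list(range(95, 101)):
    z[q] = 3.0
for q in range(40, 46):
    z[q] = 3.5

threshold = 2.5  # assumed critical threshold, identical in both analyses
linear_clusters = suprathreshold_clusters(z, threshold)
circular_clusters = suprathreshold_clusters(z, threshold, circular=True)
print(linear_clusters)    # [(0, 5), (40, 45), (95, 100)]
print(circular_clusters)  # [(95, 5), (40, 45)]
```

The set of nodes above the threshold is the same in both cases; only the grouping of those nodes into clusters differs.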

In short, one of the appealing aspects of an SPM approach to analyzing gait data was the ability to identify differences at localized areas of the gait cycle. In light of these recent results, I am trying to better understand what best practices might be moving forward.

I think it is fine to retain the perspective that SPM can identify localized effects, provided it is understood that (a) SPM is not testing specifically for those effects, and (b) those results might be false positives.

More generally, I think a key problem lies in the inherent weaknesses of exploratory designs. An exploratory design does not identify any specific effect in an a priori manner, so the study's conclusions must naturally be cautiously non-specific. It is much more scientifically and statistically powerful to make a specific a priori hypothesis and to test only that hypothesis. This too, however, is insufficient: only through repeated experimentation can repeatedly emerging effects be regarded as likely true ones.

Moving forward I think the best practices would be to:

1. Conduct sensitivity analysis wherever possible, including linear/nonlinear registration, to ensure that results are robust to data processing assumptions.
2. Acknowledge the inherent weaknesses of exploratory designs; if no specific hypothesis is formulated prior to an experiment, then the experiment's results are naturally also non-specific.
3. Acknowledge the limitations of classical hypothesis testing: hypothesis testing can indeed be a powerful tool, but hypothesis testing results are meaningful only insofar as the null hypothesis is meaningful.

Perhaps the first point is the easiest short-term practical step that can be taken. If sensitivity analyses are conducted more routinely in the literature I think the next two points would emerge naturally over time.

Todd

Todd,

Thank you for this detailed response. It is extremely helpful, and I appreciate you taking the time to put it together! It serves as a good reminder of the importance of developing specific a priori hypotheses.

With that said, I would be interested in your thoughts on if/how spm could be used in clinical situations which may not always take the form of well-defined prospective research studies with a priori hypotheses.

Scenario 1: After reading posts #106, #113, and #140, I was considering using spm to conduct pre-post comparisons of our clinical gait patients (within individual patients). We typically have 6-10 strides (measurements) at each time point (pre/post). It seems that this might be feasible; however, without specific a priori hypotheses the results would be somewhat limited. Based on the prior discussion, the interpretation of the results would likely apply to the overall trajectory rather than local regions. Does that seem like an appropriate way to use spm to analyze and interpret these types of data?

It could be argued that we should have a priori hypotheses about expected changes in gait mechanics due to the treatment/surgical intervention(s) or our expectations related to the natural history of various patient populations. However, I haven’t seen that approach implemented in clinical gait practice on a regular basis. It would essentially be treating each patient as a single-subject experimental design and developing patient-specific hypotheses based on clinical presentation and desired treatment outcomes.

Scenario 2: Similar to Scenario 1, I’m also trying to determine if it is appropriate to use spm to compare a single subject’s gait kinematics/kinetics to those of a control group?

For example, when considering a patient’s initial gait analysis, we are often trying to determine if and how the patient deviates from ‘normal’. This falls more into the exploratory analysis ‘bucket’ that you referred to. We typically use aggregate measures such as the Gait Deviation Index and compare patient time series data to that of controls using 1 or 2 standard deviations as a way to assess how normal (or abnormal) their gait is.

SPM seems like it could be a potentially useful tool, but based on the discussion about the nature of exploratory analyses and issues related to curve segmentation/registration I’m not sure? Seems that it would be easy to misinterpret (or misrepresent) the results.

Would a more appropriate approach in this scenario be to use some type of bootstrapping method as described by Lenhoff? (Lenhoff MW, Santner TJ, Otis JC, Peterson MGE, Williams BJ, Backus SI. Bootstrap prediction and confidence bands: a superior statistical method for analysis of gait data. Gait & Posture 1999; 9(1):10-17.)

Thank you again for sharing your expertise and I hope I haven’t strayed too far from the intent of this forum.

Eric

Hi, sorry for the delay!

Based on the prior discussion, the interpretation of the results would likely apply to the overall trajectory rather than local regions. Does that seem like an appropriate way to use spm to analyze and interpret these types of data?

I think this phrasing is too black-and-white, and that reality is grayer. Yes, the results apply specifically only to the overall trajectory. However, this does not imply that identified local effects are not real effects. In my opinion it is important to avoid reading too much into a single set of results.

It could be argued that we should have a priori hypotheses about expected changes in gait mechanics due to the treatment/surgical intervention(s) or our expectations related to the natural history of various patient populations. However, I haven’t seen that approach implemented in clinical gait practice on a regular basis. It would essentially be treating each patient as a single-subject experimental design and developing patient-specific hypotheses based on clinical presentation and desired treatment outcomes.

You have described a theoretically strong scenario very well, and I agree that this approach is not common. The problem is: predictive experimentation is powerful but impractical. Exploratory experimentation is the opposite: practical but comparatively weak. A nice mid-point between these two extremes is Bayesian inference, which has the practical advantage of not requiring a specific prediction, but also has generally stronger conclusions than hypothesis testing. However, Bayesian inference introduces other complexities, and since spm1d does not currently support Bayesian inference this goes beyond the scope of this forum.

Similar to Scenario 1, I’m also trying to determine if it is appropriate to use spm to compare a single subject’s gait kinematics/kinetics to those of a control group?

This is a tricky issue, as it is applicable to all hypothesis testing techniques, not just SPM. In this case I'd be tempted to use a one-sample test where the null hypothesis is that the patient's mean trajectory equals the control mean, but I am uncertain whether this is the best solution. I suspect that the medical literature has thorough considerations of the N=1 case.
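As a tentative illustration of that one-sample idea, the sketch below computes a one-sample t statistic at every node, using a control mean trajectory as the datum. The data are synthetic and the names (one_sample_t, control_mean, patient_strides) are hypothetical. Treating the control mean as a fixed datum ignores variability in the control group, and a trajectory-level critical threshold would still be needed before drawing any inference.

```python
import math
import random


def one_sample_t(strides, datum):
    """One-sample t statistic at every node, testing the stride mean against the datum."""
    n = len(strides)
    t_values = []
    for q, node in enumerate(zip(*strides)):
        m = sum(node) / n
        sd = math.sqrt(sum((v - m) ** 2 for v in node) / (n - 1))
        t_values.append((m - datum[q]) / (sd / math.sqrt(n)))
    return t_values


rng = random.Random(0)

# Hypothetical control mean trajectory (e.g. a joint angle in degrees).
control_mean = [10 * math.sin(2 * math.pi * q / 100) for q in range(101)]

# Hypothetical patient: 8 strides offset from the control mean by 5 degrees, plus noise.
patient_strides = [[c + 5 + rng.gauss(0, 1) for c in control_mean] for _ in range(8)]

t_patient = one_sample_t(patient_strides, control_mean)
print("largest |t| along the trajectory:", round(max(abs(t) for t in t_patient), 1))
```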

SPM seems like it could be a potentially useful tool, but based on the discussion about the nature of exploratory analyses and issues related to curve segmentation/registration I’m not sure? Seems that it would be easy to misinterpret (or misrepresent) the results. Would a more appropriate approach in this scenario be to use some type of bootstrapping method as described by Lenhoff?

This is a very difficult question because it asks how hypothesis testing should be used in clinical practice. This problem is very large and multi-faceted, and I fear that it goes well beyond the scope of this forum. I'll therefore limit my response to your question regarding Lenhoff et al. (1999)...

Here are a few theoretical considerations:

Confidence bands are equivalent to one-sample hypothesis tests.
Bootstrapping is a nonparametric technique whose results are by definition equivalent to parametric results for large sample sizes and when the data are normally distributed.
Results from the bootstrapping approach in Lenhoff et al. (1999) would therefore converge to SPM's results for large sample sizes and normally distributed data.

From this perspective, the proposed bootstrapping approach would be useful primarily when both (i) the sample size is at least moderately large, and (ii) the data are non-normally distributed. Other than this case I don't think there is much theoretical advantage.
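For orientation, here is a simplified sketch of a bootstrap band around a mean trajectory. It resamples whole curves with replacement and sets the band half-width from the bootstrap distribution of the maximum standardized deviation. This is a generic max-deviation construction under my own assumptions, not the exact algorithm of Lenhoff et al. (1999), and the synthetic data and function names are hypothetical.

```python
import math
import random


def bootstrap_band(curves, alpha=0.05, n_boot=500, seed=0):
    """Simultaneous band for the mean trajectory via a max-deviation bootstrap."""
    rng = random.Random(seed)
    n = len(curves)
    n_nodes = len(curves[0])
    mean = [sum(c[q] for c in curves) / n for q in range(n_nodes)]
    sd = [math.sqrt(sum((c[q] - mean[q]) ** 2 for c in curves) / (n - 1))
          for q in range(n_nodes)]
    max_devs = []
    for _ in range(n_boot):
        sample = [rng.choice(curves) for _ in range(n)]  # resample whole curves
        boot_mean = [sum(c[q] for c in sample) / n for q in range(n_nodes)]
        max_devs.append(max(abs(boot_mean[q] - mean[q]) / sd[q]
                            for q in range(n_nodes)))
    max_devs.sort()
    scale = max_devs[min(n_boot - 1, int((1 - alpha) * n_boot))]
    lower = [m - scale * s for m, s in zip(mean, sd)]
    upper = [m + scale * s for m, s in zip(mean, sd)]
    return mean, lower, upper


def make_curve(rng, n_nodes=101):
    """Hypothetical trajectory: sine wave + random offset + noise."""
    offset = rng.gauss(0, 0.3)
    return [math.sin(2 * math.pi * q / (n_nodes - 1)) + offset + rng.gauss(0, 0.1)
            for q in range(n_nodes)]


rng = random.Random(2)
curves = [make_curve(rng) for _ in range(20)]
mean, lower, upper = bootstrap_band(curves)
print("band half-width at node 0:", round(upper[0] - mean[0], 3))
```

Because whole curves are resampled, the within-curve correlation structure is preserved without a distributional assumption, which is where a nonparametric band could differ from a parametric one when the data are non-normal.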

Todd

0todd0000 / spm1d

The effects of data alignment on SPM results #163