0todd0000 / spm1d

One-Dimensional Statistical Parametric Mapping in Python
GNU General Public License v3.0

ANOVA - Post-hoc t-test - question supra-threshold cluster #148

Closed PaulAndreD closed 3 years ago

PaulAndreD commented 3 years ago

Dear Todd,

Could I please ask for your help in responding to a reviewer?

I performed a cross-sectional study to assess whether there was a difference in the foot mechanics of patients with post-sprain ankle osteoarthritis (N=14) compared to patients with post-fracture ankle osteoarthritis (N=15). A convenience sample of 29 subjects with post-traumatic ankle osteoarthritis and 15 asymptomatic control subjects was studied. A multi-segment foot model was used to calculate intrinsic foot joint kinematics and kinetics during the stance phase of the gait cycle.

I used an approach similar to the one in your paper (De Ridder 2013). Initially, a one-way ANOVA (spm1d.stats.anova1) over the normalized time series was used to establish the presence of any significant differences between the three groups. If statistical significance was reached, post hoc t-tests (spm1d.stats.ttest2) over the normalized time series were used to determine between which groups significant differences occurred.
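For concreteness, this pipeline corresponds roughly to the following spm1d calls. The arrays below are random placeholders (only the group sizes are taken from the post), so this is a sketch of the workflow rather than the actual analysis code:

```python
import numpy as np
import spm1d

# Random placeholder data (group sizes from the post; Q = 101 time nodes is illustrative)
Q  = 101
yA = np.random.randn(14, Q)   # post-sprain OA group
yB = np.random.randn(15, Q)   # post-fracture OA group
yC = np.random.randn(15, Q)   # asymptomatic control group

# Omnibus one-way ANOVA over the normalized time series
F  = spm1d.stats.anova1((yA, yB, yC), equal_var=True)
Fi = F.inference(alpha=0.05)

# Post hoc two-sample t-tests, only if the omnibus test is significant
if Fi.h0reject:
    t  = spm1d.stats.ttest2(yA, yB)   # one pair shown; repeat for each pair
    ti = t.inference(alpha=0.05, two_tailed=True)
```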

The reviewer (see the comment below) seems to doubt how the probabilities of the specific supra-threshold regions were computed.

Reviewer comment: The one part of your Statistical analysis that I didn't follow is when you state "Individual probability values were then calculated for each supra-threshold cluster that could have resulted from an equally smooth random process". I suspect that ties in with the arrows in Figures 3 to 5 indicating p<0.001. Those values are almost all certainly false, and they are certainly false throughout the entire shaded areas. After all, at the boundaries of these regions significance at the 5% level is only just reached. To show areas where p<0.001 it would be necessary to run SPM again over the full gait cycle, with the significance level set to 0.001.

Should I run SPM again over the full gait cycle, instead of the stance phase, with a significance level set to 0.001 as the reviewer suggests? In my opinion (I may be wrong), this should technically not change the statistical results: the supra-threshold clusters with p < 0.001 would still be present at the same instants of the gait cycle.

Would you have any recommendations regarding additional details I could provide to the reviewer about the calculation of the supra-threshold cluster p-values?

I thought I had followed the general guidance from the spm1d workshop I attended, as well as the examples from previously published articles, but this appears not to be sufficient.

Many thanks in advance,

Paul-André

0todd0000 commented 3 years ago

Hi Paul-André,

Regarding stance phase vs. full gait cycle: If your hypothesis was limited to the stance phase, then all analyses should also be limited to the stance phase. Since the stance phase involves foot loading and the swing phase does not, it could be argued that stance is more relevant to injury mechanics than swing, so I think it makes sense to limit analyses to the stance phase. If the Introduction does not already explicitly justify considering just the stance phase (as opposed to the full gait cycle), I suggest adding a justification to the Introduction, perhaps with literature references showing why the stance phase is more relevant to injury mechanics than the swing phase.

Regarding p < 0.001: Note that the p value decreases as the temporal extent of a cluster increases (Friston et al., 1994; Worsley, 1994). If the maximum cluster value just touches the critical threshold, the cluster probability is alpha, and as the cluster grows in temporal extent the probability decreases; it can easily decrease to p < 0.001, regardless of the cluster height. The cluster height (e.g., maximum height, average height, etc.) is irrelevant to this probability.

I would respond by stating that other, somewhat more intuitive probabilities exist, like the probability associated with the cluster integral (Zhang et al., 2009), but the literature typically uses the extent probability (Friston et al., 2007), likely because the extent probability is easier to calculate and is implemented in all open-source SPM packages.

You might also want to stress that the null hypothesis rejection decision (i.e., whether the test statistic trajectory crosses the critical threshold) is the main result, and that this result's key probability is alpha. Cluster-level probabilities are similar to post hoc analyses; they serve primarily to qualify the main test result.
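To make these quantities concrete, here is a minimal sketch (random placeholder data) showing roughly where the threshold-level and cluster-level results appear in spm1d's inference output:

```python
import numpy as np
import spm1d

# Random placeholder data: two groups, 101 time nodes
yA = np.random.randn(14, 101)
yB = np.random.randn(15, 101)

t  = spm1d.stats.ttest2(yA, yB)
ti = t.inference(alpha=0.05, two_tailed=True)

print(ti.zstar)      # critical threshold; crossing it is the alpha-level decision
print(ti.h0reject)   # the main null hypothesis rejection result
print(ti.p)          # extent-based probabilities, one per supra-threshold cluster
```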

References

Friston, K., Worsley, K., Frackowiak, R., Mazziotta, J., & Evans, A. (1994). Assessing the significance of focal activations using their spatial extent. Human Brain Mapping, 1(3), 210–220.

Friston, K., Ashburner, J., Kiebel, S., Nichols, T., & Penny, W. (2007). Statistical Parametric Mapping: The Analysis of Functional Brain Images. Elsevier, London.

Worsley, K. (1994). Local maxima and the expected Euler characteristic of excursion sets of χ², F and t fields. Advances in Applied Probability, 26(1), 13–42.

Zhang, H., Nichols, T. E., & Johnson, T. D. (2009). Cluster mass inference via random field theory. NeuroImage, 44(1), 51–61.

PaulAndreD commented 3 years ago

Dear Todd,

Many thanks for your instructive feedback and the provided arguments for the reviewer.

I would like to ask one last question, about whether or not to apply a Bonferroni correction for the research design from my first post.

Three different groups: post-fracture, post-chronic ankle instability, control

- Kinematics: ankle (X, Y, Z); Chopart (X, Y, Z); Lisfranc (X, Y, Z)
- Moments: ankle (X, Y, Z); Chopart (X, Y, Z); Lisfranc (X, Y, Z)
- Power: ankle, Chopart, Lisfranc

This is a total of 21 variables (maybe too many...).

I read several papers (Naouma & Pataky 2019; Pataky JoB 2013; and some of Mark Robinson's) in order to grasp the issues around SPM's false positive control during multiple hypothesis testing.

In De Ridder 2013, the research design is similar to my paper's: kinematics of five joints (X, Y, Z), with no alpha correction.

In Mark Robinson's papers, I can see that he corrects alpha for the number of comparisons.

Do you have a good rule of thumb that I can follow?

Would you advise replacing spm1d.stats.ttest2 with spm1d.stats.hotellings2, except for joint power, since it is a scalar quantity?

Many thanks in advance,

Paul-André

0todd0000 commented 3 years ago

Yes, I would certainly suggest starting with a multivariate analysis. In this case I'd start with one-way MANOVAs (spm1d.stats.manova1), one for each XYZ variable, to test for differences across the three groups, and also use a Bonferroni correction across the tests.

One good rule of thumb is to start by applying a Bonferroni correction across tests. This is the most severe correction, and is also almost always overly conservative, so if results survive this correction, they will by definition survive any other correction procedure, including a procedure that more correctly controls alpha.
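A minimal sketch of such a Bonferroni-corrected MANOVA follows (random placeholder data; the number of tests is illustrative and should be set to the number of tests actually conducted):

```python
import numpy as np
import spm1d

# Random placeholder multivariate data: (J, Q, I) = (subjects, time nodes, XYZ components)
Y = np.random.randn(44, 101, 3)
A = np.array([0]*14 + [1]*15 + [2]*15)   # integer group labels for the three groups

n_tests = 6                  # illustrative: e.g., XYZ kinematics + moments for three joints
alpha   = 0.05 / n_tests     # Bonferroni-corrected significance level

X2  = spm1d.stats.manova1(Y, A)   # one-way MANOVA across the three groups
X2i = X2.inference(alpha)
print(X2i.h0reject, X2i.zstar)    # null rejection decision and critical threshold
```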

However, interpretation problems will arise if the results do not exceed the Bonferroni threshold, especially if they are close to the threshold. In this case they would likely exceed the critical threshold from a more accurate, and less conservative correction procedure.

After the main MANOVA tests, I think any post hoc procedure is fine (including Hotelling's T2 tests and t tests), provided the results do not disagree with the main MANOVA results. In my view, post hoc tests simply qualify the main results. The main results (and the models from which they were derived) most closely represent the experiment that was conducted. Post hoc analyses do not represent the experiment; instead they attempt to explain the main results.
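For example, a post hoc two-group Hotelling's T2 comparison might look like the following (again with random placeholder arrays and a hypothetical corrected alpha):

```python
import numpy as np
import spm1d

# Random placeholder data: two groups' XYZ trajectories, shape (J, Q, 3)
YA = np.random.randn(14, 101, 3)
YB = np.random.randn(15, 101, 3)

T2  = spm1d.stats.hotellings2(YA, YB)
T2i = T2.inference(0.05 / 3)   # e.g., Bonferroni over three pairwise comparisons
```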

A more general rule of thumb is to analyze fewer variables. As the number of experimental variables increases, an experiment's statistical power decreases.

PaulAndreD commented 3 years ago

Dear Todd,

Many thanks for sharing your knowledge and expertise.

It is greatly appreciated.

Thanks again,

Paul-André

PaulAndreD commented 3 years ago

Dear Todd,

First of all, best wishes for a happy and prosperous 2021.

We applied your suggestions to our manuscript, and the modifications were greatly appreciated by the reviewer tasked by the journal with checking our statistical approach. Many thanks for your help!

However, no statistical differences were found between our pathological groups (we strongly believe that both groups adopt a similar antalgic walking strategy to prevent loading through their painful joint), and the second reviewer has asked us to provide evidence that a Type II error does not exist in our reported findings.

Does the SPM toolbox provide the tools needed to reply to reviewer #2's request? If yes, could you advise us on how to proceed?

Many thanks in advance,

Paul-André

0todd0000 commented 3 years ago

The reviewer's request is a difficult one because it asks for post hoc power analysis. There are two separate issues to consider...

Below α and β are the Type I and Type II error rates, respectively, and "power" is (1 - β).

  1. Validity of post hoc power analysis

It is generally not valid to conduct post hoc power analysis unless one has also conducted formal a priori power analysis. This is partly my opinion, but it is also argued in many papers; a search for "post hoc power" will yield many results, with opinions on both sides. As the debate applies to this situation, I'd say that far more papers argue that the proposed analyses are not valid. Here is the basic reason why:

α pertains not to a specific experiment but to an infinite number of identical experiments. Thus a statement like the following is nonsensical: "the Type I error rate of this study was α". It is nonsensical because α does not control errors in a single experiment; it instead controls errors across an infinite number of experiments.

Likewise, it is nonsensical to say that "the Type II error rate of this study was β". Like α, β is relatively meaningless for a single experiment, and by extension so is power (1 - β). The idea of a priori power analysis is to control both α and β simultaneously, again not for a single experiment but for the infinite set of identical experiments. Only through a priori power analysis can simultaneous α and β control be achieved.

As an illustration of this problem, consider an experiment for which you set the cut-off for significance at something like "10 N" if measuring force, or "5 deg" if measuring joint angles. In this case a reader could ask what evidence you have against Type I error. If you didn't specify an a priori α, and didn't calculate the sample size required to detect a specific effect like 10 N or 5 deg, then it would be difficult to respond to that question. I believe the situation is similar for requests for post hoc power analysis: without a priori power analysis it is difficult to respond to power-related questions, because power was not controlled in an a priori manner.

The arguments you'll find in the literature are more nuanced and also more detailed than this, but this is the fundamental reason why post hoc power analysis can generally be considered inappropriate for a single experiment.

  2. Power analysis for 1D data

Power analysis is indeed possible for 1D and nD data. spm1d supports power analysis through a package called power1d (https://spm1d.org/power1d/), but this package is only available in Python. Examples of power analysis for one-sample, two-sample and regression designs are demonstrated here: https://spm1d.org/power1d/Examples/PowerAnalysis.html
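As a rough sketch of what a power1d simulation looks like (a one-sample design with illustrative parameter values, loosely following the examples in the documentation linked above):

```python
import power1d

J, Q = 15, 101   # sample size and continuum size (illustrative)

baseline = power1d.geom.Null(Q=Q)                                    # zero baseline
signal0  = power1d.geom.Null(Q=Q)                                    # null signal
signal1  = power1d.geom.GaussianPulse(Q=Q, q=60, fwhm=25, amp=1.5)   # hypothesized effect
noise    = power1d.noise.SmoothGaussian(J=J, Q=Q, mu=0, sigma=1, fwhm=20)

model0 = power1d.models.DataSample(baseline, signal0, noise, J=J)    # null model
model1 = power1d.models.DataSample(baseline, signal1, noise, J=J)    # alternative model

emodel0 = power1d.models.Experiment(model0, power1d.stats.t_1sample)
emodel1 = power1d.models.Experiment(model1, power1d.stats.t_1sample)

sim     = power1d.ExperimentSimulator(emodel0, emodel1)
results = sim.simulate(1000)   # Monte Carlo simulation of both experiments
print(results)                 # simulation summary, including estimated power
```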

To answer your question:

Does the SPM toolbox provide the tools needed to reply to reviewer #2's request? If yes, could you advise us on how to proceed?

spm1d does indeed provide the tools necessary to conduct power analysis for 1D data, but this does not necessarily mean that these tools are appropriate to use in this case. Since post hoc power analysis is generally invalid without a corresponding a priori power analysis, it would be difficult to convincingly provide evidence against Type II error unless you had also conducted power analysis before conducting the experiment.

I would think that the best way to respond would be:

PaulAndreD commented 3 years ago

Dear Todd,

Many thanks for sharing your knowledge and expertise.

It is greatly appreciated.

Thanks again,

Paul-André