0todd0000 / spm1d

One-Dimensional Statistical Parametric Mapping in Python
GNU General Public License v3.0

How to control family-wise Type I error rate & set P-value in SPM one-way repeated measures ANOVA #272

Closed HelloBeer closed 2 months ago

HelloBeer commented 8 months ago

How to control family-wise Type I error rate & set P-value in SPM one-way repeated measures ANOVA

From this issue: https://github.com/0todd0000/spm1d/issues/210, am I to understand that in the spm1d.stats.anova2onerm analysis, SPM has corrected the family-wise Type I error rate via Random Field Theory (RFT)? If so, has the family-wise Type I error rate been corrected in spm1d.stats.anova1rm in the same way? If post hoc analyses are continued, it is the alpha that is being adjusted to control for Type I error inflation via the Bonferroni correction.

In fact, there is no "statistically significant difference in p-values" set in the spm1d.stats.anova1rm analysis. First, the null hypothesis that the independent variables (3) have no effect on the dependent variables (2) is formulated, using random field theory to correct for the family-wise Type I error rate; then the test statistics (SPM{F} and SPM{t}) are calculated (F and t statistics are qualitatively the same as effect sizes, and can be used as indicators of practical significance: https://github.com/0todd0000/spm1d/issues/54); then critical thresholds (F or t) are defined based on random field theory, and when the test statistic trajectory exceeds the critical thresholds, the original hypothesis is rejected and the difference is considered statistically significant. Finally, the p-value was calculated for each suprathreshold cluster. If the main effect is significant, paired SPM{t} tests with Bonferroni corrections (the purpose of the corrections is to adjust alpha and control for Type I error inflation) are used to determine the location of the difference. Am I understanding this correctly?

If it is necessary to use p-values as a criterion for "statistically significant differences" and to control the family-wise Type I error rate (maintaining a family-wise error rate of α = 0.05), then I note that in "Vector field statistical analysis of kinematic and force trajectories" (Table 1) and "The effect of approach velocity on pelvis and kick leg angular momentum conversion strategies during football instep kicking", 0.05 is divided by the number of dependent variables (the number of SPM ANOVAs performed on the dependent variables), and I'm having a hard time figuring out why this was done. I would like to have your answer.

0todd0000 commented 8 months ago

I agree with your points, but some minor phrasing tweaks would more accurately convey the calculation steps involved in SPM analyses...



Am I to understand that in the spm1d.stats.anova2onerm analysis, SPM has corrected the family-wise Type I error rate via Random Field Theory (RFT)?

Yes, this is correct, but I would rephrase slightly: RFT is used to control the Type I error rate via a smoothness-dependent correction for multiple comparisons. The Type I error rate itself is not "corrected"; it remains at alpha.
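As a toy illustration of why a multiple-comparisons correction is needed at all, here is a sketch in plain Python. It simulates independent points and uses a Bonferroni threshold; RFT's correction additionally accounts for the smoothness of 1D data, so it is less conservative than this, but the family-wise logic is the same:

```python
import random

random.seed(0)

def simulate_fwer(n_points, n_trials, z_crit):
    """Fraction of null-hypothesis trials in which at least one of
    n_points independent standard-normal values exceeds z_crit
    (i.e. at least one false positive somewhere in the domain)."""
    hits = 0
    for _ in range(n_trials):
        if any(random.gauss(0.0, 1.0) > z_crit for _ in range(n_points)):
            hits += 1
    return hits / n_trials

# One-sided alpha = 0.05 at a single point corresponds to z of about 1.645;
# Bonferroni across 100 points uses alpha/100 = 0.0005, i.e. z of about 3.29.
fwer_uncorrected = simulate_fwer(100, 2000, 1.645)  # near 1.0: badly inflated
fwer_bonferroni = simulate_fwer(100, 2000, 3.29)    # near 0.05: controlled
```

With a pointwise threshold the family-wise error rate approaches 1 for many points, whereas the corrected threshold keeps it near alpha; RFT achieves the same control with a lower (smoothness-dependent) threshold.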



has the family-wise Type I error rate been corrected in spm1d.stats.anova1rm in the same way?

Yes.



If post hoc analyses are continued, it is the alpha that is being adjusted to control for Type I error inflation via the Bonferroni correction.

Post hoc analyses are not yet supported directly in spm1d. One reason is that it is difficult --- or perhaps even impossible --- to achieve precise agreement between post hoc tests and the main ANOVA test.



In fact, there is no "statistically significant difference in p-values" set in the spm1d.stats.anova1rm analysis.

I don't quite understand this point because p-values are usually used to determine significance. I am unaware of any SPM procedure which directly compares p-values.



the null hypothesis that the independent variables (3) have no effect on the dependent variables (2) is formulated...

That is essentially correct. For the case of ANOVA the null hypothesis is more specific: equivalent group means. For RM-ANOVA the null hypothesis is: equivalent mean pairwise differences.



using random field theory to correct for the family-wise Type I error rate; then the test statistics (SPM{F} and SPM{t}) are calculated

This is also essentially correct, but I think it is easier to consider the calculation process in reverse order. Sample means and standard deviations (SDs) can of course be calculated independently of RFT, and since test statistics are effectively just ratios of means and SDs, test statistics too can be calculated independently of RFT. So I think it may be more accurate to say that the test statistic is calculated first. RFT is used only at the inference stage, when one aims to make probabilistic conclusions regarding an observed test statistic.
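To illustrate that a test statistic trajectory is just a pointwise ratio of means and SDs, computable without any reference to RFT, here is a minimal sketch for the paired-t case (plain Python; this is an illustration, not spm1d's actual implementation):

```python
import math

def paired_t_trajectory(ya, yb):
    """Pointwise paired t statistic for two sets of 1D trajectories.

    ya, yb : lists of trajectories (equal-length lists of floats),
    paired by subject. Returns one t value per domain node."""
    n = len(ya)                                    # number of subjects
    q = len(ya[0])                                 # number of domain nodes
    t = []
    for i in range(q):
        d = [a[i] - b[i] for a, b in zip(ya, yb)]  # pairwise differences
        mean_d = sum(d) / n
        sd_d = math.sqrt(sum((x - mean_d) ** 2 for x in d) / (n - 1))
        t.append(mean_d / (sd_d / math.sqrt(n)))   # mean over its standard error
    return t

# Three subjects, two domain nodes:
tvals = paired_t_trajectory([[1.0, 1.0], [2.0, -1.0], [3.0, 0.0]],
                            [[0.0, 0.0], [0.0,  0.0], [0.0, 0.0]])
```

RFT enters only afterwards, to convert the selected alpha into a critical threshold against which this trajectory is compared.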



then critical thresholds (F or t) are defined based on random field theory, and when the test statistic trajectory exceeds the critical thresholds, the original hypothesis is rejected and the difference is considered statistically significant.

Yes, I agree with this interpretation with one minor exception: I would replace "defined" with "calculated". The user defines just the Type I error rate (alpha). Then RFT is used to calculate critical thresholds based on the selected alpha value.



Finally, the p-value was calculated for each suprathreshold cluster.

Yes, this too is correct. However, it is important to distinguish different levels of inference. There are generally three levels of inference in SPM analyses:

  1. Domain level (or "whole trajectory" level): this level pertains to the domain as a whole, and effects may occur anywhere within the domain. This is the level at which Type I error is controlled. It is also the level to which the critical threshold pertains.
  2. Set level: following domain-level inference, set level inference represents the probability that C clusters with a minimum breadth of K would occur when the null hypothesis is true.
  3. Cluster level: also following domain-level inference, and at a lower level than set-level inference, this inference level represents the probability of observing a cluster with a breadth of K when the null hypothesis is true.

There is in fact a fourth level: point level inference, where a "point" is a single domain location. However, point-level inference is rarely conducted, primarily because RFT's point-level p-values are generally inaccurate when p is large.

Thus your comment "Finally, the p-value was calculated for each suprathreshold cluster." is indeed correct, but it is important to understand that this point pertains only to cluster-level inference. Generally domain-level inference is conducted first, as this level corresponds directly to the null hypothesis. Set- and cluster-level inference are best regarded as supplemental probabilistic details that are one step removed from the actual hypothesis test.
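To make the distinction concrete, here is a minimal sketch of how suprathreshold clusters (and their breadths K) could be identified once the critical threshold is known. This is an illustration only; it does not compute spm1d's RFT-based cluster-level p-values:

```python
def suprathreshold_clusters(z, zstar):
    """Return (start_index, end_index, breadth) for each contiguous run
    of domain nodes where the statistic z exceeds the threshold zstar."""
    clusters = []
    start = None
    for i, zi in enumerate(z):
        if zi > zstar and start is None:
            start = i                                   # cluster begins
        elif zi <= zstar and start is not None:
            clusters.append((start, i - 1, i - start))  # cluster ends
            start = None
    if start is not None:                               # cluster runs to the end
        clusters.append((start, len(z) - 1, len(z) - start))
    return clusters

# Domain-level inference rejects H0 because the trajectory exceeds zstar
# somewhere; cluster-level details describe where, and how broadly.
clusters = suprathreshold_clusters([0.1, 3.5, 3.8, 0.2, 4.1, 0.0], 3.0)
```

In this toy example two clusters survive the threshold, and each would then receive its own cluster-level p-value based on its breadth.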



...divided 0.05 by the number of dependent variables...

If I am not mistaken this was done for post hoc analysis on individual vector components where the main test was a multivariate (vector) test. This approach is a Bonferroni-like correction for multiple post hoc tests, but it is non-ideal for several reasons, one of which is that post hoc tests on multivariate data are generally not conducted on the original components. Regardless, as far as I know there is no robust post hoc procedure that will yield precise probabilistic agreement between main and post hoc SPM tests, so I think it is most accurate to regard SPM post hoc tests as approximate qualifiers of the main test's results.
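For the numerical question itself: a Bonferroni-type correction divides alpha by the number of tests (not the number of tests by alpha). A minimal sketch, where the choice of three tests is only an example:

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test alpha under a Bonferroni correction: the family-wise
    rate alpha is divided by the number of tests performed."""
    return alpha / n_tests

# e.g. three dependent variables, each analysed with its own SPM ANOVA:
per_test_alpha = bonferroni_alpha(0.05, 3)   # approximately 0.0167
```

The papers you cite chose n_tests to match their number of dependent variables, so each individual test was conducted at this smaller per-test alpha.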