0todd0000 / spm1d

One-Dimensional Statistical Parametric Mapping in Python
GNU General Public License v3.0

Bootstrap resampling method and SPM analysis #209

Closed TakumaInai closed 1 year ago

TakumaInai commented 2 years ago

Hi,

I would like to ask how we can combine SPM analysis with the bootstrap resampling method.

For example, suppose we have ankle joint angle data during gait (control group: 50 subjects; patient group: 50 subjects). We conduct an SPM analysis, and I want to combine it with bootstrap resampling to calculate the 95% CI of the t-value in each phase.

For this purpose, in the i-th bootstrap sample (e.g., i = 1, out of N = 1000 resamples):

tvalue_upper = 4.5 (e.g.)
tvalue_lower = -4.5
tvalue_40%gaitcycle = -13

Re-calculation (correction, i.e., dividing each value by the critical threshold):
tvalue_upper' = 4.5 / 4.5 = 1
tvalue_lower' = -4.5 / 4.5 = -1
tvalue_40%gaitcycle' = -13 / 4.5 = -2.89

and in the (i+1)-th bootstrap sample:

tvalue_upper = 4.3 (e.g.)
tvalue_lower = -4.3
tvalue_40%gaitcycle = -18

Re-calculation (correction):
tvalue_upper' = 1
tvalue_lower' = -1
tvalue_40%gaitcycle' = -18 / 4.3 = -4.19

After all of this processing (1000 resamples),

we estimate the 95% CI from the 1000 normalized t-values. For example, if the 95% CI of the t-value is [-5.0, -2.5], it does not overlap the interval [-1, 1]; therefore the difference at 40% of the gait cycle is robust.
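
For concreteness, here is a minimal sketch of the procedure I have in mind (YA and YB are placeholder (subjects x nodes) arrays, names like n_boot are only illustrative, and I assume spm1d.stats.ttest2 for the group comparison):

```python
import numpy as np
import spm1d

# YA, YB : (subjects x nodes) arrays, e.g. (50 x 101) ankle angles per group
rng    = np.random.default_rng(0)
n_boot = 1000
t_corrected = []  # normalized t-value at 40% gait cycle, one per resample

for i in range(n_boot):
    # resample subjects with replacement, separately within each group
    ya = YA[rng.integers(0, YA.shape[0], YA.shape[0])]
    yb = YB[rng.integers(0, YB.shape[0], YB.shape[0])]
    ti = spm1d.stats.ttest2(ya, yb).inference(0.05, two_tailed=True)
    # divide the t-value by the critical threshold so that +/-1 marks the
    # significance boundary (e.g., -13 / 4.5 = -2.89)
    t_corrected.append(ti.z[40] / ti.zstar)

ci = np.percentile(t_corrected, [2.5, 97.5])  # 95% CI of the normalized t-value
```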

Are my idea (this calculation process for bootstrap resampling with SPM analysis) and its interpretation appropriate? If not, could you tell me how we can demonstrate the robustness of an SPM analysis using the bootstrap resampling method?

0todd0000 commented 2 years ago

Your ideas sound fine, but since swapping the lower and upper values gives a symmetrical distribution, you'd only need to save one of the two (either upper or lower) when building the 1000 t-value distribution. I don't think the 40% gait cycle value is needed, because the bootstrap distribution should be calculated from the maximum t-value across the 1-D domain, regardless of where that maximum appears within the domain.
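
In code terms, this means replacing the node-specific value in your sketch above with the domain maximum (where ti is the inference object returned by spm1d.stats.ttest2):

```python
import numpy as np

# save the maximum |t| across the whole 1-D domain, normalized by the
# critical threshold, instead of the value at a fixed node (e.g., 40%)
t_corrected.append(np.abs(ti.z).max() / ti.zstar)
```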

As an alternative, you may want to consider using a similar approach that is already implemented in spm1d:

spm1d.stats.nonparam.ci_onesample

This routine uses permutation rather than bootstrapping to calculate CIs, as described in Nichols and Holmes (2002). The permutation results should be nearly identical to bootstrap results, with likely negligible numerical variation. Please find an example script here:

./spm1d/examples/nonparam/1d/ex_ci_onesample.py
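
Here is a minimal sketch of its usage, modeled loosely on that example script (the random data and iteration count are illustrative, and keyword names may differ slightly across spm1d versions):

```python
import numpy as np
from matplotlib import pyplot as plt
import spm1d

# illustrative data: 10 responses x 101 nodes
np.random.seed(0)
y = np.random.randn(10, 101) + 1.5

# permutation-based one-sample CI (Nichols & Holmes, 2002)
ci = spm1d.stats.nonparam.ci_onesample(y, alpha=0.05, iterations=1000)
print(ci)
ci.plot()
plt.show()
```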


Notes:

(1) I am not certain whether "95%CI of t value in each phase" refers to subphases of the domain, but just in case it does, please note the following: the permutation procedure above uses the maximum t-value (across the domain) to construct CIs. If your null hypothesis pertains to the whole domain, then you should use the maximum t-value across the whole domain (rather than within subphases) when calculating CIs.


(2) There are different CI procedures for paired and two-sample designs that you may also want to consider:

./spm1d/examples/nonparam/1d/ex_ci_pairedsample.py
./spm1d/examples/nonparam/1d/ex_ci_twosample.py

These generally produce different results from one-sample CIs; the CI procedure should be matched to the experimental design to avoid incorrect interpretations.


(3) You may also want to consider parametric CIs, as demonstrated in the scripts below.

./spm1d/examples/stats1d/ex_ci_onesample.py
./spm1d/examples/stats1d/ex_ci_pairedsample.py
./spm1d/examples/stats1d/ex_ci_twosample.py

These can be calculated much more quickly, and are usually quite similar to the corresponding nonparametric CIs (i.e., permutation or bootstrap CIs).
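
As a hedged sketch of that comparison (illustrative random data; I assume the interfaces used in the example scripts above, which may differ slightly across versions):

```python
import numpy as np
from matplotlib import pyplot as plt
import spm1d

# illustrative data: two groups of 12 subjects x 101 nodes
np.random.seed(0)
YA = np.random.randn(12, 101)
YB = np.random.randn(12, 101) + 0.8

ci_param = spm1d.stats.ci_twosample(YA, YB, alpha=0.05)        # parametric
ci_nonp  = spm1d.stats.nonparam.ci_twosample(YA, YB, alpha=0.05,
                                             iterations=1000)  # permutation

plt.figure(figsize=(10, 4))
plt.subplot(121); ci_param.plot(); plt.title('Parametric CI')
plt.subplot(122); ci_nonp.plot();  plt.title('Nonparametric CI')
plt.show()
```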


Although bootstrapping was not considered in Pataky et al. (2015), that paper addresses robustness via the permutation method instead. To demonstrate robustness for your own datasets, I suggest calculating CIs both parametrically and nonparametrically. If the results qualitatively agree, this would suggest that the parametric approach's assumption of normality is a reasonably robust one for those particular datasets / populations.

TakumaInai commented 2 years ago

Dear Prof. Todd,

Thank you for your quick reply. I really appreciate your response. Based on your comments, I will check some of the example scripts and try the analysis process. Thank you!

Sincerely, Takuma Inai

TakumaInai commented 2 years ago

Dear Prof. Todd,

Based on your comments, I read the articles and reviewed the example scripts (especially ./spm1d/examples/nonparam/1d/ex_ci_twosample.py).

Now I want to compare, for example, ankle joint angles during walking between two groups (e.g., young adults vs. older adults). I would like to ask some questions and clarify a few points.

[1] I understand that the script (./spm1d/examples/nonparam/1d/ex_ci_twosample.py) performs a permutation test, and I will probably confirm robustness using this example. My question concerns the number of subjects in each group. Generally, for an independent t-test, the appropriate number of subjects per group is approximately 26 when Cohen's d = 0.8 (I calculated this in R: power.t.test(n=NULL, sig.level=0.05, power=0.8, type='two.sample', delta=0.8)). However, if I perform a permutation test with SPM analysis as in ex_ci_twosample.py, how many subjects are appropriate? (For a permutation test, is it better to have more subjects in each group?)

[2] Based on your comments, to demonstrate robustness for my datasets I will calculate CIs both parametrically and nonparametrically. Are parametric and nonparametric results similar in most cases? I executed the script (./spm1d/examples/nonparam/1d/ex_ci_twosample.py) and plotted the parametric and nonparametric results, and they look very similar. If different results are obtained, how should I interpret them? (e.g., parametric CI: h0reject = TRUE; nonparametric CI: h0reject = FALSE)

[3] Comparing the permutation test with my idea (the bootstrap resampling method), which one produces more robust results?

[4] For ex_ci_twosample.py, can I interpret the result as a (robust) significant difference between groups at approximately the 22-23% phase?

I look forward to your reply.

Sincerely, Takuma Inai

0todd0000 commented 2 years ago

[1] A key problem here is that power and sample size calculations differ for 0D and 1D data. To calculate sample sizes for 1D data, you generally need to specify a 1D effect, and the resulting sample sizes are usually somewhat larger than for 0D data. Please find details in the power1d package and in this sample size calculation example. The power1d package can specify both normal and non-normal noise, so it can be used for both parametric and nonparametric evaluation cases.
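
For intuition only, here is a simulation-based power sketch that is not the power1d API: it approximates smooth Gaussian noise by filtering white noise, adds an illustrative Gaussian-pulse effect, and estimates power empirically with spm1d.stats.ttest2 (all parameter values are assumptions for illustration):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
import spm1d

def smooth_noise(J, Q, fwhm, rng):
    # approximate smooth 1-D Gaussian noise by filtering white noise
    sd = fwhm / (2 * np.sqrt(2 * np.log(2)))
    e  = gaussian_filter1d(rng.standard_normal((J, Q)), sd, axis=1)
    return e / e.std(axis=1, keepdims=True)  # roughly restore unit variance

rng    = np.random.default_rng(0)
Q      = 101
signal = 0.8 * np.exp(-0.5 * ((np.arange(Q) - 40) / 8.0) ** 2)  # pulse at 40%

def estimate_power(J, n_sim=500):
    hits = 0
    for _ in range(n_sim):
        YA = smooth_noise(J, Q, fwhm=20, rng=rng)
        YB = smooth_noise(J, Q, fwhm=20, rng=rng) + signal
        ti = spm1d.stats.ttest2(YB, YA).inference(0.05, two_tailed=True)
        hits += ti.h0reject
    return hits / n_sim

for J in (20, 30, 40):
    print(J, estimate_power(J))  # power typically grows with sample size
```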

[2] Parametric and nonparametric CIs are indeed quite similar for most datasets we've seen. If the data are in fact normally distributed, then nonparametric CIs are expected to converge exactly to parametric CIs as the sample size grows. Thus similarity between the parametric and nonparametric CIs suggests that the parametric procedure's assumption of normality is a reasonable one. If the results are qualitatively different, I would recommend reporting both and stating that the results should be interpreted cautiously. They are usually only different for relatively small effects. For example, the parametric approach might yield p=0.06 and the nonparametric approach might yield p=0.04. Does this mean that the results disagree? Not necessarily. Instead these results could imply that the effect is simply too small to get a consistent result when different approaches are applied. So I recommend reporting a result like this as a "relatively small effect that should be interpreted cautiously".

[3] As sample size grows, there is effectively no difference between permutation and bootstrap methods. When sample sizes are smaller, there may be some divergence between the methods. But when sample sizes are small, the results should be interpreted cautiously regardless of the technique, so there is little practical difference between the two techniques. I don't think it's possible to conclude that one is more robust than the other because robustness depends on a variety of factors, and robustness pertains to the infinite set of all similar experiments. When analyzing a specific dataset, it is usually sufficient to apply various methods and report the differences if the methods' results qualitatively differ.

[4] Usually "robust" pertains to the theoretical case of an infinite number of datasets, often involving a range of non-ideal variation including normal and non-normal noise, outliers, etc. Contrastingly, a "significant difference" pertains to a specific effect observed in a single dataset. Thus statistical methods (and not specific results) should be regarded as "robust" or not robust. So I would interpret that result simply as a significant difference. However, please note: if your goal is to compare two populations, then I would recommend using a two-sample hypothesis test. CIs can be very misleading, especially if constructed without appropriate regard for the experimental design (i.e., one-sample CIs should NOT be used for a two-sample comparison).
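
For reference, a minimal two-sample hypothesis test sketch (YA and YB are placeholder (subjects x nodes) arrays):

```python
from matplotlib import pyplot as plt
import spm1d

# YA, YB : (subjects x nodes) arrays for the two groups
t  = spm1d.stats.ttest2(YA, YB)
ti = t.inference(alpha=0.05, two_tailed=True)
ti.plot()           # t-curve with critical threshold and suprathreshold clusters
plt.show()
print(ti.h0reject)  # True if any suprathreshold cluster exists
```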

TakumaInai commented 2 years ago

Dear Prof. Todd,

Thank you for your reply. I really appreciate your comments and 'spm1d'! I will use 'spm1d' in my future study. Thank you!

Sincerely, Takuma Inai