0todd0000 / spm1dmatlab

One-Dimensional Statistical Parametric Mapping in Matlab.
GNU General Public License v3.0

Number of iterations in small samples #53

Closed by johannesfunken 7 years ago

johannesfunken commented 7 years ago

Dear all ;-), we have a pretty cool data set with two groups. The only problem is that the two groups are pretty small and also have different sizes: Group 1: n=3, Group 2: n=7.

Now we would like to run SPM and see in which regions the mean curves differ. We started with a non-parametric t test. However, even though the curves are very different and the SDs are not too high, no difference was detected. After some trial and error we found that lowering the number of iterations would show differences as expected.

Question 1: What is a good way to set the number of iterations? How does it relate to sample size?

Of course we know a t test is not a good option for sample sizes that small, so we also tried ci_twosample. However, we found that this tool only tells us whether or not the curves differ.

Question 2: Is there a way to use ci_twosample but still get information about the location of the differences?

Thank you very much in advance for your time. Greetings from Cologne, Jo

0todd0000 commented 7 years ago

Hi Jo,

Question: What is a good way to set the number of iterations? How does it relate to sample size?

A good rule-of-thumb is: (1) Use at least 10,000 iterations, with 100,000 giving more accuracy, when the total possible number of iterations is greater than 100,000, or otherwise (2) Use the maximum number of iterations, and always (3) Verify results by changing the number of iterations and selecting different random number generator (RNG) states. If results are sensitive to changes in the number of iterations and/or to changes in RNG state, it would be best to avoid conclusions that the data provide evidence either for or against the existence of an effect.
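To make this rule of thumb concrete for the group sizes in this thread, here is a minimal Python sketch (it is not spm1d code). It assumes a two-sample permutation test in which each permutation is a distinct assignment of the pooled observations to the two groups. The function names and the `large` default are hypothetical, introduced only for illustration.

```python
from math import comb


def n_unique_permutations(n_a: int, n_b: int) -> int:
    """Number of distinct ways to relabel n_a + n_b observations into two groups."""
    return comb(n_a + n_b, n_a)


def choose_iterations(n_a: int, n_b: int, large: int = 10_000) -> int:
    """Hypothetical helper encoding the rule of thumb.

    If more than 100,000 relabelings exist, sample `large` of them
    (10,000 here; 100,000 would give more accuracy). Otherwise use all of them.
    """
    total = n_unique_permutations(n_a, n_b)
    return large if total > 100_000 else total


total = n_unique_permutations(3, 7)
print(total)                      # 120 distinct relabelings for n=3 vs n=7
print(choose_iterations(3, 7))    # 120, i.e. use every relabeling
print(choose_iterations(10, 10))  # 10000, because comb(20, 10) = 184756
```

With n=3 and n=7 there are only 120 distinct relabelings, so the permutation distribution is coarse: if the observed labeling is counted among them, the p-value can be no smaller than 1/120 (about 0.0083), and running fewer iterations only subsamples those same 120 cases.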

When sample sizes are small, neither the mean nor the variance can be estimated accurately. Thus, while lowering the number of iterations may yield significance for one or more specific combinations of iterations and RNG states, you will likely see wildly fluctuating results when you change the number of iterations and/or the RNG state.
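The sensitivity check described here can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the spm1d procedure: it uses scalar observations rather than 1D curves, the absolute difference in means as a stand-in test statistic, and made-up data values. `exact_p` and `sampled_p` are hypothetical names.

```python
import random
from itertools import combinations
from statistics import mean


def stat(a, b):
    """Absolute difference in group means (a stand-in for a real test statistic)."""
    return abs(mean(a) - mean(b))


def exact_p(a, b):
    """p-value from enumerating every distinct relabeling of the pooled data."""
    pooled = list(a) + list(b)
    observed = stat(a, b)
    idx = range(len(pooled))
    hits = total = 0
    for pick in combinations(idx, len(a)):
        chosen = set(pick)
        pa = [pooled[i] for i in pick]
        pb = [pooled[i] for i in idx if i not in chosen]
        hits += stat(pa, pb) >= observed - 1e-12  # small tolerance for float ties
        total += 1
    return hits / total


def sampled_p(a, b, iterations, seed):
    """p-value estimated from `iterations` random relabelings with a fixed seed."""
    rng = random.Random(seed)
    pooled = list(a) + list(b)
    observed = stat(a, b)
    hits = 0
    for _ in range(iterations):
        rng.shuffle(pooled)
        hits += stat(pooled[:len(a)], pooled[len(a):]) >= observed - 1e-12
    return hits / iterations


# Made-up scalar data with the group sizes from this thread (n=3 and n=7).
group1 = [4.1, 5.0, 6.2]
group2 = [5.5, 6.0, 6.4, 6.9, 7.3, 7.8, 8.1]

print("exact:", exact_p(group1, group2))
for iterations in (20, 100, 1000):
    ps = [sampled_p(group1, group2, iterations, seed) for seed in range(5)]
    print(iterations, ps)  # the spread across seeds shows the RNG sensitivity
```

If the sampled p-values fall on both sides of alpha as the seed or the iteration count changes, that is the situation in which no conclusion should be drawn in either direction.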

Question 2: Is there a way to use ci_twosample but still get information about the location of the differences?

Confidence intervals (CIs) are equivalent to t tests; their probabilistic meanings are identical, and location information is identical. A CI is simply a t test result projected back onto the original sample mean(s), based on the sample standard deviation(s). If a particular location reaches significance for a two-sample t test, that result will also be present in the CIs.

However, note that CI interpretation is much more complex than hypothesis test interpretation because CI interpretation requires explicit specification of both a datum (e.g. Group 1 mean vs. Group 2 mean) and a significance criterion (e.g. the bottom of one CI cloud reaching the other mean vs. CI cloud divergence). Due to these complexities, CI interpretations are often ambiguous. Moreover, CIs do not generalize to arbitrary experimental designs like regression.

In general I'd recommend using hypothesis tests rather than CIs because hypothesis testing results have an equivalent interpretation across all experimental designs. Namely: the probability that random data would produce a test statistic which reaches the critical threshold is alpha. It is not possible to write a similar generic interpretation for CIs due to their datum and significance criterion ambiguity.
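The equivalence between a two-sample t test and a CI can be checked numerically. The Python sketch below is an illustration with assumptions, not spm1d's `ci_twosample`: it picks one particular datum (the mean difference) and one criterion (the CI excluding zero), uses made-up curves sampled at four nodes, and uses the pointwise two-tailed 5% critical t value for 8 degrees of freedom (2.306). In an SPM analysis the critical threshold would come from the inference procedure instead.

```python
from math import sqrt
from statistics import mean, variance

T_CRIT = 2.306  # pointwise two-tailed 5% critical t for 3 + 7 - 2 = 8 dof (assumption)


def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic, mean difference, and standard error."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = sqrt(sp2 * (1 / na + 1 / nb))
    diff = mean(a) - mean(b)
    return diff / se, diff, se


# Made-up curves: one row per subject, one column per node.
group1 = [[1.0, 2.0, 3.0, 2.0], [1.2, 2.4, 3.5, 2.1], [0.9, 2.1, 3.2, 1.8]]
group2 = [[1.1, 2.3, 4.6, 2.0], [0.8, 2.0, 4.9, 1.9], [1.3, 2.5, 5.1, 2.2],
          [1.0, 2.2, 4.4, 2.1], [0.9, 1.9, 4.8, 1.7], [1.2, 2.4, 5.3, 2.3],
          [1.0, 2.1, 4.7, 2.0]]

by_test, by_ci = [], []
for node, (a, b) in enumerate(zip(zip(*group1), zip(*group2))):
    t, diff, se = two_sample_t(a, b)
    lo, hi = diff - T_CRIT * se, diff + T_CRIT * se  # CI for the mean difference
    if abs(t) > T_CRIT:      # hypothesis-test criterion
        by_test.append(node)
    if lo > 0 or hi < 0:     # CI criterion: interval excludes zero
        by_ci.append(node)

print(by_test, by_ci)  # both lists flag the same nodes
```

Both criteria reduce to comparing |diff| with T_CRIT * se, which is why the flagged nodes coincide. A different datum or criterion would require a different CI construction, which is the ambiguity described above.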

Cheers from Nagano!

Todd

johannesfunken commented 7 years ago

Dear Todd, thank you so much for your very detailed answer and explanations. If I read it right, it won't make sense for me to use SPM, as my groups are very small and differ in size as well. Playing around with the number of iterations also showed that the statistical results are very sensitive to those changes.

All the best, Jo

0todd0000 commented 7 years ago

Hi Jo, I agree. I'd add that this sample size issue is not specific to SPM. I am unaware of any method that can make solid population-level inferences based on very small sample sizes. Todd