Minimum number of participants for normality and t tests #154

0todd0000 / spm1dmatlab

One-Dimensional Statistical Parametric Mapping in Matlab.

GNU General Public License v3.0

Closed: 0todd0000 closed this issue 2 years ago

0todd0000 commented 3 years ago

(redirected from #153)

Is there a minimum number of participants needed to successfully run the normality tests and/or t-tests using SPM?

0todd0000 commented 3 years ago

There are three considerations:

1. The only formal way to calculate the minimum number of participants is with a power analysis.
2. There is no statistical limitation on the minimum number of participants. SPM (and other hypothesis testing techniques) can be validly applied to arbitrary sample sizes. However, small samples may be under-powered and large samples may be over-powered, so it is generally important to consider power when choosing a sample size.
3. As a rule of thumb, I think it's fine to follow the convention in the literature, which is generally to use at least 5 participants; using 8 or more is also common. Addressing this issue more formally requires a power analysis.
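As a rough illustration of the power-analysis route for the simplest case (one scalar value per participant, i.e. 0D rather than a full curve), the Python sketch below uses the textbook normal-approximation formula for the sample size needed to detect a standardized effect size d (Cohen's d) with a two-tailed t test at level alpha and a chosen power. This is an assumption-laden sketch, not spm1d functionality: the helper name is hypothetical, the normal approximation slightly underestimates exact t-based results, and power analysis for 1D curves is considerably more involved.

```python
import math
from statistics import NormalDist


def approx_sample_size(d, alpha=0.05, power=0.8, paired=False):
    """Hypothetical helper: approximate sample size for a two-tailed t test.

    Sketch only, using the normal approximation
        n = k * (z_{1 - alpha/2} + z_{power})**2 / d**2
    with k = 1 for a one-sample/paired design (n = participants) and
    k = 2 for a two-sample design (n = participants per group).
    """
    if d <= 0:
        raise ValueError("effect size d must be positive")
    z = NormalDist()                     # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-tailed critical value
    z_beta = z.inv_cdf(power)            # quantile for the desired power
    k = 1 if paired else 2
    return math.ceil(k * (z_alpha + z_beta) ** 2 / d ** 2)


for d in (0.5, 0.8, 1.2):
    print(f"d = {d}: paired n = {approx_sample_size(d, paired=True)}, "
          f"two-sample n per group = {approx_sample_size(d)}")
```

Under these assumptions, a paired design aiming to detect d = 0.8 needs 13 participants, whereas d = 1.2 needs 6, so the expected effect size largely determines the answer.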

rude10 commented 3 years ago

Hi Todd

Thanks for the response. I was also referring to software or code thresholds: I ran spm.normality.xx in Matlab on a small sample of 4 persons, and it returned an error message saying it needs a minimum of 8 observations.

0todd0000 commented 3 years ago

Understood, sorry for misinterpreting your question.

Normality tests in spm1d suggest 8 or more observations because that appears to be the lower limit of validity for the currently implemented normality testing procedure.

Regardless, in general I'd suggest not conducting explicit normality tests, and instead conducting both parametric and nonparametric tests. If the results qualitatively agree, then the parametric procedure's assumption of normality is a reasonable one.
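The "run both and compare" approach can be sketched for the simplest case, one scalar value per participant (0D), in plain Python. This is an illustration under stated assumptions rather than spm1d's own code: it pairs a one-sample t test on paired differences with an exact sign-flip permutation test (one common nonparametric counterpart), computes the t-distribution p value by numerical integration to avoid a SciPy dependency, and uses hypothetical function names and made-up data. spm1d applies the same idea to entire curves.

```python
import itertools
import math
import statistics


def t_statistic(x):
    """One-sample t statistic testing mean(x) = 0 (for paired data, x = differences)."""
    mean = statistics.mean(x)
    sd = statistics.stdev(x)
    if sd == 0:  # all values identical: t is infinite (or 0 if the mean is 0)
        return math.copysign(math.inf, mean) if mean != 0 else 0.0
    return mean / (sd / math.sqrt(len(x)))


def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed p value from Student's t distribution with df degrees of freedom.

    Integrates the t density from 0 to |t| with Simpson's rule (steps must be
    even), so no SciPy is needed; adequate for illustration.
    """
    if math.isinf(t):
        return 0.0
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps
    s = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * pdf(i * h)
    area = s * h / 3  # P(0 <= T <= |t|)
    return min(1.0, max(0.0, 1 - 2 * area))


def sign_flip_p(x):
    """Exact two-tailed sign-flip permutation p value for mean(x) = 0.

    Enumerates all 2**n sign patterns, so it is only practical for small n.
    """
    t_obs = abs(t_statistic(x))
    hits = 0
    patterns = list(itertools.product((1, -1), repeat=len(x)))
    for signs in patterns:
        t_perm = t_statistic([s * v for s, v in zip(signs, x)])
        if abs(t_perm) >= t_obs - 1e-12:  # small tolerance for rounding
            hits += 1
    return hits / len(patterns)


# Hypothetical paired differences (condition A minus condition B), 8 participants.
diffs = [0.8, 1.1, -0.2, 0.9, 1.5, 0.3, 0.7, 1.2]
t = t_statistic(diffs)
print(f"t = {t:.3f}")
print(f"parametric p = {t_two_tailed_p(t, df=len(diffs) - 1):.4f}")
print(f"sign-flip p  = {sign_flip_p(diffs):.4f}")
```

For these made-up data both procedures give p < 0.05 (the parametric p is below 0.005 and the sign-flip p is 4/256, about 0.016), so they qualitatively agree, which is the kind of agreement the advice above refers to.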

rude10 commented 3 years ago

Thanks much Todd.

So, just to be clear: if the nonparametric test shows a p value greater than 0.05 at one part of the curve (instead of less), is the assumption that the data sets analysed were not significantly different during that portion, but were significantly different in the other part?

0todd0000 commented 3 years ago

That may be correct, but it is likely not correct in all cases.

A result like that could instead emerge from a variety of factors. Sample size is one example: there may be insufficient data to adequately represent the distribution nonparametrically.
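One concrete way that small samples limit a nonparametric analysis: an exact permutation test can only produce as many distinct p values as there are permutations of the data. The sketch below assumes an exact sign-flip test for one-sample or paired data, a common nonparametric choice; whether a particular spm1d analysis uses exactly this scheme is an assumption here, and the helper name is hypothetical.

```python
def min_sign_flip_p(n, two_tailed=True):
    """Hypothetical helper: smallest p value an exact sign-flip test can return.

    With n participants there are 2**n sign patterns. At best only the observed
    pattern (and, for a two-tailed test, its mirror image with every sign
    flipped) is as extreme as the observed data, so p can never fall below
    1 / 2**n (one-tailed) or 2 / 2**n (two-tailed).
    """
    return (2 if two_tailed else 1) / 2 ** n


for n in range(4, 9):
    p_min = min_sign_flip_p(n)
    print(f"n = {n}: smallest two-tailed p = {p_min}, can reach p < 0.05: {p_min < 0.05}")
```

Under these assumptions, 4 participants give a smallest two-tailed p of 0.125 and 5 give 0.0625, so neither can ever reach p < 0.05, while 6 participants give 0.03125. A non-significant nonparametric result with very few participants may therefore reflect this floor rather than the data.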

In general, if parametric and nonparametric results differ, I'd suggest reporting both sets of results. A disagreement implies that the results are sensitive to analysis choices, and thus that the result should be interpreted more cautiously than if both procedures yielded the same result.
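Regarding which "parts of the curve" are significant: in SPM-style inference, the shaded regions of a results plot are, as far as I understand it, the contiguous stretches where the test statistic curve exceeds a single critical threshold computed for the whole curve (by random field theory or permutation), rather than a separate p value at each point. A minimal sketch of that cluster-finding step follows, assuming the statistic curve and threshold have already been computed; the function name, example curve, and threshold value are all hypothetical.

```python
def suprathreshold_clusters(z, zstar, two_tailed=True):
    """Return inclusive (start, end) index pairs of contiguous supra-threshold regions.

    z      -- test statistic value at each point of the 1D curve
    zstar  -- critical threshold, assumed to be computed elsewhere
    A point is supra-threshold if z > zstar, or (two-tailed) if z < -zstar.
    Positive and negative excursions are kept as separate clusters.
    """
    clusters = []
    start, current = None, 0  # current: 1 above +zstar, -1 below -zstar, 0 otherwise
    for i, value in enumerate(z):
        if value > zstar:
            state = 1
        elif two_tailed and value < -zstar:
            state = -1
        else:
            state = 0
        if state != current:
            if current != 0:
                clusters.append((start, i - 1))  # previous cluster ended at i - 1
            if state != 0:
                start = i  # a new cluster begins here
            current = state
    if current != 0:
        clusters.append((start, len(z) - 1))  # cluster runs to the end of the curve
    return clusters


# Hypothetical t-statistic curve and threshold, for illustration only.
z = [0.5, 1.8, 3.2, 3.9, 2.1, 0.4, -1.0, -3.5, -3.1, -0.8]
print(suprathreshold_clusters(z, zstar=3.0))
print(suprathreshold_clusters(z, zstar=3.0, two_tailed=False))
```

Here the two-tailed call finds clusters at indices 2 to 3 and 7 to 8, and the one-tailed call keeps only the first. Points outside every cluster simply did not exceed the threshold at the chosen alpha.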

rude10 commented 3 years ago

Thanks Todd. I surmise that the non-parametric test has the same usual lower limit of 5-8 persons.

rude10 commented 3 years ago

Also, is there a reference I can look at where you suggest doing both tests and comparing them in the non-parametric scenario, like a research paper or the documentation? Also, in all the examples shown, the shaded area was where p was less than the threshold values. Are there any examples where the shaded area is where p is greater than the threshold, with an associated explanation? I'm asking because I'm also sharing the process with other colleagues.

0todd0000 commented 3 years ago

> Thanks Todd. I surmise that the non-parametric test has the same usual lower limit of 5-8 persons.

Informally: yes, this is a good guideline. But note that there are no formal lower limits on sample size.

> Also, is there a reference I can look at where you suggest doing both tests and comparing them in the non-parametric scenario, like a research paper or the documentation?

Redirected to: #155 (We've drifted away from this issue's focus on sample size, so I've created a new issue.)

> Also, in all the examples shown, the shaded area was where p was less than the threshold values. Are there any examples where the shaded area is where p is greater than the threshold, with an associated explanation? I'm asking because I'm also sharing the process with other colleagues.

Redirected to: #156