0todd0000 / spm1d

One-Dimensional Statistical Parametric Mapping in Python
GNU General Public License v3.0

Testing variance vs. tests of variance? #208

Closed: 0todd0000 closed this issue 1 year ago

0todd0000 commented 2 years ago

(This is paraphrased from an email discussion)

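Imagine a simple experiment involving:

  - two groups of subjects, A and B;
  - repeated measurements of the same variable on each subject;
  - s_i: the within-subject SD of the repeated measurements for subject i;
  - S_A and S_B: the between-subject SDs of Groups A and B.
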
Questions:

  1. Is it valid to conduct a two-sample test on the s_i values to compare SDs in the two groups?
  2. If yes to Q1, is this different from a two-sample equality-of-variance test (comparing S_A and S_B)?
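To make the two quantities concrete, here is one possible data model. It is an assumption for illustration (the discussion does not specify a model): group g ∈ {A, B}, subject i, measurement j = 1, …, n, a random subject offset with between-subject SD S_g, and measurement noise with true within-subject SD σ_g, which each s_i estimates. Under this model the variance of a subject mean mixes both sources, and the two tests address different null hypotheses (the equivalence on the right assumes normal errors and the same n in both groups, since then E[s] is proportional to σ):

$$
y_{gij} = \mu_g + b_{gi} + e_{gij}, \qquad b_{gi} \sim \mathcal{N}(0,\, S_g^2), \qquad e_{gij} \sim \mathcal{N}(0,\, \sigma_g^2)
$$

$$
s_{gi} = \sqrt{\frac{1}{n-1}\sum_{j=1}^{n}\left(y_{gij} - \bar{y}_{gi}\right)^2}, \qquad \operatorname{Var}\left(\bar{y}_{gi}\right) = S_g^2 + \frac{\sigma_g^2}{n}
$$

$$
H_0^{\text{var}}:\; S_A^2 + \frac{\sigma_A^2}{n} = S_B^2 + \frac{\sigma_B^2}{n}, \qquad H_0^{s}:\; \mathbb{E}[s_{Ai}] = \mathbb{E}[s_{Bi}] \;\Leftrightarrow\; \sigma_A = \sigma_B
$$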
0todd0000 commented 2 years ago

Answers:

  1. Yes, this is valid. The s_i values are treated as the dependent variable, and a standard two-sample test (e.g. a two-sample t test) is usually fine. However, it may be necessary to check the distributions of the s_i values: SDs cannot be negative, so if many of them are close to zero their distribution is likely to be skewed and therefore non-normal.

  2. Yes, the tests are different. An equality-of-variance test tests the null hypothesis that the population (between-subject) variances are equal. A two-sample test on the s_i values may be related, but because the s_i values represent within-subject variability, the two tests can yield very different results. Consider the artificial data in the tables below.

Group A:

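| Subj | Meas. 1 | Meas. 2 | Meas. 3 | Meas. 4 | Meas. 5 |
|---:|---:|---:|---:|---:|---:|
| 1 | 205 | 201 | 202 | 206 | 205 |
| 2 | 200 | 204 | 202 | 200 | 201 |
| 3 | 192 | 201 | 202 | 197 | 206 |
| 4 | 200 | 201 | 197 | 194 | 198 |
| 5 | 196 | 195 | 194 | 205 | 198 |
| 6 | 197 | 201 | 198 | 196 | 199 |
| 7 | 197 | 198 | 197 | 194 | 200 |
| 8 | 202 | 200 | 203 | 196 | 201 |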

Group B:

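| Subj | Meas. 1 | Meas. 2 | Meas. 3 | Meas. 4 | Meas. 5 |
|---:|---:|---:|---:|---:|---:|
| 1 | 390 | 409 | 398 | 398 | 404 |
| 2 | 403 | 414 | 397 | 403 | 391 |
| 3 | 385 | 400 | 398 | 415 | 414 |
| 4 | 401 | 412 | 412 | 396 | 396 |
| 5 | 395 | 387 | 407 | 383 | 397 |
| 6 | 404 | 400 | 403 | 393 | 396 |
| 7 | 395 | 383 | 404 | 390 | 400 |
| 8 | 393 | 391 | 394 | 396 | 400 |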
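Answer 1 notes that the distribution of the s_i values may need checking. One way to do this for the tabulated data is sketched below: it computes each subject's SD, applies SciPy's Shapiro-Wilk test as a normality check, and shows two common fallbacks, a Mann-Whitney U test and a t test on log-transformed SDs. These choices are illustrative assumptions, not part of the original analysis, and with only eight SDs per group a normality test has little power to detect non-normality.

```python
import numpy as np
from scipy import stats

# Tabulated data from the tables above (rows = subjects, columns = measurements)
yA = np.array([
    [205, 201, 202, 206, 205],
    [200, 204, 202, 200, 201],
    [192, 201, 202, 197, 206],
    [200, 201, 197, 194, 198],
    [196, 195, 194, 205, 198],
    [197, 201, 198, 196, 199],
    [197, 198, 197, 194, 200],
    [202, 200, 203, 196, 201],
])
yB = np.array([
    [390, 409, 398, 398, 404],
    [403, 414, 397, 403, 391],
    [385, 400, 398, 415, 414],
    [401, 412, 412, 396, 396],
    [395, 387, 407, 383, 397],
    [404, 400, 403, 393, 396],
    [395, 383, 404, 390, 400],
    [393, 391, 394, 396, 400],
])

# Within-subject SDs (s_i), one value per subject
sdA = yA.std(axis=1, ddof=1)
sdB = yB.std(axis=1, ddof=1)

# Normality check on each group's s_i values (Shapiro-Wilk)
for label, sd in (("A", sdA), ("B", sdB)):
    w, p = stats.shapiro(sd)
    print(f"Shapiro-Wilk, Group {label}: W = {w:.3f}, p = {p:.3f}")

# Fallback 1: rank-based two-sample test on the s_i values
res_mw = stats.mannwhitneyu(sdA, sdB, alternative="two-sided")
print(f"Mann-Whitney U on s_i:  p = {res_mw.pvalue:.3f}")

# Fallback 2: t test on log(s_i); valid only because every s_i here is > 0
res_log = stats.ttest_ind(np.log(sdA), np.log(sdB))
print(f"t test on log(s_i):     p = {res_log.pvalue:.3f}")
```

The log transform is often used for SDs because it maps the positive, right-skewed s_i values onto an unbounded scale; the rank-based test avoids the normality assumption altogether.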



These data were generated using the Python script below. The script specifies between-subject SDs of SA = SB = 5 and true within-subject SDs (s_i) of 3 and 10 for Groups A and B, respectively; note, however, that as written the script does not use SA and SB, so the spread of the subject means arises only from within-subject noise. For these data, Levene's test for equal variance (applied to the subject means) yields p = 0.152, whereas a two-sample t test on the s_i values yields p = 0.001. Equal-variance tests and tests on variances (SDs) therefore address different aspects of variability: the former compares between-subject variability (SA vs. SB), and the latter compares within-subject variability (s_i).



```python
import numpy as np
from scipy import stats

# specify sample and population parameters
N   = 8     # number of subjects per group
n   = 5     # number of measurements per subject
SA  = 5     # true between-subject SD, Group A (declared but not used below)
SB  = 5     # true between-subject SD, Group B (declared but not used below)
sA  = 3     # true within-subject SD, Group A
sB  = 10    # true within-subject SD, Group B
muA = 200   # true population mean, Group A
muB = 400   # true population mean, Group B

# generate dataset (rows = subjects, columns = measurements);
# no between-subject term is added, so SA and SB have no effect on the data
np.random.seed(0)
yA  = []
yB  = []
for i in range(N):
    yA.append( muA + sA * np.random.randn(n) )
    yB.append( muB + sB * np.random.randn(n) )
yA  = np.asarray( yA, dtype=int )   # truncate to integers, as in the tables
yB  = np.asarray( yB, dtype=int )

# conduct equality-of-variance test (Levene) on the subject means:
mA  = yA.mean(axis=1)  # within-subject means, Group A
mB  = yB.mean(axis=1)  # within-subject means, Group B
res = stats.levene(mA, mB)
print('Equality of variance test:  p = %.3f' %res.pvalue)

# conduct two-sample t test on the within-subject SDs:
sdA = yA.std(axis=1, ddof=1)    # within-subject SDs (s_i), Group A
sdB = yB.std(axis=1, ddof=1)    # within-subject SDs (s_i), Group B
res = stats.ttest_ind(sdA, sdB)
print('Two-sample test, WS SDs:    p = %.3f' %res.pvalue)
```

Results:

Equality of variance test:  p = 0.152
Two-sample test, WS SDs:    p = 0.001
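The script above declares SA and SB but does not use them when generating the data. The sketch below is an assumption about the intended design, not the original script: it adds a random per-subject offset with SD SA or SB, repeats the simulated experiment many times, and records how often each test rejects at α = 0.05. The helper `simulate_group` and the settings `n_iter` and `alpha` are hypothetical. Even with SA = SB, the variance of a subject mean is S² + s²/n, i.e. 25 + 9/5 = 26.8 for Group A and 25 + 100/5 = 45 for Group B here, so Levene's test on the subject means is not a pure test of SA = SB; its rejection rate therefore need not equal α.

```python
import numpy as np
from scipy import stats

# Hypothetical settings mirroring the script's parameters
N, n = 8, 5              # subjects per group, measurements per subject
SA, SB = 5.0, 5.0        # between-subject SDs (actually used here)
sA, sB = 3.0, 10.0       # within-subject SDs
muA, muB = 200.0, 400.0  # population means
n_iter = 1000            # number of simulated experiments (assumption)
alpha = 0.05             # significance level (assumption)

rng = np.random.default_rng(0)

def simulate_group(rng, mu, S, s):
    """Hypothetical helper: one random offset per subject, b_i ~ N(0, S^2),
    plus measurement noise e_ij ~ N(0, s^2). Returns an (N, n) array."""
    b = S * rng.standard_normal((N, 1))  # between-subject offsets
    e = s * rng.standard_normal((N, n))  # within-subject noise
    return mu + b + e

reject_levene = 0
reject_ttest = 0
for _ in range(n_iter):
    yA = simulate_group(rng, muA, SA, sA)
    yB = simulate_group(rng, muB, SB, sB)
    # Levene's test on subject means (SciPy's default is center='median')
    p_lev = stats.levene(yA.mean(axis=1), yB.mean(axis=1)).pvalue
    # two-sample t test on the within-subject SDs (s_i)
    p_t = stats.ttest_ind(yA.std(axis=1, ddof=1), yB.std(axis=1, ddof=1)).pvalue
    reject_levene += int(p_lev < alpha)
    reject_ttest += int(p_t < alpha)

rate_levene = reject_levene / n_iter
rate_ttest = reject_ttest / n_iter
print(f"Rejection rate, Levene on subject means: {rate_levene:.3f}")
print(f"Rejection rate, t test on s_i:           {rate_ttest:.3f}")
```

Comparing the two rejection rates across many simulated datasets gives a more general picture than a single pair of p values, since one dataset can be unrepresentative.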