Non-parametric inference for arbitrary ANOVA designs in python

PeterLawrenson commented 5 years ago

Hi Todd,

I am interested in performing planned orthogonal contrasts for analysis of variance between a group of factors with three fixed levels.

I have a two groups, group 1 is my control population and group 2 is my symptomatic group. Group two can be further split into two conditions. I am interested in looking at specific comparisons between group 1 and 2, as well as a comparison between the two conditions that make up group 2. The data is not normally distributed so i have been looking into how planned contrasts might be done using SnPM, specifically how to incorporate contrast co-efficients, but have not had much luck.

So my question is whether performing planned orthogonal contrasts within SnPM was possible within the current version of spm1d and if so how this might be implemented

Thanks in advance for any help you might be able to provide with this.

Kind Regards

Peter

0todd0000 commented 5 years ago

Hi Peter, apologies for the delay.

spm1d does not support arbitrary ANOVA models directly, but you could use a third party package like statsmodels as a work-around. Appropriate use of statsmodels requires a consideration of sphericity...

CASE 1: Sphericity assumed

If equal variance is assumed across all condition / group combinations, then there is no problem, and arbitrary statsmodels approaches can be used, like this for one-way ANOVA:

import numpy as np
from matplotlib import pyplot
import spm1d

import statsmodels.api as sm
from statsmodels.formula.api import ols
import pandas

#(0) Load data:
dataset      = spm1d.data.uv1d.anova1.Weather()
Y,A          = dataset.get_data()

#(1) Compute F statistic using spm1d (assuming equal variance):
alpha        = 0.05
spm          = spm1d.stats.anova1(Y, A, equal_var=True)
fstat_spm1d  = spm.z

#(2) Compute F statistic using statsmodels:
nFrames      = Y.shape[1]
F            = np.empty(nFrames)
for i in range(nFrames):
    d        = dict(y=Y[:,i], A=A)
    dframe   = pandas.DataFrame( d )
    lm       = ols('y ~ C(A)', data=dframe).fit()
    table    = sm.stats.anova_lm(lm, typ=2)
    F[i]     = table.F[0]

#(3) Plot results:
pyplot.close('all')
ax = pyplot.axes()
ax.plot(fstat_spm1d, lw=6, label='sp1d')
ax.plot(F, lw=2, label='statsmodels')
ax.legend()
pyplot.xlabel('Time (days)', size=20)
pyplot.ylabel('F value', size=20)
pyplot.show()

anova1

and for two-way ANOVA like this:

#(0) Load data:
dataset      = spm1d.data.uv1d.anova2.SPM1D_ANOVA2_3x5()
Y,A,B        = dataset.get_data()

#(1) Compute F values using spm1d:
FF           = spm1d.stats.anova2(Y, A, B, equal_var=True)
fstats_spm1d = [F.z for F in FF]

#(2) Compute F values using statsmodels:
nFrames       = Y.shape[1]
FA,FB,FAB     = np.empty(nFrames), np.empty(nFrames), np.empty(nFrames)
for i in range(nFrames):
    d         = dict(y=Y[:,i], A=A, B=B)
    dframe    = pandas.DataFrame( d )
    lm        = ols('y ~ C(A) * C(B,Sum)', data=dframe).fit()
    table     = sm.stats.anova_lm(lm, typ=2)
    FA[i]     = table.F[0]
    FB[i]     = table.F[1]
    FAB[i]    = table.F[2]

#(3) Plot results:
pyplot.close('all')
pyplot.figure(figsize=(12,4))
ax0  = pyplot.subplot(131)
ax1  = pyplot.subplot(132)
ax2  = pyplot.subplot(133)
ax0.plot(fstats_spm1d[0], lw=6, label='spm1d')
ax1.plot(fstats_spm1d[1], lw=6)
ax2.plot(fstats_spm1d[2], lw=6)
ax0.plot(FA, lw=2, label='statsmodels')
ax1.plot(FB, lw=2)
ax2.plot(FAB, lw=2)
ax0.set_title('Main A', size=16)
ax1.set_title('Main B', size=16)
ax2.set_title('Interaction AB', size=16)
ax0.legend()
pyplot.show()

anova2

Then you can do things that are not yet supported in spm1d like remove the modeled interaction. This can be done easily in statsmodels by changing the model from y ~ C(A,Sum) * C(B,Sum) to y ~ C(A,Sum) + C(B,Sum) , like this:

lm        = ols('y ~ C(A) + C(B,Sum)', data=dframe).fit()

This removes the interaction effect from the statsmodels results as indicated in the figure below.

anova2_noAB

CASE 2: Sphericity not assumed

Problems occur when sphericity is not assumed, and there are two main reasons for this:

There are a variety of methods for dealing with non-sphericity for 0D data, but not all can be applied directly to 1D data.
Most non-sphericity corrections involve a degrees-of-freedom (df) correction. This is no problem for 0D data, when there is only one df value per contrast, but when using statsmodels with 1D data there will be Q different df values, where Q is the number of 1D continuum nodes. Computing a single threshold for 1D inference requires a single df value. In the SPM literature this is usually done using a maximum likelihood estimation procedure, but this procedure is not easy to implement (and this is the reason that more ANOVA options are currently not supported in spm1d).

To summarize, feel free to use statsmodels with arbitrary ANOVA models to compute 1D F statistics, but only if it is justifiable to assume equal variance. If equal variance is not assumed, things become more complex and third party packages like statsmodels may not work correctly when applied to 1D data. Using non-parametric inference in spm1d can partially alleviate the df problem, but I'm uncertain whether conducting non-parametric inference in this manner (with changing df) is valid.

PeterLawrenson commented 5 years ago

Hi Todd,

Thanks for taking the time to provide such a detailed response, its great to know there is a possible work around within spm1d. Its given me a few more things to think about as far as my analysis, so I will have a go applying your recommendations to my data set to see what the outcomes are.

Thanks again Todd

Cheers

Peter

0todd0000 / spm1d

Non-parametric inference for arbitrary ANOVA designs in python #94