Multiple regression with GLM

aaronsfox commented 1 year ago

Hi @0todd0000

I'm trying to run a multiple-regression analysis with the GLM and am attempting to follow the example on the spm1d website. For my problem I have a series of strains from a 3D surface as the dependent variable (i.e. Y) and a set of continuous independent variables (i.e. X) that I wish to regress against the strain data. I understand the overall output of the SPM procedure (i.e. spmi.z as the test statistic), but am a little confused about how the inputs are structured (i.e. X and c) and how to interpret the individual contributions of the independent variables.

The example on the webpage includes one independent variable of empirical interest alongside an intercept, a linear drift and a sinusoidal drift. Are the intercept and these 'noise' signals necessary for the GLM, and how would you approach them and the contrast vector in a multiple-regression problem like the one I've described?

0todd0000 commented 1 year ago

X is a design matrix. Its rows correspond to observations and its columns correspond to experimental factors (for continuous variables) or factor levels (for categorical variables), any of which could be nuisance factors or covariates. It is meant to be a complete representation of the experiment, where each row contains all experimentally relevant information for one observation. However, it does not specify which columns (i.e., factors) are of experimental interest; it is just a model of the experiment. This is where c comes in.
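To make the row/column layout concrete, here is a minimal NumPy sketch of a design matrix for a multiple regression with an intercept and two continuous predictors. The predictor names (`age`, `mass`) and the random values are hypothetical placeholders, not data from this thread.

```python
import numpy as np

# Hypothetical data: 10 observations, two continuous predictors.
rng = np.random.default_rng(0)
J = 10                                   # number of observations (rows of X)
age = rng.uniform(20, 40, J)             # hypothetical continuous predictor 1
mass = rng.uniform(60, 90, J)            # hypothetical continuous predictor 2

# Design matrix: one row per observation, one column per factor.
# Column 0 is an intercept; columns 1 and 2 are the continuous predictors.
X = np.column_stack([np.ones(J), age, mass])

print(X.shape)  # (10, 3)
```

Nothing in X marks which column is "of interest"; that choice is made separately by the contrast.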

c is a contrast. It expresses the null hypothesis as a linear combination of the model parameters, one parameter per column of X (i.e., H0: c·β = 0). Currently spm1d directly supports only contrast vectors, and the vector must have the same number of elements as X has columns.
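As a small 0D illustration of how a contrast picks out a linear combination of parameters, the sketch below fits noise-free data with known parameters (2, 3, -1) by least squares and applies two contrast vectors. The data are synthetic and chosen only so the expected values are obvious.

```python
import numpy as np

rng = np.random.default_rng(1)
J = 12
x1 = rng.normal(size=J)
x2 = rng.normal(size=J)
X = np.column_stack([np.ones(J), x1, x2])    # columns: intercept, x1, x2

# Synthetic, noise-free response with known parameters (2, 3, -1)
y = 2.0 + 3.0 * x1 - 1.0 * x2

beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares estimates

c_x1 = np.array([0, 1, 0])     # H0: beta_1 = 0 (x1 has no effect)
c_diff = np.array([0, 1, -1])  # H0: beta_1 - beta_2 = 0
for c in (c_x1, c_diff):
    assert c.size == X.shape[1]  # one contrast element per column of X

print(c_x1 @ beta)    # approximately 3.0
print(c_diff @ beta)  # approximately 4.0, i.e. 3 - (-1)
```

Testing each predictor separately (e.g. `[0, 1, 0]`, then `[0, 0, 1]`) is one way to examine individual contributions with vector-only contrasts.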

The noise models (linear and sinusoidal drift) in the spm1d website example are not necessary to include in your design matrix. In the example they are just illustrations of arbitrary covariate / nuisance factor modeling.

An intercept is not strictly necessary either, but excluding it may force the regression lines to pass through the origin (depending on the model).
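A quick way to see the effect of dropping the intercept is to fit the same synthetic data (true intercept 5, slope 2) with and without a column of ones. This is a toy illustration, not an spm1d-specific behavior.

```python
import numpy as np

x = np.arange(1.0, 11.0)        # predictor values 1..10
y = 5.0 + 2.0 * x               # noise-free data with a non-zero intercept

X_with = np.column_stack([np.ones_like(x), x])  # intercept + slope
X_without = x[:, None]                          # slope only

b_with, *_ = np.linalg.lstsq(X_with, y, rcond=None)
b_without, *_ = np.linalg.lstsq(X_without, y, rcond=None)

print(b_with)     # approximately [5. 2.]
print(b_without)  # approximately [2.714]: line forced through (0, 0), slope biased
```

Without the intercept column the model has no way to represent the offset, so the slope absorbs it.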

Here is a nice example of using design matrices for multiple regression. The same approach can be applied to 1D data (or to 3D surface data); the dimensionality of the observations is irrelevant to design matrix and contrast vector modeling.
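Because X and c do not depend on the observation dimensionality, the GLM statistic can be computed once per node with the same design. The sketch below is an independent NumPy re-implementation of the standard GLM t statistic (β̂ = X⁺Y, t = cβ̂ / sqrt(σ̂² c(XᵀX)⁻¹cᵀ), df = J - rank(X)) applied to a J × Q array; it is meant to show what X and c do, not to reproduce spm1d's source. It computes only the test statistic field, not the smoothness estimation and random field theory inference that spm1d adds. In spm1d itself the corresponding call is `spm1d.stats.glm(Y, X, c)`; the helper name `glm_tstat` is hypothetical.

```python
import numpy as np

def glm_tstat(Y, X, c):
    """Hypothetical helper: GLM t statistic at every node (column) of Y.

    Y : (J, Q) array, J observations of a Q-node field
        (e.g. a 1D curve, or surface data flattened to Q nodes)
    X : (J, K) design matrix
    c : (K,) contrast vector
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    c = np.asarray(c, dtype=float)
    J = X.shape[0]
    beta = np.linalg.pinv(X) @ Y                # (K, Q) parameter estimates
    resid = Y - X @ beta                        # (J, Q) residuals
    df = J - np.linalg.matrix_rank(X)           # residual degrees of freedom
    sigma2 = (resid ** 2).sum(axis=0) / df      # (Q,) residual variance per node
    cXXc = c @ np.linalg.pinv(X.T @ X) @ c      # scalar c (X'X)^-1 c'
    return (c @ beta) / np.sqrt(sigma2 * cXXc), df

# Usage: 20 observations of a 101-node field, intercept + two predictors
rng = np.random.default_rng(2)
J, Q = 20, 101
x1, x2 = rng.normal(size=J), rng.normal(size=J)
X = np.column_stack([np.ones(J), x1, x2])
Y = rng.normal(size=(J, Q)) + np.outer(x1, np.linspace(0, 3, Q))  # x1 effect grows along the field
t, df = glm_tstat(Y, X, [0, 1, 0])
print(t.shape, df)  # (101,) 17
```

Repeating the call with `[0, 0, 1]` gives a separate t field for the second predictor, so each independent variable's contribution can be inspected node by node.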

aaronsfox commented 1 year ago

@0todd0000 - thanks for clearing these details up. One additional question I have concerns the glm test seemingly not having a non-parametric equivalent. The 2D data example highlights the need to use non-parametric tests, so is the parametric glm test therefore not applicable in that same context?

0todd0000 commented 1 year ago

Apologies for the delay! I missed this in my inbox. A non-parametric version of spm1d.stats.glm is not available simply because it is difficult to automate permutation. Observation labels for simpler designs like t tests are easy to permute, but appropriate permutation procedures become more difficult to implement as design complexity grows. Since spm1d.stats.glm supports arbitrarily complex designs, it is difficult to create a non-parametric version.
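To illustrate why simple designs are easy to permute, here is a minimal sketch of a max-|t| permutation test for the intercept-plus-one-regressor case, assuming observations are exchangeable under the null hypothesis so that shuffling the regressor across observations is valid. It is a stand-alone illustration, not spm1d's non-parametric code, and it does not carry over automatically to designs with nuisance covariates; the function names are hypothetical.

```python
import numpy as np

def tstat_simple_regression(Y, x):
    """Hypothetical helper: slope t statistic at each node for design X = [1, x]."""
    X = np.column_stack([np.ones(len(x)), x])
    beta = np.linalg.pinv(X) @ Y
    resid = Y - X @ beta
    df = len(x) - 2
    sigma2 = (resid ** 2).sum(axis=0) / df
    c = np.array([0.0, 1.0])
    cXXc = c @ np.linalg.inv(X.T @ X) @ c
    return (c @ beta) / np.sqrt(sigma2 * cXXc)

def permutation_max_t(Y, x, n_perm=1000, seed=0):
    """Hypothetical helper: p value for max |t| over nodes, permuting x across observations."""
    rng = np.random.default_rng(seed)
    t_obs = tstat_simple_regression(Y, x)
    max_obs = np.abs(t_obs).max()
    null = np.empty(n_perm)
    for i in range(n_perm):
        # max over nodes controls the family-wise error across the field
        null[i] = np.abs(tstat_simple_regression(Y, rng.permutation(x))).max()
    # count the observed labelling as one of the permutations
    p = (1 + np.sum(null >= max_obs)) / (1 + n_perm)
    return t_obs, p

# Usage: strong synthetic effect of x at every node
rng = np.random.default_rng(3)
J, Q = 15, 50
x = rng.normal(size=J)
Y = rng.normal(size=(J, Q)) + 2.0 * np.outer(x, np.ones(Q))
t_obs, p = permutation_max_t(Y, x, n_perm=500)
```

Once nuisance columns enter X, shuffling a single column no longer preserves their effects, which is exactly where a general-purpose permutation scheme for arbitrary designs becomes hard.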

aaronsfox commented 1 year ago

Thanks for the follow-up @0todd0000. I had assumed as much after digging through the source code and not finding anything. We had to revert to a linear regression approach for these data.

I'm happy for you to close this issue if you'd like.

0todd0000 / spm1d

Multiple regression with GLM #241