0todd0000 / spm1dmatlab

One-Dimensional Statistical Parametric Mapping in Matlab.
GNU General Public License v3.0

General linear model #20

Closed by SinaDavid 8 years ago

SinaDavid commented 8 years ago

Hey all, I want to use the general linear model function with multiple independent variables. When I checked the example attached to glm.m, there is only one independent variable. I modified x to have 6 variables and tried to change the code, but got errors.

From the example I do not understand what nFactors is, as there are only 3 walking speeds... It would be really helpful if you could give me a short explanation of these code lines:

nFactors = 4;
X = zeros(nCurves, nFactors);
X(:,1) = x;
X(:,2) = 1;
X(:,3) = linspace(0, 1, nCurves);
X(:,4) = sin( linspace(0, pi, nCurves) );

Do I have to create a three-dimensional matrix, with one X matrix for each independent variable?

Kind regards,

Sina

0todd0000 commented 8 years ago

Hi Sina,

Thanks for your question. Roughly speaking, nFactors is the number of experimental factors. More precisely, it is the number of continuous experimental factors plus the total number of levels of categorical factors. This might be easier to understand by example:

  1. Two-sample t test. Imagine five observations for each of two groups. The independent variable is GROUP and there are two categorical levels of GROUP: Group1 and Group2. The design matrix (originally shown as an image) has two columns, one for each categorical level.
  2. Simple linear regression. Imagine that five values of the independent variable (e.g. body mass) are: 70, 65, 80, 79, and 73, respectively. The design matrix (originally shown as an image) has two columns, one for the continuous independent variable (body mass) and one for the intercept.
  3. Cited example. There are four columns in the design matrix. The first two represent simple linear regression just like example 2 above. The third and fourth columns represent continuous nuisance factors: a linear one and a sinusoidal one.
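
The three design matrices described in the list above can be sketched in plain MATLAB as follows. This is only an illustration: the names X1, X2, X3 and mass are hypothetical, and example 3 reuses the body-mass values as a stand-in for the regressor x, which is an assumption rather than the data used in the cited example.

```matlab
% Example 1: two-sample t test, five observations per group.
% One indicator column per level of GROUP (rows 1-5: Group1, rows 6-10: Group2).
X1 = [ones(5,1) zeros(5,1); zeros(5,1) ones(5,1)];

% Example 2: simple linear regression on body mass.
mass = [70 65 80 79 73]';
X2   = [mass ones(5,1)];                      % [continuous factor, intercept]

% Example 3: regression plus two continuous nuisance columns,
% following the layout of the code quoted in the question.
x        = mass;                              % hypothetical regressor values
nCurves  = numel(x);
nFactors = 4;
X3       = zeros(nCurves, nFactors);
X3(:,1)  = x;                                 % factor of interest
X3(:,2)  = 1;                                 % intercept
X3(:,3)  = linspace(0, 1, nCurves);           % linear nuisance factor
X3(:,4)  = sin( linspace(0, pi, nCurves) );   % sinusoidal nuisance factor

disp(X1); disp(X2); disp(X3);
```

In each case the design matrix has one row per observation (curve) and one column per model term, so no three-dimensional array is involved.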

Regarding the error you mention, if the discussion above doesn't solve the problem, then please send some more details about the error. The following would be good:

Cheers,

Todd

SinaDavid commented 8 years ago

Hey Todd! Thanks for this quick answer. With your advice I made it work. Maybe you can help me with the interpretation. My dataset is like this: two independent parameters for 17 subjects, one categorical (0 and 1, i.e. 2 levels) and one continuous. I'm not sure how to answer the question about having a nuisance variable, but it sounds important... My dependent variable is a set of forces normalized to 101 data points for 17 subjects.

I changed the code to the following (please correct me if it is nonsense):

nCurves = numel(x(:,1));
nFactors = 5;
X = zeros(nCurves, nFactors);
X(:,1) = x(:,1); % categorical variable
X(:,2) = x(:,2); % continuous variable
X(:,3) = 1; % This just tells the model that there is a linear correlation, right?
X(:,4) = linspace(0, 1, nCurves);
X(:,5) = sin( linspace(0, pi, nCurves) );
% specify contrast vector:
c = [1 1 0 0 0]'; % taking into account the first and second variable


I attached the results figure. If I'm right, I will have to do some sort of post hoc testing. Is the way I modified the code correct?

glm.pdf

Thank you!

0todd0000 commented 8 years ago

Hi Sina,

A nuisance factor is a factor that might affect your dependent variable(s) but which is not of explicit empirical interest. A common example is linear drift: electronic sensor measurements can sometimes drift over time. These factors can be included in the model, but effects associated with them are not tested directly.

Without knowing the content of x it is difficult to know whether these two lines correctly implement categorical and continuous factors:

X(:,1) = x(:,1); % categorical variable
X(:,2) = x(:,2); % continuous variable

Regarding the line:

X(:,3) = 1; % This just tells the model that there is a linear correlation, right?

That is actually just an intercept. It can be included in the model but may be redundant if you use one or more categorical variables.
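
The redundancy mentioned above can be checked numerically. The sketch below, with hypothetical group sizes and variable names, shows that an intercept column is the sum of the group indicator columns, so adding it increases the column count but not the rank of the design matrix. How spm1d.stats.glm behaves with a rank-deficient design is not stated in this thread.

```matlab
nA = 5;  nB = 5;                          % hypothetical group sizes
n  = nA + nB;

% One indicator column per group, then the same design plus an intercept.
G  = [ones(nA,1) zeros(nA,1); zeros(nB,1) ones(nB,1)];
Xr = [G ones(n,1)];
fprintf('indicators + intercept: %d columns, rank %d\n', size(Xr,2), rank(Xr));

% Alternative coding: a single 0/1 column plus an intercept is full rank.
Xd = [G(:,2) ones(n,1)];
fprintf('0/1 column + intercept: %d columns, rank %d\n', size(Xd,2), rank(Xd));
```

With the 0/1 coding, the coefficient of the first column is the difference between the two group means, and the intercept is the mean of the group coded 0.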

As above, without knowing the content of x it is difficult to know whether or not the contrast vector is correct:

c = [1 1 0 0 0]'; % taking into account the first and second variable

If the second column of X is a continuous variable then you probably want the following contrast vector:

c = [0 1 0 0 0]';
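
To see what a contrast vector selects, here is a minimal sketch using ordinary least squares on synthetic scalar data. Everything in it is an assumption for illustration: the variable names, the 0/1 group coding, the effect sizes, and the reduced three-column design (the nuisance columns are left out for brevity). It does not call spm1d.

```matlab
rng(0);                                   % reproducible synthetic data
n     = 17;                               % number of subjects, as in the question
group = double((1:n)' > 8);               % hypothetical 0/1 categorical variable
speed = linspace(1.0, 2.0, n)';           % hypothetical continuous variable
X     = [group speed ones(n,1)];          % [categorical, continuous, intercept]

% Synthetic response with known effects: group 0.5, slope 2, offset 10.
y = 0.5*group + 2*speed + 10 + 0.01*randn(n,1);

b = X \ y;                                % least-squares parameter estimates

c_slope = [0 1 0]';                       % isolates the continuous factor
c_sum   = [1 1 0]';                       % adds the group and slope effects
fprintf('c_slope effect: %.3f\n', c_slope' * b);
fprintf('c_sum effect:   %.3f\n', c_sum' * b);
```

A contrast with ones in two columns tests the sum of two different effects, which is rarely the intended hypothesis; testing each factor with its own contrast keeps the effects separate.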

Please look at the code for all t related tests including:

All of these tests use spm1d.stats.glm and may clarify how to set up the design matrix and contrast vector. You may also be interested in reading some reference materials regarding linear modeling. Two great starting places are Friston et al. (1995) and the SPM document repository:

Note especially: if spm1d.stats.glm does not generate errors, it does not mean that the results are correct. Please use this function with caution.

Cheers,

Todd

SinaDavid commented 8 years ago

Hey Todd, I compared my results with a simple linear regression for each of my parameters, which made me quite confident that the glm result may make sense. But I will go through the papers you suggested and come back if there are more questions!

Thank you for your detailed answers!

Sina

tnsavage commented 4 years ago

Hi Todd,

I have been trying to run a glm comparing gait data between 2 independent groups (Case/control) using gait speed as a covariate. I have tried to follow your example code and your post to Sina here.

After running this and checking the results, it struck me that I wasn't putting group into the model. I've tried a few things to correct this, but what seemed logical to me either didn't change my results, or it did but I'm not confident that I was doing the right thing, so I wanted to check.

This is my definition of the design matrix (matlab):

X             = zeros(nCurves, nFactors);
X(:,1)        = x; % regressor (gait speed)
X(1:nA,2)     = 1; % group 1
X(nA+1:end,3) = 1; % group 2
X(:,4)        = linspace(0, 1, nCurves); % linear noise
X(:,5)        = sin( linspace(0, pi, nCurves) );

Your notes to Sina and on your website say to use stats.glm with caution. Is what I have above for the design matrix correct, and can you expand on what you mean by 'caution'? Is that another way of saying you need to know what you're doing, or are there checks that we should run on the results beyond what we might do for an ANOVA on discrete data?

Thanks

Trevor

0todd0000 commented 4 years ago

The design matrix looks fine, but may need to be tweaked a bit to more closely represent the experiment. Consider the following.

So I'd suggest:

X             = zeros(nCurves, nFactors); % nFactors must be 3 for this design
X(:,1)        = x; % regressor (gait speed)
X(1:nA,2)     = 1; % group 1
X(nA+1:end,3) = 1; % group 2

c             = [0 -1 1];   % difference between group means

Yes, your interpretation of "caution" is correct. The glm function is highly flexible, so users must be confident that their model(s) and contrast(s) are correct when using this function. spm1d doesn't offer any tools to check the implementation, so you might want to compare spm1d's glm results to those from a third party package (e.g. R, SPSS) to ensure that the model has been implemented correctly.
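
As one possible cross-check that needs no extra software, the conventional GLM t statistic can be computed by hand at every node and compared with the test statistic returned by spm1d.stats.glm. The sketch below uses synthetic data and hypothetical sizes, and it assumes that spm1d computes the standard least-squares t statistic at each node, which this thread does not confirm.

```matlab
rng(1);                                   % reproducible synthetic data
nA = 8;  nB = 9;                          % hypothetical group sizes
nCurves = nA + nB;
nNodes  = 101;                            % nodes per curve
x = 1.0 + 0.5*rand(nCurves,1);            % hypothetical gait speeds
Y = randn(nCurves, nNodes);               % stand-in for the measured curves
Y(nA+1:end,:) = Y(nA+1:end,:) + 1;        % add a known group-2 offset

% Design matrix and contrast from the suggestion above.
X             = zeros(nCurves, 3);
X(:,1)        = x;                        % regressor (gait speed)
X(1:nA,2)     = 1;                        % group 1
X(nA+1:end,3) = 1;                        % group 2
c             = [0 -1 1]';                % difference between group means

b      = X \ Y;                           % (3 x nNodes) parameter estimates
E      = Y - X*b;                         % residuals
df     = nCurves - rank(X);               % error degrees of freedom
sigma2 = sum(E.^2, 1) / df;               % residual variance at each node
t      = (c'*b) ./ sqrt( sigma2 * (c' * ((X'*X) \ c)) );   % (1 x nNodes)

fprintf('df = %d, mean contrast = %.3f\n', df, mean(c'*b));
```

If the hand-computed curve and the spm1d curve disagree, the design matrix or contrast passed to spm1d is the first thing to inspect.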