0todd0000 / spm1d

One-Dimensional Statistical Parametric Mapping in Python
GNU General Public License v3.0

Multiple regression #160

Closed Little-Foot-Shapes closed 3 years ago

Little-Foot-Shapes commented 3 years ago

Hello Todd,

I was wondering if there is a way to perform multiple regression with the non-parametric procedures in spm1d (flattened 3D data versus multiple 1D predictors such as age, foot length, or BMI)? Thanks

Mat

0todd0000 commented 3 years ago

Hi Mat,

Multiple regression is possible in general with SPM, but no direct implementation exists in spm1d.

However, if the dependent variable is univariate, then I believe this can be done rather easily with spm1d.stats.glm, which also supports ANCOVA and lets you include as many regressors as you like.
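For example, a rough sketch (placeholder data; the regressor names and sizes below are hypothetical, not from your dataset):

```python
# Minimal sketch: multiple regression on a univariate 1D dependent variable
# via spm1d.stats.glm. All data below are random placeholders.
import numpy as np
import spm1d

J, Q = 148, 101                        # observations, nodes in the 1D domain
Y    = np.random.randn(J, Q)           # (J x Q) dependent variable
age  = np.random.randn(J)              # example scalar regressors
bmi  = np.random.randn(J)

# Design matrix: one column per regressor, plus a column of ones (intercept)
X = np.column_stack([age, bmi, np.ones(J)])

c  = [1, 0, 0]                         # contrast: test the "age" effect
t  = spm1d.stats.glm(Y, X, c)          # t statistic field
ti = t.inference(alpha=0.05, two_tailed=True)
```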

Todd

Little-Foot-Shapes commented 3 years ago

Hello Todd,

Thank you for your reply. I have done this using four 2D measures of the foot (such as mid-foot height) as independent variables (148 x 4) and the curvedness of the whole foot as the dependent variable (148 x 22498). I then projected the results back onto the 3D foot surface.

There are quite a lot of significant clusters; however, I am not sure how to interpret them. Do all 4 independent variables predict the changes in the dependent variable in the significant areas? Any suggestions?

If I did a traditional multiple regression I would have separate stats for each independent variable, but here I only get one?

Also, is there a way to check for collinearity, as is done in traditional multiple regression?

Finally, does the data have to satisfy any assumptions?

Thank you.

Best Regards

Mat

Little-Foot-Shapes commented 3 years ago

Hello again,

So I had a look at this issue: https://github.com/0todd0000/spm1dmatlab/issues/39. My aim is to find the independent variables that best predict the dependent variable. I have a few questions (quoted in the reply below).

Thank you.

Best Regards

Mat

0todd0000 commented 3 years ago

How do I remove the nuisance factors?

If you mean "remove effects of nuisance factors", then simply assign zeros to the relevant independent variables (IVs) in your contrast vector.

What does the intercept do and is it always 1?

In simple regression: y = mx + b, the intercept is "b". It must be modeled as 1 for all observations because it is a constant value. It effectively removes the grand mean.
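To make both points concrete, a sketch (hypothetical regressor names, placeholder data) showing the all-ones intercept column and a contrast that zeros out the nuisance IVs:

```python
# Sketch: one design matrix, with zeros in the contrast for nuisance IVs.
import numpy as np

J = 148
foot_length = np.random.randn(J)   # placeholder regressors
age         = np.random.randn(J)
bmi         = np.random.randn(J)

# Design matrix columns: [foot_length, age, bmi, intercept];
# the intercept column is 1 for all observations (a constant).
X = np.column_stack([foot_length, age, bmi, np.ones(J)])

c = [1, 0, 0, 0]   # tests foot_length; the zeros mean that age, BMI and the
                   # intercept are modeled (their effects are removed) but
                   # not tested, i.e. they act as nuisance regressors
# Note: if the design-matrix columns are reordered, the contrast entries
# must be reordered to match.
```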

So when I interpret the results I can say how much the empirically important factor (marked by 1 in the contrast) correlates with the dependent variable after accounting for all the other variables included in the model (marked by 0 in the contrast)?

Yes. However, interpretation problems arise when two or more IVs are correlated. If two arbitrary IVs x1 and x2 have exactly the same values, the problem cannot be solved, because unique coefficients ("m" in the simple regression model above) cannot be assigned to each IV. When x1 and x2 are strongly correlated, a similar problem exists. Only when x1 and x2 are uncorrelated can their effects be clearly separated.

Is there a limit to the number of variables in the model?

Mathematically, no. In practice, however, the number must be limited because multiple IVs are generally correlated. The more correlated IVs are included in the model, the poorer your ability to attribute specific effects to specific IVs.
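A collinearity check like the one used in traditional multiple regression can be run on the IVs before fitting. A minimal sketch using numpy only (not part of spm1d; placeholder data):

```python
# Sketch: collinearity diagnostics (pairwise correlations and variance
# inflation factors) for a (J x p) matrix of IVs, excluding the intercept.
import numpy as np

J = 148
X = np.random.randn(J, 4)          # placeholder IVs

R = np.corrcoef(X, rowvar=False)   # IV-by-IV Pearson correlation matrix

def vif(X):
    """VIF for each IV: 1 / (1 - R^2), where R^2 comes from regressing
    that IV on all of the other IVs (plus an intercept)."""
    vifs = []
    for j in range(X.shape[1]):
        y = X[:, j]
        A = np.column_stack([np.delete(X, j, axis=1), np.ones(len(y))])
        b, *_ = np.linalg.lstsq(A, y, rcond=None)
        r2 = 1 - ((y - A @ b) ** 2).sum() / ((y - y.mean()) ** 2).sum()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Common rules of thumb: |r| > ~0.8 or VIF > ~5-10 suggests problematic collinearity
print(R)
print(vif(X))
```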

If I suspect a negative relationship between the independent and dependent variable should I put -1 in the contrast?

Yes, for a one-tailed test. For a two-tailed test the sign does not matter. Exception: if your contrast vector has non-zero values for multiple IVs then the sign(s) may indeed matter.

And in general, does this statistic give you a directional result (negative/positive correlation)?

Yes, t contrasts (i.e., vector contrasts) produce t values, and these can be positive or negative, indicating the direction of the effect.

Little-Foot-Shapes commented 3 years ago

HI Todd,

Thank you for your reply.

I have run some multiple regressions using the glm method. In one I have included 4 IVs and the intercept and ran it 4 times, assigning 1 to a different IV in the contrast each time. I got interesting results, but the t-threshold (zstar) was always the same? Is it supposed to be like that? The clusters were in different locations on the foot, and the highest correlation (Pearson) between any two IVs was 0.546 (p < 0.001).

In another issue regarding glm (although using Python: Alternative(s) to multiple linear regression #81), the IVs in the contrast were after the intercept. Does that matter?

Could I have IVs before the intercept and e.g. Age and BMI as a nuisance factor after the intercept?

Are there any assumptions for this test?

Also, do you have any references that used a similar approach, maybe in kinematics?

Thank you.

Mat

0todd0000 commented 3 years ago

Hi Mat,

the t-threshold (zstar) was always the same?

Yes. If the model (i.e., design matrix) is the same, then both the degrees of freedom and the residuals are also the same. Thus the critical threshold is also the same.

I have included 4 IVs and the intercept and ran it 4 times, assigning 1 to a different IV in the contrast each time.

This is OK, but it is post hoc testing. The GLM function in spm1d is meant to test a single t contrast. In other words, each contrast vector represents a single null hypothesis, and testing multiple hypotheses is an ANOVA-type analysis. The robust way of dealing with multiple hypotheses is to use a contrast matrix, which embodies all hypotheses to be tested, in which case the resulting test statistic relates to the family of tests, and not to a specific hypothesis, very similar to ANOVA. Unfortunately spm1d.stats.glm does not yet support contrast matrices. For more details please see this multiple comparisons document; it uses the multcomp package for R, but the overall idea is nearly identical to this discussion.

...the IVs in the contrast were after the intercept. Does that matter?

No, it should not matter where the intercept is positioned. Just note that, if the columns of the design matrix are shuffled, then the contrast vector needs to be adjusted to reflect the new column ordering.

Could I have IVs before the intercept and e.g. Age and BMI as a nuisance factor after the intercept?

Yes, this should be fine. From a model fitting perspective, the intercept and nuisance factors are also IVs, and the IV order does not matter. What makes these "nuisance factors" is simply their exclusion from contrast vectors.

Are there any assumptions for this test?

Yes. The usual parametric GLM assumptions apply; in particular, the residuals should be normally distributed (see the normality-test discussion below).

Also, do you have any references that used a similar approach, maybe in kinematics?

The only paper I know of that uses GLM analysis in spm1d is Knechtle et al. (2020). Perhaps a better reference is Karl Friston's seminal 1994 paper, which describes the GLM approach that most neuroimaging papers follow.

Friston KJ, Holmes AP, Worsley KJ, Poline JP, Frith CD, Frackowiak RS (1994). Statistical parametric maps in functional imaging: a general linear approach. Human Brain Mapping 2(4):189-210.

Knechtle D, Schmid S, Suter M, Riner F, Moschini G, Senteler M, Schweinhardt P, Meier ML (2020). Fear avoidance beliefs limit lumbar spine flexion during object lifting in pain-free adults. medRxiv.

Little-Foot-Shapes commented 3 years ago

Hi Todd,

Thank you for your reply.

" The robust way of dealing with multiple hypotheses is to use a contrast matrix, which embodies all hypotheses to be tested, in which case the resulting test statistic relates to the family of tests, and not to a specific hypothesis, very similar to ANOVA. " This means that ideally a contrast vector with multiple 1s (one for each IV) should be used, but this is not supported yet? Based on your reply , do you have any suggestions to deal with this issue?

If I read the fear paper correctly, they did not do the ANOVA-type analysis, but went straight to the post hocs? If I go with their approach, how do I correct for multiple tests? Simply using Bonferroni and setting the alpha in the GLM accordingly? 4 tests, 0.05/4?

Regarding the assumptions, are there any tests for these in MATLAB (for the dependent variable)? In the fear paper they used spm1d.stats.normality.k2.ttest, but that's in Python? Is there an equivalent of this in MATLAB?

Thank you.

Mat

0todd0000 commented 3 years ago

This means that ideally a contrast vector with multiple 1s (one for each IV) should be used, but this is not supported yet?

Close. A contrast matrix has multiple rows, one per hypothesis. Each row (hypothesis) is a contrast. spm1d.stats.glm does not support multiple-row contrasts like these.
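For illustration only (spm1d.stats.glm accepts just a single row, i.e. a contrast vector), a contrast matrix for the four-IV design discussed above might look like this:

```python
# Illustration: a contrast matrix for design columns [IV1, IV2, IV3, IV4, intercept].
# Each row is one hypothesis (one contrast); not currently supported by spm1d.
import numpy as np

C = np.array([[1, 0, 0, 0, 0],    # H1: no IV1 effect
              [0, 1, 0, 0, 0],    # H2: no IV2 effect
              [0, 0, 1, 0, 0],    # H3: no IV3 effect
              [0, 0, 0, 1, 0]])   # H4: no IV4 effect
```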

Based on your reply, do you have any suggestions for dealing with this issue?

Use at least a correction for multiple comparisons across the multiple hypotheses. If you are interested in only simple (multiple) regression, then a Bonferroni correction would likely be sufficient. More generally, note that contrast vectors represent only main effects, and that for arbitrary design matrices interaction effects are also probably of interest. So this simple Bonferroni approach across multiple contrast vectors should not be applied beyond the case of multiple regression.

If I read the fear paper correctly, they did not do the ANOVA-type analysis, but went straight to the post hocs? If I go with their approach, how do I correct for multiple tests? Simply using Bonferroni and setting the alpha in the GLM accordingly? 4 tests, 0.05/4?

Yes, I think this would be acceptable to reviewers.
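A sketch of that Bonferroni approach (placeholder data; Y and X here stand in for your curvedness data and your four-IV-plus-intercept design matrix):

```python
# Sketch: four post hoc contrasts with a Bonferroni-corrected alpha.
import numpy as np
import spm1d

J, Q  = 148, 101
Y     = np.random.randn(J, Q)                                  # placeholder DV
X     = np.column_stack([np.random.randn(J, 4), np.ones(J)])   # 4 IVs + intercept

alpha = 0.05 / 4                   # Bonferroni correction across 4 tests
for c in np.eye(4, 5):             # contrasts [1,0,0,0,0], [0,1,0,0,0], ...
    t  = spm1d.stats.glm(Y, X, c)
    ti = t.inference(alpha=alpha, two_tailed=True)
    print(ti.h0reject)             # True if any suprathreshold cluster exists
```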

Regarding the assumptions, are there any tests for these in MATLAB (for the dependent variable)? In the fear paper they used spm1d.stats.normality.k2.ttest, but that's in Python? Is there an equivalent of this in MATLAB?

Yes, spm1d.stats.normality.ttest. Please find all normality tests in ./spm1dmatlab/+spm1d/+stats/+normality/
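On the Python side, a minimal sketch of the test mentioned in the fear paper (spm1d.stats.normality.k2.ttest; the call signature shown here is an assumption based on the package's one-sample interface, with placeholder data):

```python
# Sketch: D'Agostino-Pearson K2 normality test (Python spm1d), one-sample case.
import numpy as np
import spm1d

Y    = np.random.randn(148, 101)           # placeholder (J x Q) data
spm  = spm1d.stats.normality.k2.ttest(Y)   # K2 statistic field
spmi = spm.inference(alpha=0.05)           # significant clusters suggest non-normality
```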

Little-Foot-Shapes commented 3 years ago

Hi Todd,

Thank you for your reply. It looks like my data does not comply with the assumptions, and I also have the same problem that I described in https://github.com/0todd0000/spm1d/issues/153 regarding the normality test for regression. I have cleaned the data so there are no outliers, but I still get the same error.

So I am going to have to revert to simple non-parametric regression, even though that way I cannot control for other IVs or nuisance factors?

Thanks

Mat

0todd0000 commented 3 years ago

Sorry for the delay! Yes, simple regression does not control for nuisance factors. It is possible to implement nuisance factors in nonparametric analysis, but it is just not supported in the current version of spm1d.

Little-Foot-Shapes commented 3 years ago

Thank you!