0todd0000 / spm1d

One-Dimensional Statistical Parametric Mapping in Python
GNU General Public License v3.0

Reporting regression coefficients - GLM #100

Closed JohnDOConnor closed 5 years ago

JohnDOConnor commented 5 years ago

Hi Todd,

Thank you very much for making code available for SPM. Enjoying using it!

I am exploring 1-dimensional continuous trajectories of blood pressure after standing from lying down rather than a biomechanical measure. The hypothesis is that standing speed will influence the BP measures. As we're a clinical study, I have to control for a lot of covariates and am using the GLM model with continuous BP as the dependent variable and 12 covariates (some of which are categorical and some continuous) including standing speed (which is the contrast).

I am getting sensible results; however, I would like to make them a bit more interpretable to the clinical audience by reporting regression coefficients for standing speed (I think I'm correct in saying these are in the spmi.beta variable in MATLAB?). Do you think it makes sense to report the peak beta value, or the beta value averaged over the range that is significant?

Thanks very much John

0todd0000 commented 5 years ago

Sorry for the delay!

The beta variables are the fitted model parameters, and not correlation coefficients. For categorical variables the betas are means, and for continuous variables they are regression slopes. spm1d currently computes correlation coefficients only for simple linear regression (spm1d.stats.regress), and these coefficients can be found in: spm.r.

The usual way to transform between the correlation coefficient (r) and t statistic is:

t = r * sqrt( (n - 2) / (1 - r**2) )

where n is the sample size. I believe that this transformation is also applicable to regression involving covariates, but I'm not sure. If just presenting the results to an audience, this transformation should be fine, but I'd recommend against publishing the results unless you can confirm that this transformation adheres to known results.
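As a minimal sketch of the transformation above, the following plain-Python helpers convert between r and t for simple linear regression. The function names `r_to_t` and `t_to_r` are hypothetical (they are not part of spm1d), and the inverse is obtained by solving the same formula for r. For a 1D continuum the conversion would be applied at each node separately.

```python
import math


def r_to_t(r, n):
    """Convert a correlation coefficient r (sample size n) to a t statistic.

    Implements t = r * sqrt((n - 2) / (1 - r**2)), i.e. the simple linear
    regression case only. Hypothetical helper, not an spm1d function.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 2:
        raise ValueError("n must be greater than 2")
    return r * math.sqrt((n - 2) / (1.0 - r ** 2))


def t_to_r(t, n):
    """Invert the formula above: r = t / sqrt(t**2 + n - 2)."""
    if n <= 2:
        raise ValueError("n must be greater than 2")
    return t / math.sqrt(t ** 2 + (n - 2))


# Made-up values for illustration only
t_value = r_to_t(0.5, 12)
r_back = t_to_r(t_value, 12)
```

Whether the same conversion holds once covariates are included is left open in the comment above, so this sketch should not be read as covering that case.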

I wonder if this t statistic itself would be suitable? I personally find t easier to understand than r, partially because it spans a more intuitive range (-infinity to +infinity and not -1 to +1), but also because it more closely represents the ratio between effect and variance. I wonder if an example of simple linear regression could help the audience understand? Your t value results will have the same meaning as the t value results from simple linear regression.

Please note: spm1d's GLM function is currently used mainly as the common engine for other linear models, and has not been tested thoroughly for arbitrary use. I've only tested this function for continuous covariates, and it should be OK for an arbitrary number of continuous covariates, but I'm not sure if it is also suitable for categorical covariates. One concern is that categorical variables need to be implemented in a binary sense: one column per within-variable category, where each categorical column contains only zeros and ones. Including a categorical variable as a single column will cause it to be interpreted as continuous. So I'd recommend comparing spm1d's GLM results to known results from the internet involving categorical covariates.
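The binary coding described above could be sketched as follows in plain Python. The helper name `one_hot_columns` and the group labels are made up for illustration; this only builds the zero/one columns and does not call spm1d, so how the columns are combined with the rest of the design matrix is left as an assumption for the reader to check.

```python
def one_hot_columns(labels):
    """Return (categories, rows) with one 0/1 column per distinct category.

    Hypothetical helper, not part of spm1d. Each row corresponds to one
    observation and contains exactly one 1, in the column of its category.
    """
    categories = sorted(set(labels))
    rows = [[1 if label == cat else 0 for cat in categories]
            for label in labels]
    return categories, rows


# Made-up categorical covariate for six observations
group = ["a", "b", "c", "a", "c", "b"]
categories, X_cat = one_hot_columns(group)

# By contrast, a single column of integer codes (e.g. 0, 1, 2) would be
# treated as a continuous covariate, which is the pitfall described above.
single_column = [categories.index(label) for label in group]
```

Here `X_cat` holds three columns (one per category), whereas `single_column` holds the kind of single-column coding that would be interpreted as continuous.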

Todd

JohnDOConnor commented 5 years ago

Thank you very much for the response, Todd.

Will have to think about this! I may be able to introduce the t statistic before presenting specific results.

Thanks again John