alfnie / conn

CONN toolbox, development version (conn-toolbox.org)
http://web.conn-toolbox.org

Do GLM routines account for temporal autocorrelation? #2

Open nicholst opened 3 years ago

nicholst commented 3 years ago

A colleague asked me about the linear modelling tools in CONN, and I was trying to figure out whether CONN's GLM tools account for temporal autocorrelation in fMRI time series. I studied https://github.com/alfnie/conn/blob/master/conn_glm.m & https://github.com/alfnie/conn/blob/master/conn_glmunivariate.m but as far as I can tell these GLMs are always estimated with OLS (and not GLS with whitening for time series autocorrelation).

If this is correct, isn't there a problem with the stderrs, T-values, and P-values when these models are fit to first-level fMRI time series data?
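
To be concrete about the distinction I have in mind, here is a minimal sketch (not CONN code, just the textbook contrast between OLS and an AR(1)-prewhitened GLS, with made-up variable names):

```matlab
% Toy contrast between OLS and AR(1)-prewhitened GLS for a single timeseries
% (illustrative only; none of these names are CONN internals)
n = 200;                                  % number of time points
X = [ones(n,1) randn(n,1)];               % design: intercept + one regressor
e = filter(1, [1 -0.4], randn(n,1));      % AR(1) errors, rho = 0.4
y = X*[0; 0.5] + e;                       % simulated response

% OLS: valid betas, but stderrs assume white noise
b_ols  = X \ y;
res    = y - X*b_ols;
sig2   = sum(res.^2) / (n - size(X,2));
se_ols = sqrt(sig2 * diag(inv(X'*X)));

% GLS via prewhitening: estimate rho from residuals, filter X and y, refit
rho    = res(1:end-1) \ res(2:end);       % crude AR(1) coefficient estimate
Wy     = filter([1 -rho], 1, y);
WX     = filter([1 -rho], 1, X);
b_gls  = WX \ Wy;
resw   = Wy - WX*b_gls;
sig2w  = sum(resw.^2) / (n - size(WX,2));
se_gls = sqrt(sig2w * diag(inv(WX'*WX))); % stderrs after whitening
```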

alfnie commented 3 years ago

Your interpretation is exactly right: both the 1st-level and 2nd-level GLMs in CONN use OLS for estimation and do not explicitly account for temporal autocorrelation. That said, CONN stats (supported by the functions conn_glmunivariate and conn_glm) are only used/offered in the context of group-level analyses, where they should be largely unaffected by the presence of temporal autocorrelation in fMRI timeseries.

Briefly, CONN's 1st-level models are used (separately for each subject) to estimate, using OLS, measures of functional connectivity strength (e.g. correlation coefficients between the timeseries at two areas). Those measures are then Fisher-transformed (atanh transformation only, not using a dof scaling factor) and entered into a 2nd-level GLM analysis, which (also using OLS for estimation) computes group-level stderr/T/p values using the methods described in https://www.conn-toolbox.org/fmri-methods/general-linear-model (for voxel- or connection-level statistics) and https://www.conn-toolbox.org/fmri-methods/cluster-level-inferences (for cluster- or network-level statistics). I would expect the 2nd-level model sample variance to incorporate both between- and within-subject (e.g. measurement error, affected by temporal autocorrelation) sources of variability, so that inferences would remain valid (even if arguably conservative) in the presence of arbitrary levels of temporal autocorrelation in the original timeseries. Does this sound correct? I would love to hear your thoughts about any/all of this!
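
In code form, the flow described above looks roughly like this (a schematic sketch of the logic, not the actual conn_glm/conn_glmunivariate implementation; the timeseries and designs are placeholders, and tcdf is from the Statistics Toolbox):

```matlab
% Schematic two-level flow: 1st-level correlations -> atanh -> 2nd-level OLS GLM
nsub  = 20;  ntime = 150;
z = zeros(nsub,1);
for s = 1:nsub
    roi1 = randn(ntime,1);               % placeholder ROI timeseries for subject s
    roi2 = randn(ntime,1);
    c    = corrcoef(roi1, roi2);
    z(s) = atanh(c(1,2));                % Fisher transform, no dof scaling factor
end

% 2nd-level GLM: one-sample test across subjects, estimated with OLS
X   = ones(nsub,1);                      % group-level design (intercept only)
b   = X \ z;
res = z - X*b;
dof = nsub - size(X,2);
se  = sqrt((sum(res.^2)/dof) * inv(X'*X));  % between-subject sample variance
T   = b / se;
p   = 2 * tcdf(-abs(T), dof);            % two-sided group-level p-value
```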

nicholst commented 3 years ago

Those measures are then Fisher-transformed (atanh transformation only, not using a dof scaling factor) and entered into a 2nd-level GLM analysis

This is the key detail I was looking for.

The issue I was asked about was taking output from CONN and feeding it into FSL's multi-level modelling. But by not doing a dof scaling you're acknowledging you don't have stderrs, which is fine, but it means my colleague can't use the standard 'FEAT' method that uses the 'cope' and 'varcope' pairs.

Thanks for the quick reply!
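
(For reference, the dof scaling I mean is just the textbook Fisher-z standardization, which is only valid for independent samples, i.e. exactly the assumption that fMRI autocorrelation breaks. A sketch of what a nominal cope/varcope pair would look like under that white-noise assumption, with made-up numbers:)

```matlab
% Textbook Fisher-z standard error, valid only for independent samples
n       = 150;             % number of 1st-level time points
r       = 0.3;             % some estimated correlation
cope    = atanh(r);        % what CONN exports (atanh only)
varcope = 1/(n-3);         % nominal Fisher variance, not exported by CONN
zstat   = cope*sqrt(n-3);  % the dof-scaled statistic mentioned above
```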

nicholst commented 3 years ago

BTW, if you did want to get standard errors for correlations please do check out my "xDF" work with @asoroosh https://pubmed.ncbi.nlm.nih.gov/31158478/ https://github.com/NISOx-BDI/xDF ... while getting accurate effective DF requires estimates of the autocorrelation in each time series separately and the lagged cross-correlations, we've got Matlab code in the repo that does all the work.
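
To give a flavour of the correction, below is a deliberately simplified, Bartlett-style effective sample size (it assumes the true correlation is near zero and uses a hand-rolled autocorrelation helper; it is not the xDF estimator, which handles the general case including the lagged cross-correlations):

```matlab
% Simplified effective-DF idea (Bartlett-type approximation, true r ~= 0);
% xDF in the linked repo is the general, properly corrected version
x = filter(1, [1 -0.5], randn(200,1));   % two autocorrelated series
y = filter(1, [1 -0.5], randn(200,1));
n = numel(x);  maxlag = 20;
acx   = acf(x, maxlag);
acy   = acf(y, maxlag);
n_eff = n / (1 + 2*sum(acx .* acy));     % effective sample size
c     = corrcoef(x, y);
se_z  = 1/sqrt(n_eff - 3);               % corrected stderr of atanh(r)

function a = acf(v, maxlag)
    % sample autocorrelation at lags 1..maxlag
    v = v - mean(v);
    a = zeros(maxlag,1);
    for k = 1:maxlag
        a(k) = sum(v(1+k:end).*v(1:end-k)) / sum(v.^2);
    end
end
```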

alfnie commented 3 years ago

Excellent, and for additional info you may always refer your colleague to https://www.conn-toolbox.org/fmri-methods/connectivity-measures which includes the equations for the different 1st-level connectivity measures computed by CONN. And thanks for the xDF reference and manuscript! That is indeed very interesting, as it opens the possibility of extending CONN to handle single-subject analyses, which has been on my to-do list for a long time.

One quick question: the xDF README seems to suggest that the estimated correlation coefficients themselves are often under-estimated in the presence of temporal autocorrelation (showing an example with simulated data with an expected r=0.40 but measured r=0.19), but I do not seem to find an equivalent claim in the paper, and the claim itself sounds somewhat counterintuitive to me unless one expects frequency-varying levels of functional connectivity strength or some other non-conventional assumption like that. Could you please clarify whether the biases introduced by temporal autocorrelation in BOLD timeseries (not biases introduced by possible different lags, just the presence of a non-white autocorrelation structure itself) are limited to biases in the estimated stderr and effective-DF values associated with a correlation coefficient, or whether we can expect those biases to also show up in the estimated r values as well? Thanks again!

asoroosh commented 3 years ago

Thanks @alfnie! Thanks @nicholst! On page 620 of the manuscript, it is thoroughly discussed that the correlation coefficients are indeed approximately unbiased. You are absolutely right: when two time series are serially correlated, it is the variance and nominal DF of the sample correlation that are biased (i.e. underestimated). The last section of the README file is about the technicalities/difficulties of accurately simulating time series by drawing samples from a covariance and autocovariance structure. I will clarify this on the README page. Thanks again!
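
A quick toy simulation of that point (just an illustrative construction, not code from the xDF repo): if two correlated innovation series are passed through the same AR(1) filter, the true contemporaneous correlation is preserved, and the sample r stays close to it while the spread of atanh(r) grows well beyond the nominal 1/(n-3):

```matlab
% Sample r ~unbiased under serial correlation, but its variance is inflated
rng(1);  nsim = 2000;  n = 200;  rho_true = 0.4;
r = zeros(nsim,1);
for i = 1:nsim
    e1 = randn(n,1);
    e2 = rho_true*e1 + sqrt(1 - rho_true^2)*randn(n,1);  % innovation corr = 0.4
    x  = filter(1, [1 -0.6], e1);    % identical AR(1) filters on both series,
    y  = filter(1, [1 -0.6], e2);    % so the true correlation stays 0.4
    c  = corrcoef(x, y);
    r(i) = c(1,2);
end
fprintf('mean r          : %.3f (target 0.40)\n', mean(r));
fprintf('var of atanh(r) : %.4f vs nominal 1/(n-3) = %.4f\n', var(atanh(r)), 1/(n-3));
```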

nicholst commented 3 years ago

Just to echo @asoroosh, Pearson's sample correlation is fortunately approximately unbiased under serial autocorrelation, and it is just the standard errors / effective DF that are messed up. (Just like the OLS GLM, actually, in that coefficients are unbiased under dependent errors but estimated variances & inferences are biased.)

alfnie commented 3 years ago

Excellent, happy to hear! Also thanks @asoroosh for the clarification. On a quick look it appears those biases may be caused just by the Sims/corrautocorr.m function, which seems to be missing the S3.1 (S9) correction factor (I imagine this may have been on purpose, perhaps to avoid numerical issues with that correction or simply to keep the simulations simple?). In any case the code looks really great, I enjoyed the read, congratulations, and thanks again!

asoroosh commented 3 years ago

Thanks @alfnie. I just updated the README file with more clarifications + more details on S9. I intended to keep the correction out of corrautocorr.m as it is not always required (only when the covariance is high, the autocovariances differ, and the user intends to keep both accurate), yet it is computationally heavy (two more matrix decompositions). I also added some comments on corrautocorr.m usage so that anyone who would like to use the correction can do so easily. Anyway, I hope the new section added to the README makes this clearer. Thank you again for the suggestions and comments.
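
For future readers, the attenuation effect discussed above is easy to reproduce with a toy construction (this is neither corrautocorr.m nor the S9 correction, just the simplest illustration of why a correction factor is needed when the two autocovariance structures differ):

```matlab
% Correlated innovations passed through DIFFERENT autocorrelation structures:
% the contemporaneous correlation is attenuated relative to the innovation corr
rng(2);  n = 5e4;  rho_target = 0.4;
e1 = randn(n,1);
e2 = rho_target*e1 + sqrt(1 - rho_target^2)*randn(n,1);
x  = filter(1, [1 -0.8], e1);        % strong AR(1) on the first series
y  = e2;                             % white noise on the second
c  = corrcoef(x, y);
fprintf('innovation corr = %.2f, measured corr = %.2f\n', rho_target, c(1,2));
% only the lag-0 component of x remains correlated with y, so the measured
% correlation falls well below the 0.4 target
```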