JoramSoch / MACS

MACS – a new SPM toolbox for model assessment, comparison and selection
GNU General Public License v3.0

cvBMS failure #2

Closed: matanmazor closed this issue 4 years ago

matanmazor commented 5 years ago

Thanks for this useful toolbox!

I'm trying to compare the fit of two models to my fMRI data. I have created an MS.mat file, which is a struct with fields:


     swd: 'D:\Documents\projects\inProgress\myProject\analyzed\modelSelection'
    SPMs: {35×2 cell}
    GLMs: {'GLM_quad'  'GLM_quad_cond'}

where SPMs{i,j} is the path to the SPM.mat of participant i under model j.
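For reference, a minimal sketch of how such a structure can be assembled and saved (the per-subject path scheme below is hypothetical):

    % Sketch: assemble the model space structure expected by MACS.
    % The base directory and per-subject folder naming are made up.
    base    = 'D:\Documents\projects\inProgress\myProject\analyzed';
    MS.swd  = fullfile(base, 'modelSelection');
    MS.GLMs = {'GLM_quad', 'GLM_quad_cond'};
    MS.SPMs = cell(35, 2);                      % 35 subjects x 2 models
    for i = 1:35
        for j = 1:2
            MS.SPMs{i,j} = fullfile(base, sprintf('sub-%02d', i), MS.GLMs{j}, 'SPM.mat');
        end
    end
    save(fullfile(MS.swd, 'MS.mat'), 'MS');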

When I run 'perform BMS (automatic)' from the batch editor with this MS file, I get the following error message:

Failed 'MS: perform BMS (automatic)' Reference to non-existent field 'cvLME'. In file "D:\Documents\software\spm12\toolbox\MACS\batch_MS_BMS_group_auto.m" (???), function "run_module" at line 100.

So I tried to run 'calculate cvLME maps (auto)' first, but then I get:

-> Subject 1 (1 out of 35):
   - Model GLM_quad (1 out of 2) ... Failed  'MA: calculate cvLME (automatic)'
Error using vertcat
Dimensions of arrays being concatenated are not consistent.
In file "D:\Documents\software\spm12\toolbox\MACS\MA_cvLME_multi.m" (???), function "MA_cvLME_multi" at line 239.
In file "D:\Documents\software\spm12\toolbox\MACS\batch_MA_cvLME_auto.m" (???), function "run_module" at line 71.

The following modules did not run:
Failed: MA: calculate cvLME (automatic)

Thanks!

Remi-Gau commented 5 years ago

Just ran into the same issue...

I suspect this is because my sessions might not have the same number of regressors (e.g. a subject might not respond to the stimuli in one session, or 'scrubbing' regressors were added...).

Is that a possibility?
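A quick way to check is to count the design-matrix columns per session (a sketch, assuming an estimated first-level SPM.mat):

    % Count the design-matrix columns belonging to each session;
    % unequal counts will break the concatenation in MA_cvLME_multi.m.
    load('SPM.mat');
    for s = 1:numel(SPM.Sess)
        fprintf('Session %d: %d regressors\n', s, numel(SPM.Sess(s).col));
    end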

carolfs commented 4 years ago

I am getting the same error. Did anyone find a solution?

JoramSoch commented 4 years ago

WTF!? I'm getting issues and pull requests for my toolbox, why doesn't GitHub notify me?

Yes, this is indeed a common problem in the application of the toolbox. As @Remi-Gau correctly points out, it occurs when sessions don't have the same number of regressors (e.g. because subjects didn't make errors in one session). However, an equal number of regressors per session is required due to the cross-validated nature of the procedure: calculating the cvLME requires that each subset of the data (= session) provides information about each model parameter (= regressor).

The proposed solution for this is to define empty conditions (with no events), leading to empty regressors (with only zeros). In SPM model estimation, those regressors will then be marked as gray/"parameter not uniquely specified", because a regressor without variation cannot be attributed any variance in the signal. However, session-wise design matrices can be vertically concatenated (no error using "vertcat") and the cvLME can be calculated.
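For example, with a multiple-conditions file this could look as follows (a sketch with made-up condition names and onsets, assuming your SPM version accepts empty onset vectors):

    % Session in which the 'error' condition never occurred: keep the
    % condition (and its position!) but give it no events.
    names     = {'correct', 'error'};
    onsets    = {[12 40 75 110], []};    % 'error' has no onsets here
    durations = {0, 0};
    save('conditions_sess2.mat', 'names', 'onsets', 'durations');
    % Then point the batch to this file:
    % matlabbatch{1}.spm.stats.fmri_spec.sess(2).multi = {'conditions_sess2.mat'};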

If either of you, @matanmazor or @carolfs, tries the solution and it works, feel free to close this issue.

PS, note for statisticians and developers: In the very unlikely case that exactly S-1 sessions (one less than the number of sessions) are missing this regressor, there is exactly one CV fold (the one with the remaining session as test set) in which all training design matrices have a zero regressor. In theory, this would cause the posterior precision matrix Lambda_n estimated from the training set to be non-invertible (see cvBMS paper, eq. 6b). This was recognized very early (actually, before the toolbox was uploaded to GitHub), which is why the prior precision matrix Lambda_0 is not exactly a zero matrix (see cvBMS paper, eq. 15), but a diagonal matrix with a very small diagonal value (see "MA_cvLME_single.m", line 228 & "MA_cvLME_multi.m", line 232).
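For illustration, here is a toy numerical version of this point (dimensions and the diagonal value are arbitrary, and the temporal covariance is omitted):

    % A regressor that is all zeros in the training set makes X'*X singular;
    % a tiny diagonal prior precision keeps the posterior precision invertible.
    n = 100;  p = 3;
    X = [randn(n,2), zeros(n,1)];        % third regressor is empty
    Ln_flat = X'*X + zeros(p);           % flat prior (eq. 15, idealized): singular
    Ln_reg  = X'*X + 1e-10*eye(p);       % small diagonal prior: invertible
    fprintf('rank without regularization: %d of %d\n', rank(Ln_flat), p);
    fprintf('rank with regularization:    %d of %d\n', rank(Ln_reg),  p);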

Remi-Gau commented 4 years ago

Cool. And it's also nice that there is a workaround.

Will try it as soon as I get the chance.

Should this kind of trick be added to the wiki or in the readme of the repo?

carolfs commented 4 years ago

Rather than defining empty conditions, I added extra regressors with only zeros. Now I am calculating the cvLME score for my participants without getting any errors, so I would say the solution works.
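For anyone else doing this, a sketch of such padding in an SPM batch job (the variables scrub and nScans are hypothetical placeholders for your session-wise scrubbing regressors and scan counts):

    % Pad each session's scrubbing regressors with zero columns so that
    % all sessions end up with the same number of regressors.
    nMax = max(cellfun(@(R) size(R,2), scrub));   % scrub{s}: [nScans(s) x nScrub(s)]
    for s = 1:numel(scrub)
        R = [scrub{s}, zeros(nScans(s), nMax - size(scrub{s},2))];
        for k = 1:nMax
            matlabbatch{1}.spm.stats.fmri_spec.sess(s).regress(k).name = sprintf('scrub%02d', k);
            matlabbatch{1}.spm.stats.fmri_spec.sess(s).regress(k).val  = R(:,k);
        end
    end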

JoramSoch commented 4 years ago

Should this kind of trick be added to the wiki or in the readme of the repo?

@Remi-Gau: I think I should add this to the manual somewhere.

JoramSoch commented 4 years ago

Rather than defining empty conditions, I added extra regressors with only zeros. Now I am calculating the cvLME score for my participants without getting any errors, so I would say the solution works.

@carolfs: Note that all regressors have to be in the same order in each session, in order for the cross-validation procedure to work. This means, when your missing condition is somewhere in the middle, the empty regressor also has to be at this position. Otherwise, regressors may not be concatenated properly, model parameters are being mixed up and the resulting cvLME is meaningless at best. So, if you are adding zeros at the right of the design matrix, but your missing condition was not the last one, this would not be a solution.
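A quick consistency check along these lines (a sketch, assuming a standard SPM12 SPM.mat):

    % Verify that conditions appear in the same order in every session.
    load('SPM.mat');
    getNames = @(s) arrayfun(@(u) u.name{1}, SPM.Sess(s).U, 'UniformOutput', false);
    ref = getNames(1);
    for s = 2:numel(SPM.Sess)
        if ~isequal(getNames(s), ref)
            warning('Condition order in session %d differs from session 1.', s);
        end
    end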

carolfs commented 4 years ago

I don't have missing conditions though, just a different number of scrubbing regressors.

Remi-Gau commented 4 years ago

Should this kind of trick be added to the wiki or in the readme of the repo?

@Remi-Gau: I think I should add this to the manual somewhere.

I forgot about the manual.

Let me know if you want me to send a PR to update it.

JoramSoch commented 4 years ago

@Remi-Gau: Thanks, I think I will do that tomorrow.

@carolfs: Scrubbing regressors, for cleaning out motion-related signals? I'm not familiar with this, but in that case, if you have a zero (scrubbing) regressor in one session, the assumption would be that the subject didn't move in that direction (or didn't rotate around that axis) in that session. In other words, cross-validation applies to the motion regressors in the same way it applies to the onset regressors (so they also have to be in the same order, if they have a particular meaning like the six canonical motion regressors). Note that motion regression is not done beforehand; the whole design matrix enters the analysis (just like in the first-level GLM).

carolfs commented 4 years ago

@JoramSoch To me, it doesn't seem very useful that cross-validation applies to motion regressors. Participants don't move the same way every session, nor would we expect them to. For this analysis, it might make more sense to run motion correction beforehand and use the residuals for model comparison, without any motion regressors. What do you think?

JoramSoch commented 4 years ago

@carolfs: I see your point. We considered this, but ultimately decided against that procedure, for a number of reasons that made it less appealing:

JoramSoch commented 4 years ago

I updated the manual. This is fixed by Commit 2588ee127fffd92d02a8b53573085199163679a3.

carolfs commented 4 years ago

@JoramSoch I agree with your first and third points, not sure I understand the second.

Anyway, I ran a regression model with only the motion regressors and took the residuals to the first-level analysis. I then ran the cross-validated Bayesian model selection procedure as described in the manual. The results were very different from what I got when I was retaining the motion regressors in the first-level analysis, even though these regressors were the same for the three models I am comparing. Maybe I made a mistake, and I should run it again to confirm. Would you expect the results to be different?

JoramSoch commented 4 years ago

Thanks for your response! This is very interesting. So you're saying that you get different model preferences (at the second level) when performing different motion corrections (at the first level), i.e. regressing out motion beforehand vs. including realignment parameters in the GLM. I would be very happy if you could send more details (PPT with results, maybe first-level GLM and exemplary cvLME maps) to joram.soch@bccn-berlin.de.

On the one hand, your result is not surprising, because you are comparing the models on different data (motion-corrected vs. not motion-corrected), and it's clear that different data are best explained by different models. On the other hand, you're right that the motion parameters are not what differs between the models. Could it be that there are systematic relationships between motion and task regressors (e.g. subjects moving in synchronization with the task)? Otherwise, including vs. not including motion regressors should not have an influence on the selection of task regressors...
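One way to check for such a relationship is to correlate the task and motion columns of the first-level design matrix (a sketch; the name-based column selection is heuristic and may need adapting):

    % Correlate task regressors with realignment-parameter regressors.
    load('SPM.mat');
    X      = SPM.xX.X;
    isMot  = ~cellfun(@isempty, regexp(SPM.xX.name, 'R[1-6]$'));     % e.g. 'Sn(1) R1'
    isTask = ~cellfun(@isempty, regexp(SPM.xX.name, '\*bf\(1\)$'));  % e.g. 'Sn(1) cond*bf(1)'
    Xt = X(:,isTask);  Xm = X(:,isMot);
    Xt = (Xt - mean(Xt,1)) ./ std(Xt,0,1);    % z-score (implicit expansion, R2016b+)
    Xm = (Xm - mean(Xm,1)) ./ std(Xm,0,1);
    C  = (Xt' * Xm) / (size(X,1) - 1);        % Pearson correlations
    fprintf('max |corr(task, motion)| = %.2f\n', max(abs(C(:))));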