Mensen / ept_TFCE-matlab

Advanced EEG Statistics

Mixed repeated measures Anova with 1 between subject factor and 2 within subject factors #6

Open apoublan opened 6 years ago

apoublan commented 6 years ago

Hi Armand,

Would you have an experimental script that does this? I don't think it is possible with the current version of the toolbox. Ideally I would need an unbalanced-design ANOVA (a different number of subjects in each level of the between-subject factor, i.e. group assignment), but I think this is clearly impossible, so I was thinking of randomly picking the same number of subjects from each group. Thank you,

Arnaud

Mensen commented 6 years ago

Hey Arnaud... I do indeed. The scripts are also within the toolbox, but not well advertised just yet as the whole methodology should still be extensively validated with simulations etc.

From within the toolbox the relevant script is https://github.com/Mensen/ept_TFCE-matlab/blob/master/TFCE/Tools/ept_slow_lme_permutation.m

An example section of script that calls this function would be:

factors = struct(...
    'name', {'between_predictor1'}, ... % name of the predictor to permute
    'flag_within', {0}, ...             % flags which of the listed predictors are within-subject
    'num_perm', 500);                   % number of permutations
channel_neighbours = ept_ChN2(e_loc);
% run the model for all channels
[observed, perm_values] = ept_slow_lme_permutation(...
    data_of_interest, ...
    full_table, ...
    model_description, ...
    factors, ...
    channel_neighbours);

Setting up your data (full_table in the example) can be the tricky part, depending on how your data is organised now. Essentially, this is a MATLAB table variable holding all your relevant data, and it must have a participant_id column.
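As a rough illustration of that layout, here is a minimal sketch of a long-format table with one row per participant and within-subject cell. The sizes, the group assignment, and the random placeholder values are all assumptions made for the example; only the column names follow the snippets in this thread.

```matlab
% Hypothetical sizes, for illustration only
n_subjects  = 4;
n_levels_w1 = 2;   % levels of within_predictor1
n_levels_w2 = 3;   % levels of within_predictor2

% One row per participant x within-subject cell (4 * 2 * 3 = 24 rows)
[w2, w1, subj] = ndgrid(1:n_levels_w2, 1:n_levels_w1, 1:n_subjects);
group_of_subject = [1; 1; 2; 2];   % assumed between-subject assignment

full_table = table( ...
    categorical(subj(:)), ...
    categorical(group_of_subject(subj(:))), ...
    categorical(w1(:)), ...
    categorical(w2(:)), ...
    'VariableNames', {'participant_id', 'between_predictor1', ...
                      'within_predictor1', 'within_predictor2'});

% Placeholder values; in practice this column holds one channel's data
full_table.dependent_variable = randn(height(full_table), 1);
```

Covariates such as participant_age would be added as further columns in the same way, with the participant's value repeated on each of their rows.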

The model description follows the standard formula syntax for MATLAB linear mixed models (fitlme), which would be something like:

model_description = ['dependent_variable ~ ', ...
    'between_predictor1 * within_predictor1 * within_predictor2', ...
    ' + participant_age + behavioural_scores', ...
    ' + (1 | participant_id)'];
full_model = fitlme(full_table, model_description);

Lastly, the data of interest is the full set of data from every channel, such that full_table.dependent_variable is just a single-channel example taken from it.

I can send you a couple of example datasets that would definitely run with this format and produce good results if you run into any trouble with formatting. The toolbox is easy to run for one or two factors, but unfortunately the whole theory of permutations and linear mixed modelling becomes a complicated topic once you extend past single factors.

Lastly, some plotting of the factors can be done using something like:

for factor_of_interest = 1 : length(full_model.CoefficientNames)
    fprintf('examining %s\n', full_model.CoefficientNames{factor_of_interest});
    csc_Topoplot(...
        observed.t_value(factor_of_interest, :), ...
        e_loc, ...
        'PlotChannels', 1, ...
        'MarkedChannels', observed.tfce_pvalue(factor_of_interest, :) < 0.00141, ...
        'MarkedString', '*', ...
        'MarkedColor', [1, 1, 1]);
    colorbar
    export_fig(gcf, ['nde_topo_', num2str(factor_of_interest)], '-jpg', '-m3');
end

kwolfert commented 5 years ago

Hi Armand,

I'm thinking about doing a similar analysis with the lme script (1 between-predictor and 1 within-predictor), but I'm having trouble formatting my data. I currently have a 3D matrix (subject x channel x time). When I choose a single channel for full_table.dependent_variable, should this be the data from a single time point, or do all of the time points get stacked into this one column? Essentially, where does the time dimension belong?

Thanks, Katie

Mensen commented 5 years ago

Hi Katie,

Unfortunately the lme approach is quite tricky and sort of hacked together for my own purposes.

I used the approach to examine just a single topography of a frequency band, so no additional time points. With 2000 permutations this took something like 20 hours to run, since there is (currently) no way of vectorising the mixed-model analysis over multiple channels and time points simultaneously. This seems unlikely to be solved quickly: the lme fit is itself iterative, and since this mass-data approach is not that common, few people seem motivated to find a fast solution.

You are right that full_table.dependent_variable would initially be a single channel at a single time point, and this is then looped over all the data to get the initial set of observed statistics (for each of the estimated terms in the mixed model). TFCE is then run over this set of statistics, and then again for each of the randomised datasets.
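The looping step for the observed statistics might look roughly like the sketch below. It is not the code from ept_slow_lme_permutation; the synthetic data, the sizes, and the assumption that data_of_interest is stored as observations x channels (rows matching the table rows) are all made up for illustration. fitlme requires the Statistics and Machine Learning Toolbox.

```matlab
% Synthetic stand-ins: 10 participants x 2 conditions, 4 channels (all assumed)
rng(1);
n_subjects = 10; n_conditions = 2; n_channels = 4;
[cond, subj] = ndgrid(1:n_conditions, 1:n_subjects);
group = mod(subj(:), 2) + 1;   % alternate participants between two groups
full_table = table(categorical(subj(:)), categorical(group), categorical(cond(:)), ...
    'VariableNames', {'participant_id', 'between_predictor1', 'within_predictor1'});

% Assumed layout: one row per table row, one column per channel
data_of_interest = randn(height(full_table), n_channels);

model_description = ['dependent_variable ~ between_predictor1 * within_predictor1', ...
    ' + (1 | participant_id)'];

for ch = 1:n_channels
    % swap in this channel's data and refit the same model
    full_table.dependent_variable = data_of_interest(:, ch);
    model = fitlme(full_table, model_description);
    if ch == 1
        % one row per fixed-effect term, one column per channel
        t_values = nan(numel(model.CoefficientNames), n_channels);
    end
    t_values(:, ch) = model.Coefficients.tStat;
end
```

The resulting terms x channels matrix is the kind of input over which TFCE would then be run, once for the observed data and once per randomised dataset.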

You could probably get away with fewer permutations and still get a good estimate, but with any reasonable number of channels and time points you might still be looking at a very long calculation time.

Why are you hoping to use the linear mixed models for this? Does your data not fit in with the more usual 2-factor mixed ANOVA design that is also available in the toolbox?

kwolfert commented 5 years ago

Thank you for that detailed explanation, I didn't realize it would take so long to run! Now I understand why the script is set up to look at a single time point.

I was interested in linear mixed models because I have an uneven number of participants in my groups (20 in one group, and 11 in the other). To my understanding, that wouldn't work for the normal 2-factor mixed ANOVA...
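For reference, the workaround mentioned at the top of the thread (randomly picking the same number of subjects in each group) could be sketched as follows for the 20 vs 11 case. The variable names are hypothetical, and note that this discards 9 participants from the larger group.

```matlab
rng(42);                                  % fixed seed so the draw is reproducible
group = [ones(20, 1); 2 * ones(11, 1)];   % group label per participant (20 vs 11)
n_keep = min(accumarray(group, 1));       % size of the smallest group

keep = false(size(group));
for g = unique(group)'
    members = find(group == g);
    % draw n_keep members of this group without replacement
    chosen = members(randperm(numel(members), n_keep));
    keep(chosen) = true;
end
balanced_group = group(keep);             % labels of the retained participants
```

Because the result depends on which participants happen to be drawn, it would be sensible to repeat the analysis over several draws rather than rely on a single one.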

Mensen commented 5 years ago

Indeed... the current scripts are only optimised for even groups of participants... very unfortunately.

Having a mixed design also complicates the stats. I have great scripts for multi-factor, unbalanced designs that are either all between-subjects or all within-subjects... but the same sorts of tricks that make this process feasible (i.e. fast enough) for permutation analysis don't work for mixed designs. There's something special about how you take out the within estimates before estimating the coefficients for the between factors that I haven't figured out yet, unfortunately.

I wish I had more time/resources to work on this problem as I'm also encountering more and more situations where a linear mixed model would be ideal (hence why I started with the mixed model approach).

Currently, for 128 channels it takes 53 seconds to run just the observed statistics for a single time point for a pretty complex model with 100 terms, and 19 seconds for a model with only 6 terms, which I suppose will be similar to yours. So basically multiply that by however many time points you have, and then by how many permutations you want to run... it starts getting pretty crazy.

Each linear mixed model only takes about 150-300 ms to complete, so I think 99.9% of the statistics community already thinks that's perfectly fine and sees no need to optimise it any further. But even this short time really adds up with multiple channels, time points, and 1000s of permutations. If you come across any tool that can estimate your base statistic in parallel (all channels and time points simultaneously), or that computes a single lme model in under 10 ms, then let me know and I can integrate that calculation into the whole TFCE toolbox.
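To put rough numbers on that multiplication, here is a back-of-the-envelope estimate. The 19 seconds per time point is the figure quoted above; the number of time points and permutations are hypothetical, and counting one extra run for the observed data is an assumption about how the total would be tallied.

```matlab
seconds_per_timepoint = 19;    % 128 channels, 6-term model (figure quoted above)
n_timepoints   = 100;          % hypothetical
n_permutations = 1000;         % hypothetical

n_runs = n_permutations + 1;   % the observed data plus each permuted dataset
total_hours = seconds_per_timepoint * n_timepoints * n_runs / 3600;
fprintf('Estimated runtime: %.0f hours (%.1f days)\n', total_hours, total_hours / 24);
```

With these assumed values the estimate comes to roughly 528 hours, or about 22 days, of serial computation. The 19 seconds also matches the per-model timing: 19 s / 128 channels is about 148 ms per fit.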

As a short-term solution I would try to analyse this with other tools... and then confirm the statistics of the results using the TFCE approach to guarantee the validity and sensitivity of the result. It's not a great answer, I know.

Mensen commented 5 years ago

So I was a bit annoyed at my answer and played around with some parallel loops... and the calculation time reduced by more than half...

So maybe running the whole thing is feasible with a decent computer, and depending on the number of channels and permutations etc...

Not much better, but it's a start...
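A parallel loop over channels could look something like the sketch below. This is a guess at the general idea, not the actual change made in the toolbox, and the data and sizes are synthetic placeholders. parfor needs the Parallel Computing Toolbox to run on several workers; without it MATLAB runs the loop serially.

```matlab
% Synthetic placeholders (assumed names and sizes)
rng(1);
n_subjects = 10; n_conditions = 2; n_channels = 8;
[cond, subj] = ndgrid(1:n_conditions, 1:n_subjects);
full_table = table(categorical(subj(:)), categorical(mod(subj(:), 2) + 1), ...
    categorical(cond(:)), 'VariableNames', ...
    {'participant_id', 'between_predictor1', 'within_predictor1'});
data_of_interest = randn(height(full_table), n_channels);
model_description = ['dependent_variable ~ between_predictor1 * within_predictor1', ...
    ' + (1 | participant_id)'];

n_terms = 4;   % intercept, two main effects, one interaction for this 2 x 2 model
t_values = nan(n_terms, n_channels);
parfor ch = 1:n_channels
    channel_table = full_table;   % per-iteration copy, so workers do not share state
    channel_table.dependent_variable = data_of_interest(:, ch);
    model = fitlme(channel_table, model_description);
    t_values(:, ch) = model.Coefficients.tStat;   % sliced output, one column per channel
end
```

Each channel's fit is independent of the others, which is what makes the loop safe to parallelise; the speed-up is bounded by the number of available workers.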

kwolfert commented 5 years ago

Okay I will continue playing around with it and also look into other analysis tools. I will let you know if anything useful comes of it, or if I find a faster lme tool. Thank you for your help, I really appreciate it!