Unbalanced Mixed ANOVA - Githubissues

CorentinWicht commented 5 years ago

Dear Dr Mensen,

first of all many thanks for the amazing toolbox you programmed.

I was wondering whether there was any possibility to run a 2x2 mixed ANOVA with unbalanced sample sizes (while the within-subject factor will have same number of observations)?

Hence, Group 1, condition 1 = 31 Group 1, condition 2 = 31 Group 2, condition 1 = 33 Group 2, condition 2 = 33

I have been trying to write such function for a long time but couldn't get to the end, the unbalanced samples seem to be problematic.

I read that a way around would be to use the harmonic mean of the group sizes (Unweighted means analysis, i.e. if assumption that the groups are equally represented in the population of interest), what do you think? (In my case the harmonic mean would be 2/(1/31 + 1/33) = 31.97).

See example for the difference between weighted-unweighted means analysis: ANOVA1_Weight-Unweight.docx

Best wishes,

Corentin Wicht SNSF Doc.CH PhD student Neurology Unit, Medicine Section Faculty of Science and Medicine, University of Fribourg Ch. du Musée 5, CH-1700 Fribourg +41 (0)26 300 85 36 corentin.wicht@unifr.ch https://www3.unifr.ch/med/de/research/groups/spierer/

Mensen commented 5 years ago

Hi Corentin,

I'll reply to both your mail and this message at once :)

Regarding the balanced groups, there is no easy and quick way around this unfortunately, especially not for mixed ANOVA. When there are balanced group there are some easy shortcuts to performing the calculations so that they are still quick enough to go through the whole permutation process.

For unbalanced datasets, I developed some draft general linear model methods a little while ago (the code is in the toolbox actually but not well described), but these only work for either fully between or fully within designs (I had some difficulty getting the right error terms to compare for mixed designs and never worked it out). I did explore some of the weighted means analyses types, but didn't get it work properly... so abandoned the idea after some initial testing (perhaps digging some more there could work).

At the moment, the scripts for ANOVA designs aren't really in development anymore as I'm working more and more with linear mixed models to solve all my problems (see one of last publications using spindle analysis). This is still a very slow process as each channel/sample-pair undergoes its own calculation for the linear mixed model so it takes about a day to go through a whole dataset like this. So for testing purposes this is impossible, but if you're confident in the design, then for a publication analysis its worth just leaving things running over a weekend.

Sorry for not being able to give an easy "go-to" answer. Its a really interesting topic and worth pursuing given that most experimental designs are rarely solvable using just a direct t-test anymore... but this is why I skipped all the intermediate designs and went right to the linear mixed models which are very generalisable to any experimental design.

Mensen commented 5 years ago

Regarding your question on the p-value calculation:

Digging into the ept_TFCE & ept_TFCE_ANOVA functions, I couldn’t understand the rational behind the computation of p-values (see below):
edges = [maxTFCE;max(abs(TFCE_Obs(:)))];
[~,bin]     = histc(abs(TFCE_Obs),sort(edges));
P_Values    = 1-bin./(nPerm+2);
Why not use as a “permutation-based significance threshold”, for e.g., the 95% percentile of the null-hypothesis distribution (i.e. distribution of each permutation maxTFCE) using the following code (for alpha threshold of 5%):

prctile(maxTFCE,100-(100*0.05))

Or

U = round((1-0.05)*5000); % 5000 is the number of permutations
MaxTFCE=sort(maxTFCE);
Significance_thresh= MaxTFCE(U);

So I haven't actually tested your version of this code on any data so I'm not sure where the differences would be... but some thoughts just looking at the code:

I have no doubt that my code from years ago isn't optimal... I've used a slightly altered version in some more recent code and there's a more straightforward way to get the bin each value is in...
However, there should be no place for any thresholding value (5% or otherwise), in the calculation of the p-value itself. The p-value calculation is not at all based on the alpha value, only its interpretation depends on the alpha value (and that is an arbitrary position)... so any calculation of the p-value that has the alpha value in it, cannot be "correct".
Once you obtain the actual p-values you can of course threshold them at whatever value you feel comfortable with (as is done in the ept_cluster_res or more generally in the ept_Result_Viewer gui). But I leave this completely up to the user.

Hope that helps too!

All the best, Armand

CorentinWicht commented 5 years ago

Dear Armand,

many thanks for such detailed responses.

I realize that my second question was somehow confusing.

I have been trying to follow Lubell recommendations (i.e. https://github.com/Mensen/ept_TFCE-matlab/issues/19) to run the mixed ANOVA (2x2 within-between) relying on the perm_spANOVA from the FMUT toolbox and re-injecting the data into your ept_mex_TFCE2D function (for each permuted maps).

My problem arise when I am trying to calculate the p-values (using the same code as in ept_TFCE_ANOVA): [~,bin.A] = histc(abs(TFCE_Obs.A),sort(edges.A)); P_Values.A = 1-bin.A./(nPerm+2); [~,bin.B] = histc(abs(TFCE_Obs.B),sort(edges.B)); P_Values.B = 1-bin.B./(nPerm+2); [~,bin.AB] = histc(abs(TFCE_Obs.AB),sort(edges.AB)); P_Values.AB = 1-bin.AB./(nPerm+2);

I realize that the bin fields (A,B,AB) are all composed of matrices of zeros, hence the P_Values fields are all worth 1.

Then, I wondered what is the rationale behind calculating the distribution of p-values as you did (using the histc function)? And why not use the fcdf function (with corresponding degrees of freedom)? If I use the fcdf function I am getting coherent results, compared to the histc method.

Best wishes and again many thanks for taking the time to helping me,

Corentin Wicht

Mensen / ept_TFCE-matlab

Unbalanced Mixed ANOVA #23