0todd0000 / spm1d

One-Dimensional Statistical Parametric Mapping in Python
GNU General Public License v3.0

Unbalanced data anova1rm #125

Closed · Ali-202 closed this issue 4 years ago

Ali-202 commented 4 years ago

Hi, I'm having difficulty with an unbalanced data set for anova1rm. I get the attached error when I try to run the code. I have data for 19 subjects, all participating in 4 conditions. However, the number of good trials varies for each participant and condition; usually 5 but sometimes fewer. Is the only solution to remove trials from my data set until the same number exists for each condition and participant? Or is there another way to approach this? I did find a similar question from 2016, but I wasn't sure whether things had changed since then to support unbalanced datasets, so I hope it's okay to post here.

Thanks in advance, Ali

[Screenshot of the error attached: Screen Shot 2020-04-16 at 13 49 39]

0todd0000 commented 4 years ago

Yes, removing trials is indeed an option. It may be better to instead calculate means across trials, because the mean is usually a better estimate of the true average performance than an individual trial. If inter-subject variability is large relative to inter-trial variability (as it usually is in biomechanics), then this choice is somewhat moot: changing a specific trial will have negligible effects on the final results. However, as a general strategy it is usually better to choose the mean.
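As a concrete sketch of the mean-based approach: the snippet below builds hypothetical unbalanced trial data (the variable names, trial counts, and 101-node curves are all illustrative assumptions, not from this issue), averages across trials to get one curve per subject and condition, and arranges the result in the `(J, Q)` layout with condition and subject label vectors that `spm1d.stats.anova1rm` expects.

```python
import numpy as np

# Hypothetical unbalanced data: trials[subject][condition] is a list of
# 1D trial curves (here 101 time nodes); trial counts vary (3 to 5).
rng = np.random.default_rng(0)
n_subjects, n_conditions, n_nodes = 19, 4, 101

trials = [
    [
        [rng.standard_normal(n_nodes) for _ in range(rng.integers(3, 6))]
        for _ in range(n_conditions)
    ]
    for _ in range(n_subjects)
]

# Average across trials: one curve per (subject, condition) cell.
# This restores balance regardless of the per-cell trial counts.
Y = np.array([
    [np.mean(cell, axis=0) for cell in subj]
    for subj in trials
])                                                  # shape (19, 4, 101)

# Flatten subject-major into (J, Q) with matching label vectors:
YY   = Y.reshape(-1, n_nodes)                       # (76, 101)
A    = np.tile(np.arange(n_conditions), n_subjects) # condition labels
SUBJ = np.repeat(np.arange(n_subjects), n_conditions)

# F = spm1d.stats.anova1rm(YY, A, SUBJ)
# Fi = F.inference(alpha=0.05)
```

The actual `anova1rm` call is left commented so the sketch runs without spm1d installed; with real data you would replace the simulated `trials` structure with your own curves.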

Using just the means (and imposing balance) is effectively equivalent to using a full model which (i) includes inter-trial variance and (ii) is non-balanced at the trial level.

If you'd like to verify the validity of this approach, try: (a) using just the inter-trial means, then (b) using just the first trial, then (c) using just one randomly selected trial.

It is quite likely that the final ANOVA results will be very similar, albeit not numerically identical. This is an indirect validation approach that can be reported as a type of sensitivity analysis (if needed).
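The three-way sensitivity check above could be scripted along these lines. Everything here is an illustrative assumption (simulated curves, the `collapse` helper, the reduction names); the idea is simply to produce one balanced array per reduction strategy and run the same repeated-measures ANOVA on each.

```python
import numpy as np

rng = np.random.default_rng(1)
n_subjects, n_conditions, n_nodes = 19, 4, 101

# Hypothetical unbalanced trial lists (3 to 5 trials per cell):
trials = [
    [
        [rng.standard_normal(n_nodes) for _ in range(rng.integers(3, 6))]
        for _ in range(n_conditions)
    ]
    for _ in range(n_subjects)
]

def collapse(trials, how, rng=None):
    """Reduce each (subject, condition) cell to a single curve."""
    Y = []
    for subj in trials:
        row = []
        for cell in subj:
            if how == "mean":        # (a) inter-trial mean
                row.append(np.mean(cell, axis=0))
            elif how == "first":     # (b) first trial only
                row.append(cell[0])
            elif how == "random":    # (c) one random trial
                row.append(cell[rng.integers(len(cell))])
        Y.append(row)
    return np.array(Y)

Y_mean  = collapse(trials, "mean")
Y_first = collapse(trials, "first")
Y_rand  = collapse(trials, "random", rng=rng)

# Run the same ANOVA on each reduction and compare SPM{F} results, e.g.:
# for Y in (Y_mean, Y_first, Y_rand):
#     YY   = Y.reshape(-1, n_nodes)
#     A    = np.tile(np.arange(n_conditions), n_subjects)
#     SUBJ = np.repeat(np.arange(n_subjects), n_conditions)
#     print(spm1d.stats.anova1rm(YY, A, SUBJ).inference(0.05))
```

If the suprathreshold clusters are stable across the three reductions, that stability can be reported as the sensitivity analysis described above.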

spm1d will indeed eventually support non-balanced designs, but this is still in development. The existing, in-development code works well in many cases of mild-to-moderate deviations from balance, but it does not work well for large deviations and/or with distribution oddities like outliers. Until these problems can be solved it will unfortunately remain non-public.

For now, I think your suggestion is the best: just choose one trial (or the mean), then check the stability of the results to different trial selections.

Allena-90 commented 4 years ago

Thank you for your swift reply!

0todd0000 commented 4 years ago

You're welcome!