0todd0000 / spm1d

One-Dimensional Statistical Parametric Mapping in Python
GNU General Public License v3.0

Unbalanced designs #56

Closed bernard-liew closed 7 years ago

bernard-liew commented 7 years ago

Hi Todd,

I have a question about unbalanced designs. I think part of the answer was posted in a previous issue (https://github.com/0todd0000/spm1d/issues/43). I plan to use two different tests: a two-way ANOVA with one within-subject and one between-subject factor, and a random-effects model.

Can I input a data set with an unbalanced design? For example, in a data set of ankle plantar flexion moments, I have 30 subjects, and each subject has a varying number of good trials (3-10). Do the two aforementioned tests accept unbalanced datasets? I couldn't find the answer on the spm1d website. Many thanks.

Regards, Bernard

0todd0000 commented 7 years ago

Hi Bernard,

Thank you for raising this issue. spm1d's procedures have only been validated for balanced designs, so please either use only means or use the same number of trials for each subject. If the design is unbalanced the F statistics may be incorrect. For balanced designs the same F statistic will be produced regardless of whether you use means or all trials.
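As a rough illustration of the "use only means" option, the sketch below averages each subject's trials node by node so that every subject contributes exactly one curve. It uses only the Python standard library; the function name `subject_means`, the dictionary layout, and the toy values are assumptions made for illustration and are not part of spm1d.

```python
from statistics import fmean

def subject_means(trials_by_subject):
    """Average each subject's trials node by node, giving one mean curve per subject."""
    means = {}
    for subj, trials in trials_by_subject.items():
        # zip(*trials) groups the values recorded at the same time node across trials
        means[subj] = [fmean(node_values) for node_values in zip(*trials)]
    return means

# Hypothetical toy data: 2 subjects, unequal trial counts, 3 time nodes per trial
trials_by_subject = {
    "s01": [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]],
    "s02": [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]],
}
means = subject_means(trials_by_subject)
print(means)
```

After this step each subject has one observation per condition, regardless of how many good trials were recorded.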

I thought there was something written in the ANOVA documentation at spm1d.org about balanced vs. unbalanced designs, but it doesn't seem to be there. Also I just realized that design balance checks do not seem to be working in spm1d 0.4; the software should raise an error or at least a warning if the design is not balanced. I'll need to find out what the problem is, so give me a day or two to post documents to spm1d.org and also to fix the warning / error messages.

Thanks again for raising this issue!

Todd

0todd0000 commented 7 years ago

Hi again Bernard,

Apologies for the delay. I've checked the code and actually the design balance checks seem to be working fine in both the MATLAB and Python versions of the code. I'm not sure why I thought they weren't working before, but it seems that they're OK. In MATLAB submitting unbalanced data should raise an error that looks something like this:

Error using spm1d.stats.anova.designs.ANOVA2/check_balanced (line 43)
Design must be balanced.

Note that unbalanced data is fine for one-way ANOVA (spm1d.stats.anova1) but that all other ANOVA procedures (including one-way repeated-measures) require balanced data.
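A generic way to check balance before submitting data is to count the observations in each cell (each combination of factor levels). The sketch below is only an illustration of that idea; it is not the code behind spm1d's `check_balanced`, the helper names are hypothetical, and it does not detect cells that are missing altogether.

```python
from collections import Counter

def cell_counts(*factors):
    """Count observations in each cell (combination of factor levels)."""
    return Counter(zip(*factors))

def is_balanced(*factors):
    """True if every observed cell holds the same number of observations."""
    counts = cell_counts(*factors)
    return len(set(counts.values())) == 1

# Hypothetical label vectors for a single two-level factor
A_balanced = [0, 0, 0, 1, 1, 1]
A_unbalanced = [0, 0, 0, 0, 1, 1]

print(is_balanced(A_balanced))
print(is_balanced(A_unbalanced))
print(cell_counts(A_unbalanced))
```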

Have you tried to submit unbalanced data to spm1d.stats.anova2onerm or another procedure? Please let me know whether or not it produces an error.

Todd

bernard-liew commented 7 years ago

Hi Todd,

Thanks for following this up. I have not used it yet, but I will be doing so within the next three weeks. I will let you know then. Many thanks.

Regards, Bernard

0todd0000 commented 7 years ago

OK, I'll close the issue for now to indicate that this is not necessarily a software bug. Please feel free to re-open this issue when you get to your analyses. Todd

bernard-liew commented 7 years ago

Hi Todd,

How have you been? I am running two tests: 1) "Two-way ANOVA with repeated-measures on one factor" and 2) "Three-way ANOVA with repeated-measures on two factors". This is an either-or question. I would love to do test (2) if possible.

However, test (2) threw a "ValueError: Design must be balanced.", but test (1) did not.

SUBJ: 30 subjects
A (between groups): 16 group 1 vs 14 group 2
B (within group, time): 15 pre vs 15 post
C (within group, side; optional): 15 right vs 15 left

Why is my design unbalanced for test two and not test one?

Regards, Bernard

0todd0000 commented 7 years ago

Hi Bernard,

For Factor A it sounds like there is a total of 30 subjects, with 16 in one group and 14 in another?

If that is correct, then to add a repeated-measures Factor B (with two levels: "pre" and "post") there should be 60 total observations (30 for pre and 30 for post).

Then to add a second repeated-measures Factor C (with levels "right" and "left"), there should be 120 total observations (30 pre-right, 30 pre-left, 30 post-right, 30 post-left).
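The layout described above can be sketched as label vectors with one entry per observation. This is only an illustration under assumptions: the integer codes (0/1) for each level and the ordering of observations are chosen here for convenience, so the label format a given spm1d function expects should be checked against its documentation.

```python
from collections import Counter

n_subjects = 30
# Factor A (between subjects): 16 subjects in group 0, 14 in group 1
group_of = [0] * 16 + [1] * 14

A, B, C, SUBJ = [], [], [], []
for subj in range(n_subjects):
    for b in (0, 1):          # Factor B: 0 = pre, 1 = post
        for c in (0, 1):      # Factor C: 0 = right, 1 = left
            A.append(group_of[subj])
            B.append(b)
            C.append(c)
            SUBJ.append(subj)

print(len(A))                 # total number of observations
print(Counter(zip(B, C)))     # observations per within-subject cell
print(Counter(A))             # observations per group
print(set(Counter(SUBJ).values()))  # observations per subject
```

Each of the four B x C cells holds 30 observations and each subject contributes four, but the two groups contribute 64 and 56 observations because the group sizes (16 vs 14) differ.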

I think that is how the data should be organized but I'm not totally sure... please let me know if my interpretation doesn't match your experiment.

Todd

bernard-liew commented 7 years ago

Hi Todd,

Many thanks for the quick reply. Sorry for the sloppiness; I forgot to multiply by two. You are totally right: 120 observations (30 pre-right, 30 pre-left, 30 post-right, 30 post-left).

Regards, Bernard

0todd0000 commented 7 years ago

Hi Bernard,

Thanks for confirming. Following that experimental design I can indeed reproduce the same ValueError. As I recall the reason for this ValueError is simply that spm1d's numerical results have not yet been verified for some designs using independently published results.

For example, in the examples folder you'll find the following file:

./spm1d/examples/stats0d/ex_anova2onermub.py

There are a variety of independent datasets available on the internet for this (unbalanced) design so I was able to check spm1d's results against those, and since they appear to be correct I made unbalanced cases accessible without warnings / errors for this design.

For other designs, including anova3tworm, I've not yet found suitable third-party datasets, so I'm not certain that spm1d's results are accurate. They very well might be accurate, but I thought it would be best to restrict access to arbitrary unbalanced designs until spm1d's results can be verified.

Please let me know if you are aware of any published datasets or public examples on the internet that we could use to check the anova3tworm results. Alternatively, we could check the results using random datasets and third-party software like R, so if you need to use anova3tworm with unbalanced data please let me know and I'll try to verify its results as soon as possible.
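One possible way to prepare such a random-dataset check is sketched below: it generates random scalar (0D) data for the 16 vs 14 design with two repeated-measures factors and writes it as a long-format CSV table that third-party software could read. The column names, the function names, and the choice of standard-normal noise are all assumptions for illustration; the comparison against R or spm1d itself is not shown.

```python
import csv
import io
import random

def random_long_table(group_sizes=(16, 14), seed=0):
    """Build one row per observation: subject, group (A), time (B), side (C), value (y)."""
    rng = random.Random(seed)
    rows = []
    subj = 0
    for a, n in enumerate(group_sizes):
        for _ in range(n):
            for b in (0, 1):
                for c in (0, 1):
                    rows.append({"SUBJ": subj, "A": a, "B": b, "C": c,
                                 "y": rng.gauss(0.0, 1.0)})
            subj += 1
    return rows

def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["SUBJ", "A", "B", "C", "y"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

rows = random_long_table()
csv_text = to_csv(rows)
print(csv_text.splitlines()[0])
print(len(rows))
```

Fixing the seed keeps the dataset reproducible, so the same table can be analyzed in both packages and the resulting F statistics compared.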

Todd

bernard-liew commented 7 years ago

Thanks Todd,

Would it help if I provided you with a data set I just collected, which has exactly that design?

Regards, Bernard

0todd0000 commented 7 years ago

Hi Bernard, Yes please do send the dataset if possible. Todd

bernard-liew commented 7 years ago

Many thanks Todd,

I have emailed you the datasets directly.

Regards, Bernard

0todd0000 commented 7 years ago

Hi Bernard,

Apologies for the delay. I've looked at the problem a bit more closely, but I am still unable to find a third-party dataset for unbalanced three-way repeated-measures ANOVA, so I'm unable to validate the results I'm getting with spm1d and R. The spm1d results appear to be matching the R results, but without a third-party dataset (and expected results) I'm not 100% confident that my R analyses are correct. To ensure that spm1d returns valid results I prefer to leave it as is, raising an error for unbalanced three-way RM designs. Please let me know if you are aware of any third-party datasets we could use for verification. There may be some buried in software packages like SPSS, S, Minitab, etc.

If we can't find a verification dataset, here are two other options for proceeding:

Todd

bernard-liew commented 7 years ago

Dear Todd,

Many thanks for the kind advice. I will brainstorm the options you have suggested. In the meantime, I will notify you if I come across any third-party data set with an unbalanced three-way design.

Regards, Bernard