Explanation of ANOVA post hoc tests

aaa34169 commented 9 years ago

Dear Todd

Your library is awesome. I would like you to confirm the following points for me.

1) In your article "Vector field statistical analysis of kinematic and force trajectories" in J. Biomech, especially Section 3.2 on muscle forces, you found a significant effect over the entire gait cycle (Fig. 7). Is that because your null hypothesis was that ALL muscle forces were equal? This result seems obvious to me.
After the post hoc tests, you focused on the differences from the Controls and found fewer significant regions: only medial gastrocnemius was significant.

2) Now imagine this case: your ANOVA shows a significant region during the swing phase only, but after the post hoc tests you also find a small significant region during the stance phase. Should I interpret this stance-phase region?

Regards

Fabien

m-a-robinson commented 9 years ago

Dear Fabien,

Although your query is initially addressed to Todd, maybe I too can help.

Firstly, I should mention that, following publication, we discovered minor coding errors that have non-negligible effects on the critical thresholds reported for the multivariate tests in Figs. 5, 7 & 9 of the main manuscript. Fortunately, we believe these errors are negligible in the context of the paper's broader messages regarding bias in 'non-directed' hypothesis testing. The corrected Fig. 7 is as follows:

[Figure: corrected Fig. 7]

Regarding your comment 1), "this result seems to me obvious": to clarify, even in the main test we were testing whether all muscle forces differed between the Control and PFP groups. Unless you have specific expectations relating to a particular muscle (i.e. a directed hypothesis), we consider this the appropriate first hypothesis to test. If the null hypothesis is rejected, you can then examine in a post hoc manner which specific muscles differed.

Regarding "Then you found less significant regions. only the medgastroc was significant.": yes, in the post hoc analysis only medial gastrocnemius differed significantly between groups. This is because post hoc tests do not map directly onto the main test result; the same is true of traditional ANOVA and post hoc tests. Because we used a Šidák-corrected threshold to reduce the likelihood of Type I error across the 10 muscle forces, the per-test alpha was 0.0051. The critical threshold is therefore raised so that, in each test, only 0.51% of random data exceed it. This can lead to differences between the main test and the post hoc results.

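The 0.0051 figure follows from Šidák's correction, which chooses a per-test alpha so that the family-wise error rate across independent tests stays at the nominal 0.05. A minimal sketch of that arithmetic, assuming 10 independent tests; the function name `sidak_alpha` is hypothetical and not part of spm1d:

```python
def sidak_alpha(alpha, n_tests):
    """Per-test alpha that keeps the family-wise error rate at `alpha`
    across `n_tests` independent tests (Sidak correction)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)


# Ten muscle forces, family-wise alpha of 0.05
p_crit = sidak_alpha(0.05, 10)
print(round(p_crit, 4))  # 0.0051

# Probability that at least one of the 10 independent null tests exceeds its threshold
print(1 - (1 - p_crit) ** 10)  # approximately 0.05
```

Note that the correction relies on the independence assumption, which is exactly the assumption the univariate post hoc tests make about the ten muscles.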
2) Regarding whether to interpret the stance-phase region: first, decide whether your question is answered by the main test result. If it is, you may not need the post hoc analysis at all. If you are interested in the post hoc result and you appropriately control for Type I error, the scenario you describe is probably unlikely: a difference found in a properly corrected post hoc analysis would most likely also be present in the main vector analysis.

I hope that helps in the first instance, or at least until Todd can reply :-)

Regards,
Mark Robinson

0todd0000 commented 9 years ago

Thanks for the feedback Fabien. Excellent responses Mark.

One minor addition: the null hypothesis was actually "equivalence of the mean multi-muscle time series between Controls and PFP". To test this null hypothesis one must consider the behavior of random multi-muscle time series or, equivalently, the behavior of a 10-component vector that evolves randomly in 1D time. The alpha-defined critical threshold is the height that such random T2 trajectories (produced by smooth, random vector fields) would exceed in only 5% of repeated experiments with the same design. Rejecting this null hypothesis is thus probabilistically equivalent to rejecting the null hypothesis of a simple t test, so I think we should be exactly as surprised to reject this vector field test as we are to reject a simple t test.
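The threshold described above can be illustrated by simulation. The sketch below is a Monte Carlo approximation, not the random field theory (RFT) calculation used in the paper or in spm1d. It assumes two groups of 10, a 3-component vector, 101 nodes and Gaussian-smoothed noise with a FWHM of 20 nodes, all chosen arbitrarily for illustration; the helper names `t2_trajectory` and `null_max_t2` are hypothetical. It generates many null experiments, computes the two-sample Hotelling's T2 trajectory for each, and takes the height that the trajectory maximum exceeds in 5% of experiments.

```python
import numpy as np
from scipy import stats
from scipy.ndimage import gaussian_filter1d


def t2_trajectory(YA, YB):
    """Two-sample Hotelling's T2 at every node.

    YA, YB : arrays of shape (J, Q, p) -- J responses, Q nodes, p vector components.
    """
    nA, nB = YA.shape[0], YB.shape[0]
    d = YA.mean(axis=0) - YB.mean(axis=0)                      # (Q, p) mean difference
    r = np.concatenate([YA - YA.mean(axis=0), YB - YB.mean(axis=0)])
    S = np.einsum("jqa,jqb->qab", r, r) / (nA + nB - 2)        # pooled covariance at each node
    Sinv_d = np.linalg.solve(S, d[..., None])[..., 0]          # S^-1 d at each node
    return (nA * nB / (nA + nB)) * np.einsum("qa,qa->q", d, Sinv_d)


def null_max_t2(rng, nA=10, nB=10, Q=101, p=3, fwhm=20.0, n_iter=1000):
    """Maximum T2 over the field in repeated experiments with smooth, mean-zero (null) data."""
    sigma = fwhm / np.sqrt(8 * np.log(2))                      # Gaussian kernel width from FWHM
    maxima = np.empty(n_iter)
    for i in range(n_iter):
        Y = gaussian_filter1d(rng.standard_normal((nA + nB, Q, p)), sigma, axis=1)
        maxima[i] = t2_trajectory(Y[:nA], Y[nA:]).max()
    return maxima


rng = np.random.default_rng(0)
maxima = null_max_t2(rng)
t2_crit_1d = np.percentile(maxima, 95)   # height exceeded by the field maximum in 5% of null experiments

# 0D critical T2 for comparison (a single node, no search over the field)
m, p = 18, 3                             # m = nA + nB - 2 degrees of freedom
t2_crit_0d = stats.f.isf(0.05, p, m - p + 1) * p * m / (m - p + 1)
print(t2_crit_1d, t2_crit_0d)
```

The 1D threshold comes out higher than the 0D critical T2 because it controls the chance that the maximum of a whole trajectory exceeds it; RFT obtains this height analytically from the field smoothness instead of by simulation.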

Please note that a corrigendum to the paper is currently in press, it should be available online soon. Apologies for any confusion.

aaa34169 commented 9 years ago

Thanks Todd and Mark for your explanations. If Fig. 7 was wrong, is Fig. 8 also affected?

The post hoc figure (Fig. 8) highlighted significant regions between 0 and 20% of the gait cycle that are not present in the new figure for the main ANOVA.
This is exactly the case I asked about. How do you interpret it?

At present I am trying to master the implementation, and I have examined all of your examples. If I had to code the example from your article, I would write spm1d.stats.anova1((muscle1, muscle2, muscle3, ...), equal_var=True), where muscle1 is the difference between Controls and PFP. Or is it more subtle than that?

Regards

Fabien

0todd0000 commented 9 years ago

Hi Fabien,

The Fig. 8 results are fine as far as we know. The errors described above affect only the multivariate (vector) field results. Please consult the Corrigendum for clarification; it should be available online at J. Biomech in the coming weeks.

To interpret the data from Figs. 7 & 8 it is important to consider the null hypotheses, and in particular the assumptions underlying the Fig. 8 null hypotheses:
-- Fig. 7: the null hypothesis is "equivalent mean vector trajectories".
-- Fig. 8: the null hypotheses are "equivalent mean scalar trajectories", and these ten tests assume independence amongst the ten muscles.

Proper post hoc procedures for the main multivariate analysis (Fig. 7) would require transforming the data from the single-muscle space to an abstract multi-muscle space in which individual muscles' contributions to the Fig. 7 results are weighted. Note that Fig. 7 tests the VECTOR difference, so only a weighted combination of the muscle signals will reproduce the Fig. 7 results. Fig. 8 therefore depicts a simpler analysis than Fig. 7, because it assumes muscle independence. The discrepancies between Figs. 7 & 8 therefore result from Fig. 8's assumption of muscle independence, which is not justified in the context of the main (vector field) null hypothesis.

The command:

spm1d.stats.anova1((muscle1, muscle2, muscle3, ...), equal_var=True)

is partially correct, but please note that ANOVA is a univariate test. Our paper argues that this is a multivariate (vector) dataset, and that an appropriate multivariate procedure (in this case Hotelling's T2 test) is necessary to test hypotheses pertaining to it. Hotelling's T2 tests and other multivariate procedures are not yet available in spm1d, but they will be available in the next major release.

Todd

aaa34169 commented 9 years ago

Hi Todd,

Thank you for your explanation. I will watch for the next major release; spm1d seems perfect for my study, and I will give you my feedback. Thank you for your work.

Fabien

aaa34169 commented 9 years ago

Hello Todd and Mark,

I am back with another question about your article. I understand that your article deals with multivariate analysis. In Table S2 of the supplementary data, you found a significant T2 when you compared the vector F = [Fx, Fy]. Given this result, if I want to know which component is significant, I have to run t tests as you did in row (d). However, you found that neither tx nor ty was significant. How do you interpret this result?

I think my problem should be addressed with multivariate analysis. Is it possible to access an experimental branch of spm1d?

Fabien

0todd0000 commented 9 years ago

Hi Fabien,

That's an important question that gets to the heart of multivariate analysis. Neither component is significant because only a weighted combination of the components is significant. Another way to think of it is as follows: if you rotate the coordinate system (which defines the vector components) so that the x axis is parallel to the mean difference vector, then a t test on the x component will yield the same p value as the T2 test. The vector T2 result is the same for all coordinate system definitions, but the results of the scalar t tests change when you rotate the coordinate system.

Todd
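The "weighted combination" idea can be checked numerically. This sketch uses hypothetical 2D data (not the Table S2 data) and a one-sample T2 test for simplicity; the helper names are hypothetical. It shows that neither coordinate-axis t statistic can exceed T2, and that projecting the data onto the weighted combination a = S^-1 xbar (S the sample covariance, xbar the mean vector) reproduces T2 exactly. That weighted direction coincides with the mean difference vector itself when the covariance is isotropic. Because the direction is chosen from the data, the T2 test's p value is still taken from the T2 (F) distribution rather than the t distribution.

```python
import numpy as np


def one_sample_t2(X):
    """One-sample Hotelling's T2 for X of shape (n, p), testing mean = 0."""
    n = X.shape[0]
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)          # sample covariance (ddof = 1)
    return n * xbar @ np.linalg.solve(S, xbar)


def projected_t(X, a):
    """One-sample t statistic of the scalar projection X @ a."""
    z = X @ a
    return z.mean() / (z.std(ddof=1) / np.sqrt(len(z)))


rng = np.random.default_rng(1)
# Hypothetical data: strongly correlated components, mean shifted across the correlation axis
cov = np.array([[1.0, 0.9], [0.9, 1.0]])
X = rng.multivariate_normal([0.3, -0.3], cov, size=15)

T2 = one_sample_t2(X)
tx = projected_t(X, np.array([1.0, 0.0]))                        # x component alone
ty = projected_t(X, np.array([0.0, 1.0]))                        # y component alone
a = np.linalg.solve(np.cov(X, rowvar=False), X.mean(axis=0))     # weighted combination S^-1 xbar
ta = projected_t(X, a)
print(T2, tx ** 2, ty ** 2, ta ** 2)   # ta**2 equals T2; tx**2 and ty**2 cannot exceed it
```

Rotating the coordinate system changes tx and ty but leaves T2 unchanged, which is why the component-wise t tests and the vector test can disagree.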

aaa34169 commented 9 years ago

Hi Todd,

This case is not easy to interpret. Your Figure S3 in the appendix depicts what you describe. I now see how to compute spm(T2)(q), but I don't know how to set the critical T2. Could you explain it to me?

Fabien


Fabien Leboeuf

Research Engineer, CHU de Nantes

- PhD in Mechanics-Biomechanics, Université de Poitiers

Pôle Médecine Physique et Réadaptation (Physical Medicine and Rehabilitation Department), Hôpital St Jacques, 85 rue Saint Jacques, 44 093 Nantes cedex 1. Tel: 02 40 84 60 88

Mobile: 06 07 79 02 44

0todd0000 commented 9 years ago

Hi Fabien,

The T2 statistic is related to the F statistic as follows:

$$\frac{m - p + 1}{p\,m}\,T^2 = F$$

where m and p are the degrees of freedom and the number of vector components, respectively, and F follows an F distribution with p and m - p + 1 degrees of freedom. See Wikipedia's article on the Hotelling T-squared distribution for more details.

To compute the critical T2 for a 0D test (like in Appendix B of the manuscript) you can compute the critical F value (using the F distribution's inverse survival function) and then transform it to a critical T2 value using the equation above.
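A minimal sketch of the 0D recipe above, using SciPy's F distribution (`scipy.stats.f.isf`) and inverting the T2-F relation. The function name `critical_t2_0d` is hypothetical; m is assumed to be n - 1 for a one-sample test or nA + nB - 2 for a two-sample test.

```python
from scipy import stats


def critical_t2_0d(alpha, p, m):
    """Critical Hotelling's T2 for a 0D test.

    p : number of vector components
    m : degrees of freedom (e.g. n - 1 for one sample, nA + nB - 2 for two samples)
    """
    f_crit = stats.f.isf(alpha, p, m - p + 1)   # critical F from the inverse survival function
    return f_crit * p * m / (m - p + 1)         # invert F = (m - p + 1) / (p m) * T2


# Example: a two-component vector (e.g. F = [Fx, Fy]) and two groups of 10, so m = 18
print(critical_t2_0d(0.05, p=2, m=18))
```

With a single component (p = 1) the result reduces to the square of the two-tailed critical t value, which is a convenient sanity check.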

For 1D tests it's almost identical; you only need one additional parameter: field smoothness. After computing field smoothness you submit the m, p and smoothness values to a random field theory (RFT) inverse survival function. If you're not familiar with smoothness computation, RFT computations, and inverse survival functions it might be a bit tricky to implement.
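For the smoothness step, one common approach estimates the FWHM from the residuals' spatial derivatives relative to their variance. The sketch below follows that gradient-based idea for scalar residual curves; it is an assumption-laden illustration, and spm1d's own implementation may differ in details. The function name `estimate_fwhm` is hypothetical, and the subsequent RFT inverse survival function is not shown.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d


def estimate_fwhm(residuals):
    """Estimate 1D field smoothness (FWHM, in nodes) from residuals of shape (J, Q)."""
    r = np.asarray(residuals, dtype=float)
    ssq = (r ** 2).sum(axis=0)                  # residual sum of squares at each node
    dr = np.gradient(r, axis=1)                  # finite-difference derivative along the field
    v = (dr ** 2).sum(axis=0) / ssq              # derivative variance relative to field variance
    resels_per_node = np.sqrt(v / (4 * np.log(2)))
    return 1.0 / resels_per_node.mean()


# Check on synthetic data: white noise smoothed with a Gaussian kernel (sigma = 5 nodes)
# should have a FWHM of roughly sigma * sqrt(8 ln 2), i.e. about 11.8 nodes.
rng = np.random.default_rng(0)
y = gaussian_filter1d(rng.standard_normal((200, 301)), sigma=5.0, axis=1, mode="wrap")
fwhm = estimate_fwhm(y - y.mean(axis=0))
print(fwhm)
```

The estimated FWHM, together with m and p, is what the RFT inverse survival function needs to return the 1D critical T2.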

We plan to release our multivariate code (including Hotelling's tests) in the next major software update (Version 0.3), which will likely be available early next year. Do you need to compute the critical values sooner than that?

Todd

aaa34169 commented 9 years ago

Hi Todd

Sorry for the delayed reply; I was thinking about my problem. I implemented it as an ANOVA.
However, I now think a multivariate approach would be better, and I would appreciate your advice. Could I explain my problem to you? Would you like me to continue writing to you on GitHub?

Regards

Fabien

0todd0000 commented 9 years ago

Hi Fabien,

Other users might have similar questions, so please open a new issue on "multivariate statistics".

Thanks,
Todd

0todd0000 / spm1d