Understanding the dataset

miller-carvalhaes commented 9 years ago

Hello Todd,

I am opening this issue because I would like to understand how to organize the data set in order to use the Python scripts.

First of all, I'm trying to recreate the simulation provided in Appendix A of your article "Vector field statistics for objective center-of-pressure trajectory analysis during gait, with evidence of scalar sensitivity to small coordinate system rotations" (http://www.sciencedirect.com/science/article/pii/S0966636214000630).

[image: untitled]

To do this, I customized the example "ex_hotellings_paired_Pataky2014" to load the simulation values instead of the original data set, as follows:

[image: untitled2]

I read the documentation for the multivariate tests, but I did not understand how to construct the array it describes:

In all multivariate 1D tests dependent variables are (J x Q x I) arrays
J = number of 1D responses
Q = number of nodes to which the 1D responses have been resampled
I = number of vector components

Thank you for your attention.

0todd0000 commented 9 years ago

Hi Milly,

The Appendix A example contains 0D data, not 1D data, so the 1D example is not directly relevant. Instead, please follow the 0D example here: ./spm1d/examples/stats0d/ex_hotellings2.py

For 0D analysis the data should be stored in (J x I) arrays.

Todd
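To make the (J x I) layout concrete, here is a minimal NumPy sketch. It is not the spm1d implementation and not the example script referenced above. The sizes and random data are hypothetical, and `hotellings_paired_T2` is an illustrative helper that computes the textbook paired Hotelling's T2 statistic only (no inference).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 10 paired observations, 3 vector components
J, I = 10, 3

# 0D data: one row per observation, one column per vector component
YA = rng.normal(size=(J, I))
YB = rng.normal(size=(J, I))


def hotellings_paired_T2(YA, YB):
    """Textbook paired Hotelling's T2 statistic for (J x I) arrays.

    Illustrative helper only; not part of spm1d.
    """
    D = np.asarray(YA, dtype=float) - np.asarray(YB, dtype=float)
    n = D.shape[0]
    d = D.mean(axis=0)                           # mean paired difference, length I
    W = np.atleast_2d(np.cov(D, rowvar=False))   # (I x I) sample covariance of the differences
    return float(n * d @ np.linalg.solve(W, d))


T2 = hotellings_paired_T2(YA, YB)
print(YA.shape, T2)
```

With a single vector component the statistic reduces to the squared paired t statistic, which is a quick way to sanity-check the array orientation.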

miller-carvalhaes commented 9 years ago

Hello Todd, sorry for taking your time, but I would like to ask a few more questions.

I will describe my experiment to make my questions clear.

I am using spm1d to compare elbow joint trajectories recorded by two different motion capture systems (Kinect vs. Vicon). Five subjects participated in the experiment. Each trajectory has 1692 samples. Thus, each (1692 × 3) elbow trajectory was regarded as a single vector field r(q) = {rx(q) ry(q) rz(q)}.
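For reference, the data described here would map onto the documented (J x Q x I) layout roughly as in the following sketch. The random arrays are hypothetical stand-ins for the measured trajectories, and the variable names are illustrative only.

```python
import numpy as np

# Sizes taken from the description above: subjects, nodes, vector components
J, Q, I = 5, 1692, 3

rng = np.random.default_rng(1)

# Hypothetical stand-ins: one (Q x I) trajectory per subject and per system
kinect_trials = [rng.normal(size=(Q, I)) for _ in range(J)]
vicon_trials = [rng.normal(size=(Q, I)) for _ in range(J)]

# Stack along a new first axis to obtain (J x Q x I) arrays
YA = np.stack(kinect_trials, axis=0)
YB = np.stack(vicon_trials, axis=0)

# A single component, e.g. rz(q), is a (J x Q) array for scalar field tests
rz_A = YA[:, :, 2]
rz_B = YB[:, :, 2]

print(YA.shape, rz_A.shape)
```

Row j of YA and row j of YB must belong to the same subject for a paired test.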

In this case, to perform Hotelling's paired T2 test I used the MATLAB version of the 1D example ./spm1d/examples/stats1d/ex_hotellings_paired_Pataky2014.m. After that I performed post hoc t tests following the example ./spm1d/examples/stats1d/ex1d_ttest_paired.m.

Assuming I have not made a mistake, what does the high SPM{T2} threshold mean, given that a statistically significant difference was found in the post hoc test on rz(q)?

The following images show my results:

Hotelling's paired T2 test on r(q): [image: smp-t2]

Post hoc scalar field tests on rx(q), ry(q) and rz(q), respectively: [image: smp-t x] [image: smp-t y] [image: smp-t z]

Thank you for your attention

0todd0000 commented 9 years ago

Hi Milly, Thank you for your question. There are a number of issues to consider:

Post hoc analysis, general justification. Strictly speaking, post hoc analyses are not justified unless the main test reaches significance. This is true for both ANOVA and multivariate analyses.
Post hoc analysis, correction for multiple comparisons. You must adopt a correction for multiple comparisons when performing post hoc tests. For univariate data (i.e. ANOVA), the simplest way to correct for multiple comparisons is to use a Bonferroni correction (see spm1d.util.p_critical_bonf).
Multivariate vs. univariate results. Multivariate results and univariate results are not directly comparable because univariate analyses do not consider the covariance amongst the variables. As you can see in Table S1 (at the top of this Issue), the multivariate results can be very different from the univariate results. Post hoc results for the X, Y and Z components, for example, are generally not valid because they assume that these components are independent, which is only very rarely true. Appropriate post hoc tests for multivariate analyses use the covariance matrix to retain alpha across the post hoc tests. These procedures are not yet available in spm1d.
Multivariate analyses, small sample sizes. The sample size should generally be 3 to 5 times greater than the number of vector components. In this case there are three vector components, so a sample size of five is probably too small to draw meaningful conclusions. The main reason is that there are six variance / covariance components (X, Y, Z, XY, XZ, YZ), and you need larger sample sizes to estimate those accurately. When the sample size is small, the critical threshold can rise to extremely high levels, as you observed in your Hotelling's paired test. If you have a larger sample size (for example, just try copying your existing dataset two or three times) you should find that the critical threshold falls to much lower levels.
Smoothness. The data appear quite rough. When the data are rough the thresholds will be quite high.
Cyclical movements? It appears that the signal is cyclical, with about five cycles (in the rz data)? If that's true it might be better to treat each cycle as a separate observation. This might explain why there are very large T2 maxima at about time = 420, 800 and 1150.
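As a rough illustration of the multiple-comparisons point above, the sketch below computes a corrected per-test alpha for three post hoc tests. The helper names are hypothetical and this is not the spm1d code. Both the classic Bonferroni form and the closely related Šidák form are shown, because the thread does not say which formula spm1d.util.p_critical_bonf uses.

```python
def bonferroni_alpha(alpha, n_tests):
    """Classic Bonferroni: divide the family-wise alpha by the number of tests."""
    return alpha / n_tests


def sidak_alpha(alpha, n_tests):
    """Sidak form: per-test alpha such that n independent tests retain alpha overall."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)


alpha = 0.05    # family-wise Type I error rate
n_tests = 3     # one post hoc test per vector component (X, Y, Z)

a_bonf = bonferroni_alpha(alpha, n_tests)
a_sidak = sidak_alpha(alpha, n_tests)

print(a_bonf, a_sidak)
```

Either value would replace 0.05 as the alpha used for each individual post hoc test. As noted in the list above, such corrections do not account for covariance among the components.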

Todd
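The suggestion to treat each cycle as a separate observation could be sketched as follows. This assumes the cycle boundaries are already known; here they are hypothetical, evenly spaced indices on a synthetic five-cycle signal. `split_into_cycles` is an illustrative helper, not an spm1d function, and the choice of 101 nodes is arbitrary.

```python
import numpy as np


def split_into_cycles(y, boundaries, n_nodes=101):
    """Cut a (Q x I) trajectory at the given sample indices and linearly
    resample each cycle to n_nodes, giving an (n_cycles x n_nodes x I) array.
    """
    y = np.asarray(y, dtype=float)
    t_new = np.linspace(0.0, 1.0, n_nodes)
    cycles = []
    for start, stop in zip(boundaries[:-1], boundaries[1:]):
        seg = y[start:stop + 1]                        # include both boundary samples
        t_old = np.linspace(0.0, 1.0, seg.shape[0])    # normalized time within the cycle
        resampled = np.column_stack(
            [np.interp(t_new, t_old, seg[:, i]) for i in range(seg.shape[1])]
        )
        cycles.append(resampled)
    return np.stack(cycles, axis=0)


# Synthetic three-component signal with five cycles over 1692 samples
Q = 1692
phase = 2.0 * np.pi * 5.0 * np.arange(Q) / (Q - 1)
y = np.column_stack([np.sin(phase), np.cos(phase), np.sin(2.0 * phase)])

# Hypothetical cycle boundaries (six indices delimit five cycles)
boundaries = np.linspace(0, Q - 1, 6).round().astype(int)

Y = split_into_cycles(y, boundaries)
print(Y.shape)
```

With real data the boundaries would come from an event detection step, and whether cycles from the same subject can be treated as independent observations is a study design question that this sketch does not address.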

miller-carvalhaes commented 9 years ago

I really appreciate your help. With this new information I will talk to my supervisor about our sample size and about analyzing each cycle separately.

Thank you once again.

0todd0000 commented 9 years ago

No problem at all. If any other problems arise, please feel free to open a new issue.

Todd

0todd0000 / spm1dmatlab

Understanding the dataset #4