0todd0000 / spm1d

One-Dimensional Statistical Parametric Mapping in Python
GNU General Public License v3.0

Two Sample T-Test (ttest2) equal_var parameter raises ValueError (Python 3.6) #133

Closed: cailin-rice closed this issue 4 years ago

cailin-rice commented 4 years ago

Hi Todd,

I am currently trying to conduct a simple two-sample t-test on tendon loads during running under different running conditions. I am comparing processed experimental data to simulated/estimated data from a processing pipeline I have developed.

I have run the example ttest2 script and it runs fine (it produces the output figures as in the documentation). My code to run the spm1d functionality is as follows:

t = spm.stats.ttest2(data_set_1_final, data_set_2_final, equal_var=False)
ti = t.inference(alpha, two_tailed=True, interp=True)

Where data_set_1 and data_set_2 have the same shape: (6, 101).

My question has to do with the equal_var parameter. When I run my code to conduct the spm analysis it produces a ValueError:

t = spm.stats.ttest2(data_set_1_final, data_set_2_final, equal_var=False)

File "C:\ProgramData\Anaconda2\envs\python36\lib\site-packages\spm1d\stats\t.py", line 235, in ttest2
    return glm(Y, X, c, Q, roi=roi)

File "C:\ProgramData\Anaconda2\envs\python36\lib\site-packages\spm1d\stats\t.py", line 61, in glm
    df = _reml.estimate_df_T(Y, X, eij, Q)

File "C:\ProgramData\Anaconda2\envs\python36\lib\site-packages\spm1d\stats_reml.py", line 57, in estimate_df_T
    V,h = reml(YY, X, Q)

File "C:\ProgramData\Anaconda2\envs\python36\lib\site-packages\spm1d\stats_reml.py", line 197, in reml
    C += Q[i] * float(h[i])

ValueError: operands could not be broadcast together with shapes (6,6) (0,0) (6,6)

However, when I change the equal_var parameter to True, the code runs fine and produces an output graph as desired (see attached). The only problem I have is that I disagree with the areas of statistical difference highlighted by the spm tool with equal_var=True.

In my opinion, the attached plot should have grey regions from about 10% to 50% of gait, as the standard deviation curves don't overlap and there is an obvious difference between the two "mean" curves.

[Image: tendon_analysis_plot]

I am certain that assuming equal variances for the experimental and simulated data sets is an invalid assumption. Please could you explain how to correct this error or if there is an error in my logic?

Thank you in advance! :)

0todd0000 commented 4 years ago

Hello, thank you for reporting this bug!

I have two questions:

  1. Does the example file (/spm1d/examples/stats1d/ex_ttest2.py) run OK with a built-in dataset and with equal_var=False?

If yes to 1, then I think I might know what the problem is:

  2. Are the values in the first ~5% all zero, all identical, or all close-to-identical?

If yes, then this zero or close-to-zero variance might be causing a problem for the non-sphericity code. Try removing the first ~5% then re-running. Changing the array shape to (6, 95), for example, is fine; spm1d will re-calibrate to the data array size automatically.
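
A minimal sketch of that check, using only NumPy. The helper name `near_constant_nodes`, the tolerance, and the synthetic array are illustrative assumptions and are not part of spm1d; the same mask would need to be applied to both groups so that their shapes still match.

```python
import numpy as np

def near_constant_nodes(y, tol=1e-8):
    """Flag time nodes whose across-trial standard deviation is <= tol.

    Hypothetical helper: `y` has shape (n_trials, n_nodes).
    """
    y = np.asarray(y, dtype=float)
    return y.std(axis=0, ddof=1) <= tol

# Synthetic stand-in for a (6, 101) dataset: six trials, 101 time nodes.
rng = np.random.default_rng(0)
y = rng.normal(size=(6, 101))
y[:, :6] = 0.0  # make the first ~5% of nodes identically zero

flat = near_constant_nodes(y)
print(np.flatnonzero(flat))  # indices of the flagged nodes

# Drop the flagged leading nodes before running the test.
y_trimmed = y[:, ~flat]
print(y_trimmed.shape)
```

Here the flagged nodes form a single leading run, so removing them leaves a contiguous (6, 95) array of the kind mentioned above.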

cailin-rice commented 4 years ago

Thank you for your quick response. To answer your questions:

  1. Yes, the example file runs OK with a built-in dataset and with equal_var=False.

  2. Yes, they are close to identical (~0). However, when I remove these values from the dataset, I still get the same ValueError as above when equal_var=False. If I change to equal_var=True, I get the plot below.

[Image: tendon_analysis_plot]

Thanks so much for your help.

0todd0000 commented 4 years ago

OK, thank you for confirming. I can't replicate this error, but I think I have found the potential problem. The error message you sent contains the following line:

C:\ProgramData\Anaconda2\envs\python36\lib\site-packages\spm1d\stats_reml.py

The file stats_reml.py is from a previous version of spm1d (I think from version 0.3.x). The current version is 0.4.5 (2020/05/09). Please check your version of spm1d, and if it is not 0.4.x, please try updating then re-running.
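
One way to check which version is installed, and which copy Python would actually import, is sketched below. It relies only on the standard library and assumes Python 3.8 or later (for `importlib.metadata`); the helper names `installed_version` and `module_location` are hypothetical.

```python
import importlib.util
from importlib import metadata  # standard library, Python 3.8+

def installed_version(package):
    """Return the installed distribution version, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

def module_location(package):
    """Return the file a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(package)
    return None if spec is None else spec.origin

print("spm1d version: ", installed_version("spm1d"))
print("spm1d location:", module_location("spm1d"))
```

If the reported location points at an unexpected environment or at leftover files, an old copy may be shadowing the updated one.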

cailin-rice commented 4 years ago

Hi Todd,

I checked the version of spm1d, and it seems I have version 0.4.2. I uninstalled and then re-installed using "pip install spm1d" and still have version 0.4.2. I no longer get the error when I run my code. However, the shaded areas of statistical significance don't make sense to me; they remain the same as above. Could this be because my datasets are too small?

0todd0000 commented 4 years ago

I'm glad that the error is no longer being generated. I've not yet uploaded the most recent version to PyPI, so that's why the pip command didn't update it, but as long as the error is gone, then everything should be OK; versions after 0.4.2 contain only minor tweaks.

Yes, the unexpected statistical results are probably related to the small sample size. If you were to artificially increase sample size by copying all observations, so that there are six observations per group instead of three, then the statistical results will likely converge to your expectations.
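
A rough sketch of why copying observations changes the result, using a Welch-style pointwise t statistic written in plain NumPy. This is not spm1d's internal code, and the function name `welch_t` and the synthetic data are assumptions made for illustration. Copied rows add no new information, so the change comes purely from the larger n in the standard error.

```python
import numpy as np

def welch_t(ya, yb):
    """Pointwise two-sample t statistic without assuming equal variances."""
    ya = np.asarray(ya, dtype=float)
    yb = np.asarray(yb, dtype=float)
    na, nb = ya.shape[0], yb.shape[0]
    # Standard error of the mean difference at each time node.
    se = np.sqrt(ya.var(axis=0, ddof=1) / na + yb.var(axis=0, ddof=1) / nb)
    return (ya.mean(axis=0) - yb.mean(axis=0)) / se

# Synthetic groups with six observations each.
rng = np.random.default_rng(1)
ya = rng.normal(loc=1.0, size=(6, 101))
yb = rng.normal(loc=0.0, size=(6, 101))

t_orig = welch_t(ya, yb)
# Copy every observation once, doubling the apparent sample size.
t_dup = welch_t(np.vstack([ya, ya]), np.vstack([yb, yb]))

ratio = t_dup / t_orig
print(ratio.min(), ratio.max())
```

With n observations per group, duplication scales this statistic by sqrt((2n - 1) / (n - 1)) at every node, which is about 1.48 for n = 6, while the degrees of freedom also increase.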

cailin-rice commented 4 years ago

Thanks for your help, Todd. All seems correct, except for one set of data that I have.

[Image: achilles_tendon_analysis_plot]

The second and third statistical difference clouds that have been identified seem incorrect. Do you know what may have caused this? I have artificially increased my data size to 10 (I have tried 20 and 30 but the plots still look the same). This seemed to correct the problem I was having with my other data set.

[Image: patella_tendon_analysis_plot]

Thanks again :)

0todd0000 commented 4 years ago

The t statistic value may seem unexpectedly large, but this suggests only that the mean difference is large with respect to the variance. If you zoom in on the area around time=70%, for example, you will likely see small variance relative to the difference.
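
As a worked illustration with made-up numbers (not values from the plots in this thread), the following computes a two-sample t value from summary statistics at a single time node. The function name and all inputs are hypothetical.

```python
import math

def t_from_summary(mean_diff, sd_a, sd_b, n_a, n_b):
    """Two-sample t value at one node from a mean difference and group SDs."""
    se = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return mean_diff / se

# Small difference, very small variance (e.g. near-zero force region).
t_small_var = t_from_summary(mean_diff=5.0, sd_a=1.0, sd_b=1.0, n_a=6, n_b=6)

# Ten times the difference, but much larger variance.
t_large_var = t_from_summary(mean_diff=50.0, sd_a=40.0, sd_b=40.0, n_a=6, n_b=6)

print(round(t_small_var, 3), round(t_large_var, 3))
```

The 5-unit difference yields the larger t value because it is large relative to its variance, even though the 50-unit difference looks more prominent on a plot.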

So I suspect that the results are correct in terms of the mean : variance ratio. If you are uninterested in the portion of data where force is close to zero, then this portion of the data should probably not be included in the analysis. Ideally this should be set in an a priori manner; for example: "we ignored all forces less than 50 N".
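
A sketch of how such an a priori threshold could be turned into a boolean mask over time nodes. Basing the rule on both group means, the helper name `force_mask`, and the synthetic force curves are all assumptions for illustration. The traceback earlier in this thread shows ttest2 forwarding a `roi` argument; whether the installed version accepts a boolean mask there should be checked against the spm1d documentation.

```python
import numpy as np

def force_mask(ya, yb, threshold=50.0):
    """True at nodes where both group mean forces are >= threshold (N)."""
    mean_a = np.asarray(ya, dtype=float).mean(axis=0)
    mean_b = np.asarray(yb, dtype=float).mean(axis=0)
    return (mean_a >= threshold) & (mean_b >= threshold)

# Synthetic force curves: a half-sine peaking near 1000 N over 101 nodes.
x = np.linspace(0.0, 1.0, 101)
base = 1000.0 * np.sin(np.pi * x)
scales = np.linspace(0.95, 1.05, 6)   # six trials per group
ya = scales[:, None] * base           # group A
yb = 0.9 * scales[:, None] * base     # group B, 10% lower

mask = force_mask(ya, yb, threshold=50.0)
print(mask.sum(), "of", mask.size, "nodes retained")

# Assumed usage, to be verified against the installed spm1d version:
# t = spm1d.stats.ttest2(ya, yb, equal_var=False, roi=mask)
```

Fixing the threshold before looking at the test results keeps the choice of analysed region independent of the outcome.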

cailin-rice commented 4 years ago

Thank you for all of your help, Todd. spm1d is a really interesting and valuable contribution to biomechanics.