Python - datasets with '0' throwing divide by zero. workaround?

kmshort commented 5 years ago

Hi Todd & all,

Apologies if this is not a bug. After all, doing the following below only issues a RuntimeWarning

I am wondering if there is a workaround for an issue I'm having performing a nested anova with datasets that naturally have the odd 'zero' value. I'm using Python 2.7. essentially, my values: measures = np.asarray([[1,1,1,1,1,1,1,1,1,0,2,0,1,1,1,2],[0,1,1,1,0,1,1,0,1,0,0,1,1,1,1,0],[1,1,1,1,2,2,1,1,1,1,1,1,1,1,1,1],[2,1,1,1,0,0,1,1,1,0,1,1,1,2,2,2]]) are representing measurements in 4 samples, samples = np.asarray([0,1,2,3]) from 2 groups.. groups = np.asarray([0,0,1,1])

So when I try to perform spm1d.stats.anova2nested(measures, Group, Samples, equal_var=True), it throws an error, example:

import numpy as np
import spm1d

measures = np.asarray([[1,1,1,1,1,1,1,1,1,1,2,1,1,1,1,2],[10,10,10,1,1,10,1,10,1,10,1,1,10,1,1,1],[1,1,1,1,2,2,1,1,1,1,1,1,1,1,1,1],[2,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2]])
Group = np.asarray([0,0,1,1])
Samples = np.asarray([0,1,2,3])

FF = spm1d.stats.anova2nested(measures, Group, Samples, equal_var=True)
FFi = FF.inference(0.05)
print FFi
print FFi.get_p_values()

C:\ProgramData\Anaconda2\lib\site-packages\spm1d\stats\anova\ui.py:35: RuntimeWarning: divide by zero encountered in divide
f           = ms0 / ms1
C:\ProgramData\Anaconda2\lib\site-packages\spm1d\stats\_spm.py:319: RuntimeWarning: invalid value encountered in double_scalars
dx     = (z1-u) / (z1-z0)
C:\ProgramData\Anaconda2\lib\site-packages\spm1d\stats\_spm.py:324: RuntimeWarning: invalid value encountered in double_scalars
dx     = (z0-u) / (z0-z1)
C:\ProgramData\Anaconda2\lib\site-packages\scipy\stats\_discrete_distns.py:495: RuntimeWarning: invalid value encountered in greater_equal
return mu >= 0
SPM{F} inference list
design    :  ANOVA2nested
nEffects  :  2
Effects:
A     z=(1x16) array       df=(1, 2)    h0reject=False
B     z=(1x16) array       df=(2, 4)    h0reject=True
([], [nan])

The first error (thrown from ui.py), is from FF = spm1d.stats.anova2nested(measures, Group, Samples, equal_var=True) the remaining errors are from FFi = FF.inference(0.05)

This following code is a normal working example, with 0 values are replaced with integers >0:

import numpy as np
import spm1d

measures = np.asarray([[1,1,1,1,1,1,1,1,1,1,2,1,1,1,1,2],[10,10,10,1,1,10,1,10,1,10,1,1,10,1,1,1],[1,1,1,1,2,2,1,1,1,1,1,1,1,1,1,1],[2,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2]])
Group = np.asarray([0,0,1,1])
Samples = np.asarray([0,1,2,3])

FF = spm1d.stats.anova2nested(measures, Group, Samples, equal_var=True)
FFi = FF.inference(0.05)
print FFi
print FFi.get_p_values()

SPM{F} inference list
   design    :  ANOVA2nested
   nEffects  :  2
Effects:
   A     z=(1x16) array       df=(1, 2)    h0reject=False
   B     z=(1x16) array       df=(2, 4)    h0reject=True

([], [0.04877060803793987, 0.04877060803793987, 0.04877060803793987, 0.04877060803793987, 0.04877060803793987])

So, is there a workaround? Also, why does my use of inference .get_p_values() generate so many values (the last line quoted in the last code block) -- ([], [0.04877060803793987, 0.04877060803793987, 0.04877060803793987, 0.04877060803793987, 0.04877060803793987])?

thanks for your time, Kieran

0todd0000 commented 5 years ago

Hi Kieran,

Apologies for the non-informative errors! The main problem here is that four samples is insufficient for 1D analysis. In the minimal (2x2) nested example: ./spm1d/examples/stats1d/ex_anova2nested.py note that there are 20 samples in total.

Actually, looking at your data again, I wonder if you're trying to run 0D analysis? If that is the case, the measures needs to be flattened from a (4, 16) array into a (64,) array. Group and Samples must also both be (64,). In other words, for 0D univariate analysis the dependent variable must be a 1D array, and for univariate 1D analysis the dependent variable must be a 2D array. This example might be relevant: ./spm1d/examples/stats0d/ex_anova2nested.py

Todd

kmshort commented 5 years ago

Hi Todd, Thanks for that. Mmm. I was trying to shoehorn spm1d to perform a standard 2-factor nested ANOVA as a statistical test, of multiple measurements within samples, between samples, within groups. So I guess that is indeed a 0d analysis.

I had a look at the example dataset from http://www.southampton.ac.uk/~cpd/anovas/datasets/Doncaster&Davey%20-%20Model%202_1%20Two%20factor%20nested.txt

for others who may follow, this is the Southampton data:

2.1 TWO FACTOR NESTED MODEL Y = B(A) + e

Analysis of terms: A + B(A)

Data:

A   B   Y
1   1    4.5924
1   1   -0.5488
1   1    6.1605
1   1    2.3374
1   2    5.1873
1   2    3.3579
1   2    6.3092
1   2    3.2831
2   1    7.3809
2   1    9.2085
2   1   13.1147
2   1   15.2654
2   2   12.4188
2   2   14.3951
2   2    8.5986
2   2    3.4945
3   1   21.3220
3   1   25.0426
3   1   22.6600
3   1   24.1283
3   2   16.5927
3   2   10.2129
3   2    9.8934
3   2   10.0203

Model 2.1(i) A is a fixed or random factor, B is a random factor:

  Source  DF       SS      MS     F      P
1 A        2   745.36  372.68  4.02  0.142
2 B(A)     3   278.02   92.67  9.26  0.001
3 S(B(A)) 18   180.18   10.01
  Total   23  1203.56

Model 2.1(ii) A is a covariate of the response, B is a random factor:

  Source  DF       SS      MS      F      P
1 A        1   745.20  745.20  10.72  0.031
2 B(A)     4   278.18   69.55   6.95  0.001
3 S(B(A)) 18   180.18   10.01
  Total   23  1203.56

__________________________________________________________________

Doncaster, C. P. & Davey, A. J. H. (2007) Analysis of Variance and
Covariance: How to Choose and Construct Models for the Life Sciences.
Cambridge: Cambridge University Press.
http://www.southampton.ac.uk/~cpd/anovas/datasets/

So I think I know how to format the data now.

I actually posted about the general query I initially had over a year ago, it's just taken a long time to get back to it :) https://stackoverflow.com/questions/48273276/nested-anova-in-python-with-spm1d-cant-print-f-statistics-and-p-values

thanks Todd.

kmshort commented 5 years ago

Hi Todd,

Thanks again. I see that 0D requires a balanced design. I'm working with biologically messy data, so designs are rarely balanced with unequal sample sizes being quite common. Is there any way of working with unbalanced designs with spm1d?

best regards, Kieran

0todd0000 commented 5 years ago

Hi Kieran, Unfortunately this is not yet possible in spm1d. For 0D data its implementation is straightforward, but there are a variety of complications for 1D data. Some procedures are implemented but not yet validated. Provided these validations can be completed, unbalanced m-way ANOVA will appear in the next major release. As a workaround, you could use SPM12, which supports arbitrary ANOVA. Todd

kmshort commented 5 years ago

Thanks Todd, but SPM12 is MATLAB only, not Python. FYI, my data is 0D. Thanks again Todd, I really appreciate your ongoing support.

0todd0000 commented 5 years ago

Understood. If your analyses are focussed on 0D data, how about statsmodels or rpy2?

0todd0000 / spm1d

Python - datasets with '0' throwing divide by zero. workaround? #96