0todd0000 / spm1d

One-Dimensional Statistical Parametric Mapping in Python
GNU General Public License v3.0

Strings for factors in ANOVAs #79

Closed kmshort closed 6 years ago

kmshort commented 6 years ago

I'm not sure what your feelings are on this, but it would be great if the values for the group/subgroup factors could be strings. At the moment, passing strings raises the error "invalid literal for int() with base 10: 'somegroupname'".

For example:

Y = np.array([5,5,3,2])
A = np.array(['Group1','Group1','Group2','Group2'])
B = np.array(['Fred', 'Todd', 'Bruce', 'Angela'])

spm1d.stats.anova2nested(Y, A, B, equal_var=True)

returns

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-14-3d668dd9b370> in <module>()
----> 1 spm1d.stats.anova2nested(Y, A, B, equal_var=True)

C:\ProgramData\Anaconda2\lib\site-packages\spm1d\stats\anova\ui.pyc in anova2nested(Y, A, B, equal_var, roi)
    191         if equal_var is not True:
    192                 raise( NotImplementedError('Non-sphericity correction not implemented. To continue you must assume equal variance and set "equal_var=True".') )
--> 193         design  = designs.ANOVA2nested(A, B)
    194         model   = models.LinearModel(Y, design.X, roi=roi)
    195         model.fit()

C:\ProgramData\Anaconda2\lib\site-packages\spm1d\stats\anova\designs.pyc in __init__(self, A, B)
    246         def __init__(self, A, B):
    247                 self.X          = None
--> 248                 self.A          = Factor(A)
    249                 self.B          = FactorNested(B, self.A)
    250                 self.J          = self.A.J

C:\ProgramData\Anaconda2\lib\site-packages\spm1d\stats\anova\factors.pyc in __init__(self, A)
     18 class Factor(object):
     19         def __init__(self, A):
---> 20                 self.A            = np.asarray(A, dtype=int)        #integer vector of factor levels
     21                 self.J            = None     #number of observations
     22                 self.u            = None     #unique levels

C:\ProgramData\Anaconda2\lib\site-packages\numpy\core\numeric.pyc in asarray(a, dtype, order)
    480
    481     """
--> 482     return array(a, dtype, copy=False, order=order)
    483
    484 def asanyarray(a, dtype=None, order=None):

ValueError: invalid literal for int() with base 10: 'Group1'

This is just for factor A; the same error would be raised for factor B.
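The failure can be reproduced without spm1d, since the traceback points at the `np.asarray(A, dtype=int)` call in `Factor.__init__`. A minimal sketch, assuming only NumPy is installed (the variable `error_message` is introduced here for illustration):

```python
import numpy as np

# Same string labels as factor A in the example above
A = np.array(['Group1', 'Group1', 'Group2', 'Group2'])

try:
    # This mirrors the conversion shown at line 20 of factors.py in the traceback
    np.asarray(A, dtype=int)
    error_message = None
except ValueError as err:
    # Non-numeric strings cannot be parsed as integers
    error_message = str(err)

print(error_message)
```

So the error comes from the integer conversion itself, not from anything specific to the nested design.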

Perhaps string factors could be coerced to ints internally, and then mapped back to their original strings at the end of processing?

0todd0000 commented 6 years ago

Hi, thanks for raising this issue. I've added this to the list of feature requests here: Issue #45

In the meantime, consider converting to integer arrays using np.unique like this:

Atable,Ai = np.unique(A, return_inverse=True)
Btable,Bi = np.unique(B, return_inverse=True)

Here Ai and Bi are vectors containing integer representations of the original A and B labels, and the actual labels can be retrieved via the mappings in Atable and Btable.
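Applied to the data from the original post, the workaround might look like the sketch below. The names `A_restored` and `B_restored` are introduced here for illustration, and the spm1d call is left as a comment so that the snippet runs with NumPy alone.

```python
import numpy as np

Y = np.array([5, 5, 3, 2])
A = np.array(['Group1', 'Group1', 'Group2', 'Group2'])
B = np.array(['Fred', 'Todd', 'Bruce', 'Angela'])

# np.unique returns the sorted unique labels and, with return_inverse=True,
# an integer code for each element that indexes into that sorted table
Atable, Ai = np.unique(A, return_inverse=True)
Btable, Bi = np.unique(B, return_inverse=True)

print(Ai)  # integer codes for A
print(Bi)  # integer codes for B

# The integer codes can then be passed in place of the string labels, e.g.:
#   spm1d.stats.anova2nested(Y, Ai, Bi, equal_var=True)

# Indexing the lookup tables with the codes recovers the original labels
A_restored = Atable[Ai]
B_restored = Btable[Bi]
```

Note that the codes follow the sorted order of the labels, not their order of first appearance, so `Bi` here is not simply `0, 1, 2, 3`.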