danielhomola / mifs

Parallelized Mutual Information based Feature Selection module.
BSD 3-Clause "New" or "Revised" License

Error calling fit function, classes of type String (Y) #11

Closed: y0uCeF closed this issue 7 years ago

y0uCeF commented 7 years ago

Hello, I'm trying to do feature selection on some microarray data using this module, but it fails. My target Y contains string values (dtype S3) that represent class names. Executing

feat_selector = mifs.MutualInformationFeatureSelector()
feat_selector.fit(X, Y)

fails with the following error:

warnings.warn("Variables are collinear.")
Traceback (most recent call last):
  File "main.py", line 69, in <module>
    feat_selector.fit(X, Y.tolist())
  File "/usr/lib/python2.7/site-packages/mifs-0.0.1.dev0-py2.7.egg/mifs/mifs.py", line 149, in fit
    return self._fit(X, y)
  File "/usr/lib/python2.7/site-packages/mifs-0.0.1.dev0-py2.7.egg/mifs/mifs.py", line 193, in _fit
    self.X, y = self._check_params(X, y)
  File "/usr/lib/python2.7/site-packages/mifs-0.0.1.dev0-py2.7.egg/mifs/mifs.py", line 308, in _check_params
    if self.categorical and np.any(self.k > np.bincount(y)):
TypeError: Cannot cast array data from dtype('S3') to dtype('int64') according to the rule 'safe'

I don't know a lot about these methods. Does my Y have to be an integer?

danielhomola commented 7 years ago

You'll have to convert your outcome variable to integers. You can do that easily with scikit-learn's LabelEncoder: http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html
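
For anyone landing here with the same error, a minimal sketch of that conversion might look like the following. The label values ("ALL", "AML") are made up for illustration and are not taken from the original data; the only library calls used are `LabelEncoder.fit_transform`, `LabelEncoder.inverse_transform`, and `np.bincount`. The assumption is that once the labels are non-negative integers, the `np.bincount(y)` check shown in the traceback no longer raises the casting error.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical string class labels standing in for the real targets.
Y = np.array(["ALL", "AML", "ALL", "AML", "ALL"])

# LabelEncoder sorts the distinct classes and maps them to 0..n_classes-1.
encoder = LabelEncoder()
y_int = encoder.fit_transform(Y)

# np.bincount (the call that failed in the traceback) accepts integer labels.
counts = np.bincount(y_int)

# The original class names can be recovered when needed.
original = encoder.inverse_transform(y_int)

print(y_int, counts, encoder.classes_)
```

The encoded array `y_int` would then be passed to `feat_selector.fit(X, y_int)` in place of the string labels.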

y0uCeF commented 7 years ago

Thanks for your quick response, I'll give it a try. It was more of a question than an issue; I just didn't know where to ask.