gialmisi opened 2 years ago
Looks like imodels classifiers only work with binary classification problems. The iris dataset is a multi-class (three-class) classification problem. The code snippet can be fixed by transforming the label from multi-class to binary:
```python
import numpy as np
from sklearn.datasets import load_iris
from imodels import RuleFitClassifier

iris = load_iris()
X, y = iris.data, iris.target

# THIS! Predict whether the iris species is "virginica" (class 2) or not
y = np.where(y == 2, 1, 0)

rulefit = RuleFitClassifier()
rulefit.fit(X, y)
print(rulefit)
```
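A quick way to confirm what the binarization changes is scikit-learn's `type_of_target`, which reports how a label array would be interpreted. This is a sketch that only uses scikit-learn and NumPy, not imodels itself:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.utils.multiclass import type_of_target

X, y = load_iris(return_X_y=True)
# The original target has three classes (0, 1, 2), so it is reported as multiclass
print(type_of_target(y))

# Same binarization as above: 1 for "virginica" (class 2), 0 otherwise
y_binary = np.where(y == 2, 1, 0)
print(type_of_target(y_binary))
print(np.bincount(y_binary))  # 100 non-virginica samples, 50 virginica samples
```

Note that binarizing this way changes the question being asked (virginica vs. the rest), so it is a workaround rather than a multiclass model.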
Thanks @vruusmann. I just spent a while working this out myself, came here to report it, and found this issue. The issue can also be reproduced by replacing `print(rulefit)` with `rulefit.predict(X)` in the original code snippet.
Suggested action: the documentation should be updated to reflect this limitation, since there is nothing here indicating that multiclass classification won't work (although here, in a table of the tasks supported by the different models, only binary classification and regression are mentioned).
Better still: raise an explicit error when y is multiclass, explaining that it needs to be binary.
Best of all: support multiclass classification!
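Until native support exists, one possible interim workaround is the one-vs-rest strategy: train one binary model per class and combine their scores. scikit-learn's `OneVsRestClassifier` does this for any estimator that implements `fit` plus `predict_proba` or `decision_function`. The sketch below uses a hypothetical `BinaryOnlyClassifier` (a `LogisticRegression` subclass that rejects non-binary targets) as a stand-in so it runs without imodels; whether `RuleFitClassifier` can be wrapped the same way is an untested assumption.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.utils.multiclass import type_of_target


class BinaryOnlyClassifier(LogisticRegression):
    """Hypothetical stand-in for a classifier that only accepts binary targets."""

    def fit(self, X, y, sample_weight=None):
        if type_of_target(y) != "binary":
            raise ValueError("Only binary targets are supported")
        return super().fit(X, y, sample_weight=sample_weight)


X, y = load_iris(return_X_y=True)

# Fitting the binary-only model directly on the three-class target fails...
try:
    BinaryOnlyClassifier(max_iter=1000).fit(X, y)
except ValueError as err:
    print("direct fit failed:", err)

# ...but one-vs-rest fits one binary model per class (each sees 0/1 labels)
ovr = OneVsRestClassifier(BinaryOnlyClassifier(max_iter=1000))
ovr.fit(X, y)
print(len(ovr.estimators_))  # one fitted binary model per class
print(ovr.predict(X[:5]))
```

The trade-off is that each per-class model learns its own rule set, so interpretability is spread across several models rather than one.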
Related to https://github.com/csinva/imodels/issues/93 and https://github.com/csinva/imodels/issues/77, so this issue is a duplicate.
Thanks both for the interest in the package and for raising these issues!
Commit 50431c8ec62edd97646fbea968d71e964262761f adds code that raises an error during fitting, explaining that multiclass classification is not supported by RuleFitClassifier. Hopefully we can actually support multiclass classification soon...
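For illustration, a fit-time check of this kind could look like the sketch below, assuming scikit-learn's `type_of_target` is used to detect the label type. `check_binary_target` is a hypothetical helper, not the code added in that commit:

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target


def check_binary_target(y):
    """Raise a descriptive ValueError unless y is a binary classification target.

    Hypothetical helper for illustration only.
    """
    target_type = type_of_target(y)
    if target_type != "binary":
        raise ValueError(
            f"Expected a binary target, got '{target_type}' with "
            f"{len(np.unique(y))} unique values. This classifier only supports "
            "binary classification; binarize the labels first."
        )


check_binary_target(np.array([0, 1, 1, 0]))  # binary: passes silently
try:
    check_binary_target(np.array([0, 1, 2, 2]))  # three classes: rejected
except ValueError as err:
    print(err)
```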
The following code snippet results in an error:
The error reads:
I tried to look into this issue myself, but I am not familiar enough with the method to make any definitive claims. However, this line of code seems fishy. Why not just use the actual number of features stored in `self.n_features`? It could be the source of the indexing error.