edesz closed this issue 6 years ago.
@edesz we were actually just working on the documentation for this feature, which you can read in the development documentation: Confusion Matrix: Plotting with Class Names.
When you fit with integer classes but specify class names, the ConfusionMatrix visualizer requires a mapping from integers to class names. You can give the visualizer a label_encoder, which can be either a sklearn.preprocessing.LabelEncoder or a Python dictionary.
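To make the direction of that mapping concrete, here is a minimal stdlib-only sketch of what a dictionary label_encoder is assumed to do. Each target value (the dict key) is looked up to get its class name (the dict value), for both the true and the predicted labels, before the confusion matrix is computed. The helper `apply_label_encoder` and the example labels are hypothetical and not part of Yellowbrick.

```python
def apply_label_encoder(label_encoder, values):
    """Hypothetical helper: map encoded target values to class names.

    Sketches the assumed behaviour of a dict label_encoder: the keys are the
    values that appear in y (and in the model's predictions), the dict values
    are the readable class names also passed as `classes`.
    """
    return [label_encoder[v] for v in values]


# Integer targets and predictions, as a fitted classifier might produce them
y_true = [0, 1, 2, 1, 0]
y_pred = [0, 2, 2, 1, 0]

# Mapping of integer -> class name (keys must match the values found in y)
label_encoder = {0: "setosa", 1: "versicolor", 2: "virginica"}

true_names = apply_label_encoder(label_encoder, y_true)
pred_names = apply_label_encoder(label_encoder, y_pred)
print(true_names)  # ['setosa', 'versicolor', 'virginica', 'versicolor', 'setosa']
print(pred_names)  # ['setosa', 'virginica', 'virginica', 'versicolor', 'setosa']
```

If the dict were written the other way round (class name -> integer), the lookup on the integer targets would raise a KeyError, which is why the keys have to be the values that actually occur in y.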
My suggestion for your code is as follows:
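import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from yellowbrick.classifier import ConfusionMatrix
# `data`, `features` and `target` come from your own code
# Encode the categorical data with one-hot encoding
X = pd.get_dummies(data[features])
# Convert unique classes (strings) into integers
encoder = LabelEncoder()
y = encoder.fit_transform(data[target])
# Create test and train splits
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)
# The ConfusionMatrix visualizer takes a model; the encoder maps the integers back to class names
model = LogisticRegression()
cm = ConfusionMatrix(model, classes=encoder.classes_, label_encoder=encoder)
# Fit on the training split, score on the test split, then draw the figure
cm.fit(X_train, y_train)
cm.score(X_test, y_test)
cm.poof()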
Alternatively, if you do not encode y and instead pass in the string values, LogisticRegression will take care of the encoding under the hood.
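The encoding that LabelEncoder performs (and that scikit-learn classifiers such as LogisticRegression do internally, keeping the sorted unique labels in their classes_ attribute) amounts to sorting the unique labels and numbering them from 0. A stdlib-only sketch of that round trip follows; `fit_label_encoding`, `encode`, `decode` and the example labels are hypothetical names for illustration, not scikit-learn's implementation.

```python
def fit_label_encoding(labels):
    """Hypothetical sketch: sorted unique classes, numbered 0..n-1."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    return classes, index


def encode(index, labels):
    """Replace each class label with its integer code."""
    return [index[label] for label in labels]


def decode(classes, codes):
    """Inverse of encode: integer code -> original class label."""
    return [classes[code] for code in codes]


y = ["win", "loss", "draw", "win", "win", "loss"]
classes, index = fit_label_encoding(y)
codes = encode(index, y)

print(classes)                      # ['draw', 'loss', 'win']
print(codes)                        # [2, 1, 0, 2, 2, 1]
print(decode(classes, codes) == y)  # True
```

Because the classes are sorted, the integer codes depend only on the set of labels, not on the order in which they appear in the data.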
Hi @bbengfort, many thanks for your reply! I had read this in the current docs, but I was trying it out with only label_encoder=encoder and omitting the classes=encoder.classes_ part (which is required), since I incorrectly thought the label_encoder argument would be sufficient on its own. So I didn't try anything further. Thanks for the new documentation example; it definitely helps for this case.
Your reply makes sense and definitely answers my question.
Hi, I'm new to Yellowbrick and trying to explore things. I have tried everything but can't figure out why I'm facing this issue when generating my confusion matrix. Only part of the code is shown below, to give you an idea of what I'm doing:
from sklearn.model_selection import train_test_split
FeatureData_Train, FeatureData_Test, TargetData_Train, TargetData_Test = train_test_split(FeatureData,TargetData, test_size = 0.30, random_state = 10)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
neighbor=KNeighborsClassifier(n_neighbors=3) # Creating an Object of KNN Classifier
neighbor.fit(FeatureData_Train,TargetData_Train) # Training the model to classify
PredictionData=neighbor.predict(FeatureData_Test) # Predicting the Response
print ("KNeighbors accuracy score : ",accuracy_score(TargetData_Test, PredictionData))
from yellowbrick.classifier import ConfusionMatrix
cm = ConfusionMatrix(neighbor, classes=['0','1'])
cm.fit(FeatureData_Train,TargetData_Train)
cm.score(FeatureData_Test,TargetData_Test)
Error :
C:\Users\Strat Com\PycharmProjects\IGN Review\venv\lib\site-packages\sklearn\metrics\classification.py:261: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
if np.all([l not in y_true for l in labels]):
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-927-3a3d9e9d43f9> in <module>
----> 1 cm.score(FeatureData_Test,TargetData_Test)
~\PycharmProjects\IGN Review\venv\lib\site-packages\yellowbrick\classifier\confusion_matrix.py in score(self, X, y)
172 # Compute the confusion matrix and class counts
173 self.confusion_matrix_ = confusion_matrix_metric(
--> 174 y, y_pred, labels=self.classes_, sample_weight=self.sample_weight
175 )
176 self.class_counts_ = self.class_counts(y)
~\PycharmProjects\IGN Review\venv\lib\site-packages\sklearn\metrics\classification.py in confusion_matrix(y_true, y_pred, labels, sample_weight)
260 labels = np.asarray(labels)
261 if np.all([l not in y_true for l in labels]):
--> 262 raise ValueError("At least one label specified must be in y_true")
263
264 if sample_weight is None:
ValueError: At least one label specified must be in y_true
Note: my target variable is already of type float, with values of 1 and 0, so no label encoding was required. The dataset file is attached: DataSet.txt
@haseebkhan1421, thank you for your question and for using Yellowbrick! One of two problems is potentially happening here. First, are you using the latest version of Yellowbrick (v0.9)? If not, please run pip install -U yellowbrick; you can then use the solution discussed above:
# label_encoder maps each target value (0 or 1) to its class name
cm = ConfusionMatrix(neighbor, classes=['0','1'], label_encoder={0: '0', 1: '1'})
cm.fit(FeatureData_Train,TargetData_Train)
cm.score(FeatureData_Test,TargetData_Test)
Note that classes is intended to give the figure readable class labels; you could also just omit it, e.g. ConfusionMatrix(neighbor). Does that work? Otherwise, you have to specify the label_encoder (above, as a dict) in order to map between the target values and the string class labels (you mentioned they're type float; generally the target should be type int).
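The traceback boils down to a membership test: scikit-learn checks whether every requested label is absent from y_true, and the string names '0' and '1' never compare equal to the float targets 0.0 and 1.0. The stdlib-only sketch below reproduces that check with hypothetical stand-in data, assuming float 0/1 targets as described in the question, and shows that mapping the target values to the class names (integer to class name, as described at the start of this thread) makes the labels match. In a Python dict lookup, the keys 0 and 1 also match the floats 0.0 and 1.0.

```python
# Float targets, as in the reported dataset (1s and 0s stored as float)
y_true = [0.0, 1.0, 1.0, 0.0]

# String class names passed via classes=['0', '1']
labels = ["0", "1"]

# Same idea as the check in the traceback: is every label absent from y_true?
print(all(label not in y_true for label in labels))  # True -> scikit-learn raises ValueError

# Map each target value to its class name before comparing
label_encoder = {0: "0", 1: "1"}
y_named = [label_encoder[v] for v in y_true]
print(y_named)                                        # ['0', '1', '1', '0']
print(all(label not in y_named for label in labels))  # False -> labels are found
```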
The second possibility is that you're actually in the situation that scikit-learn is warning about. scikit-learn raises this error when none of the labels passed to confusion_matrix appear in TargetData_Test, for example when classes are missing from the test split. Usually, this is because the data is ordered and was split without shuffling (train_test_split shuffles by default unless shuffle=False), or because of a class imbalance.
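A small stdlib-only illustration of the ordering problem, using hypothetical labels where every 0 comes before every 1: taking the tail of the data as the test set without shuffling leaves each split with a single class, while shuffling first mixes them. The helper `split_tail` is hypothetical and stands in for any unshuffled split.

```python
import random

# Hypothetical ordered labels: every 0 comes before every 1
y = [0] * 70 + [1] * 30


def split_tail(values, n_test):
    """Hypothetical helper: use the last n_test items as the test set, no shuffling."""
    return values[:-n_test], values[-n_test:]


# Without shuffling, each split only ever sees one class
train, test = split_tail(y, n_test=30)
print(sorted(set(train)), sorted(set(test)))  # [0] [1]

# Shuffling first (train_test_split does this by default) mixes the classes
shuffled = y[:]
random.Random(10).shuffle(shuffled)
train, test = split_tail(shuffled, n_test=30)
print(sorted(set(train)), sorted(set(test)))
```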
My first suggestion would be to check the class balance of your train and test splits:
from yellowbrick.target import ClassBalance
oz = ClassBalance()
oz.fit(TargetData_Train, TargetData_Test)
oz.poof()
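If you just want a quick text check alongside the ClassBalance plot, counting labels per split with collections.Counter (a swapped-in stdlib technique, not part of Yellowbrick) shows the same information. The two target lists below are hypothetical stand-ins for TargetData_Train and TargetData_Test.

```python
from collections import Counter

# Hypothetical stand-ins for TargetData_Train / TargetData_Test
target_train = [0.0, 1.0, 0.0, 0.0, 1.0, 0.0]
target_test = [0.0, 0.0, 0.0]

train_counts = Counter(target_train)
test_counts = Counter(target_test)
print(train_counts)  # Counter({0.0: 4, 1.0: 2})
print(test_counts)   # Counter({0.0: 3})

# Classes seen in training but absent from the test split
missing = sorted(set(train_counts) - set(test_counts))
print(missing)  # [1.0]
```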
If one of the classes is missing from either the train or test split, then this is where the error is occurring. You should be able to fix the problem by shuffling your data or by using a stratified split such as StratifiedKFold.
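The idea behind stratification is to split each class separately so that every class keeps roughly the same share in both splits. In practice scikit-learn's train_test_split(..., stratify=y) or StratifiedKFold does this for you; the stdlib-only sketch below only illustrates the technique, and `stratified_split` is a hypothetical helper, not a library function.

```python
import random
from collections import defaultdict


def stratified_split(y, test_size, seed=0):
    """Hypothetical sketch: return (train_idx, test_idx) so that each class
    contributes roughly test_size of its samples to the test set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # At least one sample of every class goes to the test set
        n_test = max(1, round(len(indices) * test_size))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Ordered, imbalanced labels: the case where an unshuffled tail split fails
y = [0] * 70 + [1] * 30
train_idx, test_idx = stratified_split(y, test_size=0.3, seed=10)

print(len(train_idx), len(test_idx))     # 70 30
print(sorted({y[i] for i in test_idx}))  # [0, 1]
```

Splitting per class guarantees that every class with at least two samples appears in both splits, at the cost of the test fraction being rounded separately for each class.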
Solution is posted on Stack Overflow:
Describe the issue
I am new to Yellowbrick but am enjoying it so far; it's been really great and easy to pick up. I have a question about generating a confusion matrix.
I am using the built-in game dataset from the learning curve doc example for Classification, and I am trying to generate a confusion matrix. I am using the same code from the Confusion Matrix example docs here:
The y variable is a column of strings and so classes is a list of strings. I am using LabelEncoder from scikit-learn to convert this list classes to a list of integers (the new list is also named classes). This is similar to the ConfusionMatrix documentation example where classes=[0,1,2,3,4,5,6,7,8,9]. I then pass the list of integers to the ConfusionMatrix visualizer.
When I run the above code, I get this error message:
I see that it is ignoring the list of integer classes I provided. The Confusion Matrix example dataset runs fine with no error (also using a list of integers).
Do I need to provide another input in order to overcome this error?
Here are the details about the packages: