DistrictDataLabs / yellowbrick

Visual analysis and diagnostic tools to facilitate machine learning model selection.
http://www.scikit-yb.org/
Apache License 2.0
4.28k stars, 557 forks

could not determine class_counts_ from previously fitted classifier #1284

Closed: ggous closed this issue 1 year ago

ggous commented 1 year ago

Describe the bug

When running the visualizer, I receive a warning:
yellowbrick/classifier/base.py:232: YellowbrickWarning: could not determine class_counts_ from previously fitted classifier

which results in the classifier being fitted again.

To Reproduce

from yellowbrick.classifier import ClassificationReport, ConfusionMatrix
from sklearn import datasets
from sklearn.model_selection import train_test_split
import xgboost as xgb
import matplotlib.pyplot as plt

X, y = datasets.load_iris(return_X_y=True)

x_train, x_val, y_train, y_val = train_test_split(
    X,
    y,
    stratify=y,
    test_size=0.2)

model = xgb.XGBClassifier(objective='multi:softprob',
                          num_class=3,
                          use_label_encoder=False,
                          enable_categorical=False,
                          n_estimators=10)

model.fit(x_train,
          y_train,
          early_stopping_rounds=10,
          eval_set=[(x_train, y_train), (x_val, y_val)])

fig, ax = plt.subplots()
visualizer = ClassificationReport(model, is_fitted=True)
visualizer.score(x_val, y_val)
visualizer.show()

Dataset

The Iris dataset, loaded with datasets.load_iris from sklearn.

Expected behavior

Since we declare is_fitted=True, the classifier should not be fitted again.

Traceback

/home/ggous/miniconda3/envs/sklearn/lib/python3.9/site-packages/yellowbrick/classifier/base.py:232: YellowbrickWarning: could not determine class_counts_ from previously fitted classifier
  warnings.warn(

Additional context

I suspect that fitting again can sometimes take too long.

bbengfort commented 1 year ago

@ggous thank you for using Yellowbrick and for reporting the issue that you found to us! I hope that you're finding Yellowbrick useful.

In order to use the ClassificationReport, the model needs the class_counts_ learned attribute. This appears in most scikit-learn classifiers. I believe the xgb package adds learned attributes if it understands that it's in a scikit-learn context. I'm not really sure why the attribute is missing after you fit the model, though; I don't use the xgb package very often.

Could you try directly adding class_counts_ to the model before creating the visualizer to see if that helps things?

@lwgray do you have experience using xgb -- if so, perhaps you could comment on this issue?

ggous commented 1 year ago

Hi bbengfort.

I can't find a class_counts_ attribute in xgb, and I'm not sure how to add it before creating the visualizer.

ggous commented 1 year ago

Also, is there a way to save the visualizer output without showing it?

I am using visualizer.show(outpath='./file.png')

but I only want to save the resulting plot, not display it.

lwgray commented 1 year ago

This code addresses both concerns, but @bbengfort, maybe you have a better trick to stop the plot from showing. I feel like we have answered this question before.

from yellowbrick.classifier import ClassificationReport, ConfusionMatrix
from sklearn import datasets
from sklearn.model_selection import train_test_split
import xgboost as xgb
import matplotlib.pyplot as plt

X, y = datasets.load_iris(return_X_y=True)

x_train, x_val, y_train, y_val = train_test_split(X, y, stratify=y, test_size=0.2)

model = xgb.XGBClassifier(objective='multi:softprob',
                          num_class=3,
                          use_label_encoder=False,
                          enable_categorical=False,
                          n_estimators=10)

model.fit(x_train,
          y_train,
          early_stopping_rounds=10,
          eval_set=[(x_train, y_train), (x_val, y_val)])

# Specify class counts on the model
model.class_counts_ = 3
fig, ax = plt.subplots()
visualizer = ClassificationReport(model, is_fitted=True)
visualizer.score(x_val, y_val)

# Clear Figure works but @bbengfort might have a better approach
visualizer.show('test.png', clear_figure=True);
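
A hedged variation on the workaround above: instead of a placeholder value, class_counts_ can hold the number of training samples per class, ordered by class label. The sketch below uses only the standard library and a stand-in object in place of the fitted XGBClassifier. The helper name class_counts and the toy labels are assumptions made for illustration and are not part of Yellowbrick or xgboost.

```python
from collections import Counter
from types import SimpleNamespace


def class_counts(y):
    """Return (classes, counts), with classes sorted and counts in the same order."""
    tally = Counter(y)
    classes = sorted(tally)
    return classes, [tally[c] for c in classes]


# Stand-in for an already fitted classifier (hypothetical; in the thread
# this would be the fitted XGBClassifier).
model = SimpleNamespace(classes_=[0, 1, 2])

# Toy training labels, used here only to keep the example self-contained.
y_train = [0, 1, 1, 2, 2, 2, 0, 1]
classes, counts = class_counts(y_train)

# Attach the attribute to the model before constructing the visualizer.
model.class_counts_ = counts
```

With NumPy available, np.unique(y_train, return_counts=True) returns the same two sequences as arrays.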

bbengfort commented 1 year ago

@lwgray thank you for adding those suggestions!

@ggous if you're in a Jupyter notebook, this StackOverflow post has some suggestions for preventing the image from being rendered. Otherwise, clear_figure, as @lwgray mentioned, is probably your best bet.
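
For scripts run outside a notebook, one option that does not depend on Yellowbrick is to select a non-interactive matplotlib backend, write the figure with savefig, and then close it. The following is a minimal sketch under that assumption. The helper save_without_display and the placeholder plot are illustrative only; with a visualizer, the figure would be whichever one the visualizer drew on.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: figures are rendered to files only
import matplotlib.pyplot as plt


def save_without_display(fig, path):
    """Write the figure to disk, then close it so nothing is left to display."""
    fig.savefig(path)
    plt.close(fig)
    return path


fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])  # placeholder plot standing in for the visualizer output

out_path = save_without_display(fig, os.path.join(tempfile.mkdtemp(), "report.png"))
```

Closing the figure after saving also keeps long-running scripts from accumulating open figures.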