DistrictDataLabs / yellowbrick

Visual analysis and diagnostic tools to facilitate machine learning model selection.
http://www.scikit-yb.org/
Apache License 2.0

Not supporting model of Catboost #1099

Closed: p-suresh-kumar closed this issue 4 years ago

p-suresh-kumar commented 4 years ago

from catboost import CatBoostClassifier
from yellowbrick.classifier import ClassificationReport

model_Cat = CatBoostClassifier()
visualizer = ClassificationReport(model_Cat, classes=[0, 1], support=True)

This raises the error "YellowbrickTypeError: Cannot detect the model name for non estimator: '<class 'catboost.core.CatBoostClassifier'>'"

bbengfort commented 4 years ago

Hello @p-suresh-kumar - thank you for using Yellowbrick. Right now Yellowbrick does not necessarily support non-scikit-learn estimators. We think it's important to do so, but it's difficult to figure out how we can best support this since Yellowbrick relies on many sklearn attributes and methods. You can see similar issues such as #306 #397 #1066 and #1098.
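The error message suggests the failure is a type check rather than a missing method. Here is a minimal sketch of that idea using only the standard library. `BaseEstimator`, `ThirdPartyClassifier`, `Mixed`, and `get_model_name` below are stand-ins written for illustration, on the assumption that the real check is an `isinstance` test against scikit-learn's `BaseEstimator`; they are not Yellowbrick's actual code.

```python
class BaseEstimator:
    """Stand-in for sklearn.base.BaseEstimator (assumption, not the real class)."""


class ThirdPartyClassifier:
    """Stand-in for a model with fit/predict that does not inherit BaseEstimator."""

    def fit(self, X, y):
        return self

    def predict(self, X):
        return [0 for _ in X]


class Mixed(ThirdPartyClassifier, BaseEstimator):
    """The same model with the base class mixed in."""


def get_model_name(model):
    # Hypothetical check: having fit/predict is not enough, the type must match.
    if not isinstance(model, BaseEstimator):
        raise TypeError(
            "Cannot detect the model name for non estimator: '{}'".format(type(model))
        )
    return type(model).__name__


try:
    get_model_name(ThirdPartyClassifier())
except TypeError as err:
    print(err)  # the duck-typed model is rejected

print(get_model_name(Mixed()))  # the mixed-in subclass passes the check
```

Under this assumption, a model can implement every method a visualizer calls and still be rejected, which is why a wrapper or a mixed-in base class is needed.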

If you would like to use catboost I propose that you create a wrapper estimator in yellowbrick.contrib similar to the one we made for statsmodels. I will also look at creating a generic wrapper that might assist you for v1.2.

Just out of curiosity, what happens when you do something like this:

from catboost import CatBoostClassifier
from sklearn.base import BaseEstimator
from yellowbrick.classifier import ClassificationReport

class CatBoostWrapper(CatBoostClassifier, BaseEstimator):
    pass

model_Cat = CatBoostWrapper()
visualizer = ClassificationReport(model_Cat, classes=[0,1], support=True)
visualizer.fit(X_train, y_train)
visualizer.score(X_test, y_test)
visualizer.show()

bbengfort commented 4 years ago

@p-suresh-kumar In #1103 - I created some wrapper functionality as described above in the yellowbrick.contrib module and tested it with CatBoost. It should therefore be pretty simple to use the CatBoostClassifier in v1.2.
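As a rough illustration of what such a wrapper can look like, here is a simplified sketch of the delegation idea. It is not the code from #1103: `EstimatorWrapper` and `ThirdPartyModel` are hypothetical names, and the sketch assumes the wrapper only needs to carry an estimator type tag and forward everything else to the wrapped model.

```python
class ThirdPartyModel:
    """Stand-in for a model that does not inherit from scikit-learn's base class."""

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        return self

    def predict(self, X):
        # Trivial prediction: always return the first class seen during fit.
        return [self.classes_[0] for _ in X]


class EstimatorWrapper:
    """Hypothetical wrapper: tags the model with an estimator type and
    forwards every other attribute lookup to the wrapped object."""

    def __init__(self, estimator, estimator_type="classifier"):
        self.estimator = estimator
        self._estimator_type = estimator_type

    def __getattr__(self, name):
        # __getattr__ runs only when normal lookup fails, so attributes set
        # in __init__ are found on the wrapper itself. The guard avoids
        # infinite recursion if `estimator` has not been set yet.
        if name == "estimator":
            raise AttributeError(name)
        return getattr(self.estimator, name)


wrapped = EstimatorWrapper(ThirdPartyModel())
wrapped.fit([[0], [1]], ["alive", "dead"])
print(wrapped._estimator_type)  # classifier
print(wrapped.classes_)         # ['alive', 'dead']
```

Delegation keeps the third-party model untouched, so learned attributes such as `classes_` stay readable through the wrapper after fitting.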

p-suresh-kumar commented 4 years ago

Traceback (most recent call last):

  File "", line 1, in <module>
    runfile('F:/My research/Research/Code Works/Diabetes/cat_boost_alone.py', wdir='F:/My research/Research/Code Works/Diabetes')

  File "C:\Users\Evelyn\anaconda3\envs\venv\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
    execfile(filename, namespace)

  File "C:\Users\Evelyn\anaconda3\envs\venv\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "F:/My research/Research/Code Works/Diabetes/cat_boost_alone.py", line 86, in <module>
    visualizer.fit(X_train, y_train)

  File "C:\Users\Evelyn\anaconda3\envs\venv\lib\site-packages\yellowbrick\classifier\base.py", line 187, in fit
    self.classes_ = self._decode_labels(self.classes)

  File "C:\Users\Evelyn\anaconda3\envs\venv\lib\site-packages\yellowbrick\classifier\base.py", line 284, in _decode_labels
    idx = LabelEncoder().fit_transform(yp)

  File "C:\Users\Evelyn\anaconda3\envs\venv\lib\site-packages\sklearn\preprocessing\_label.py", line 255, in fit_transform
    y = column_or_1d(y, warn=True)

  File "C:\Users\Evelyn\anaconda3\envs\venv\lib\site-packages\sklearn\utils\validation.py", line 72, in inner_f
    return f(**kwargs)

  File "C:\Users\Evelyn\anaconda3\envs\venv\lib\site-packages\sklearn\utils\validation.py", line 847, in column_or_1d
    "got an array of shape {} instead.".format(shape))

ValueError: y should be a 1d array, got an array of shape () instead.

sorenwacker commented 2 years ago

I get this error with Catboost and pycaret too.


from pycaret.classification import *
import shap
shap.initjs()

experiment = setup(df.filter(regex='G__*|DEATH_IND'), 'DEATH_IND', silent=True, 
                   feature_selection=True, remove_multicollinearity=True)

model = create_model('catboost')

from yellowbrick.classifier import ROCAUC

X_train, X_test, y_train, y_test = get_config('X_train'), get_config('X_test'), get_config('y_train'), get_config('y_test')

visualizer = ROCAUC(model, classes=["dead", "alive"]) # ROC/AUC visualizer for the CatBoost model
visualizer.fit(X_train, y_train) # Fit the training data to the visualizer
visualizer.score(X_test, y_test) # Evaluate the model on the test data

YellowbrickTypeError: Cannot detect the model name for non estimator: '<class 'catboost.core.CatBoostClassifier'>'

Is there a workaround?

bbengfort commented 2 years ago

Hello @sorenwacker thank you for using Yellowbrick! Have you tried the third-party estimator wrapper? E.g.

from pycaret.classification import *
import shap
shap.initjs()

experiment = setup(df.filter(regex='G__*|DEATH_IND'), 'DEATH_IND', silent=True, 
                   feature_selection=True, remove_multicollinearity=True)

from yellowbrick.contrib.wrapper import classifier as wrap_classifier

model = create_model('catboost')

from yellowbrick.classifier import ROCAUC

X_train, X_test, y_train, y_test = get_config('X_train'), get_config('X_test'), get_config('y_train'), get_config('y_test')

visualizer = ROCAUC(wrap_classifier(model), classes=["dead", "alive"]) # wrap the CatBoost model before passing it in
visualizer.fit(X_train, y_train) # Fit the training data to the visualizer
visualizer.score(X_test, y_test) # Evaluate the model on the test data

sorenwacker commented 2 years ago

https://github.com/pycaret/pycaret/issues/1843

sorenwacker commented 2 years ago

Yes that works!

sorenwacker commented 2 years ago

But all the labels and legend are gone.

bbengfort commented 2 years ago

Are you calling visualizer.show() or visualizer.finalize()?

sorenwacker commented 2 years ago

I was just using .score() and the plot was already showing up. Using .finalize() added the labels, etc.!
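The behavior described here follows from the order of the visualizer steps. The toy class below is a hypothetical stand-in, not Yellowbrick code; it records steps instead of plotting, and it assumes that `score()` triggers drawing while `show()` calls `finalize()` before rendering.

```python
class ToyVisualizer:
    """Hypothetical stand-in for a visualizer; it records steps instead of plotting."""

    def __init__(self):
        self.steps = []

    def draw(self):
        # Plot the data itself (for example, the ROC curves).
        self.steps.append("draw")

    def score(self, X, y):
        # Scoring triggers draw(), so a notebook may already display a bare plot.
        self.draw()
        return 0.0  # placeholder score

    def finalize(self):
        # Add the title, axis labels, and legend.
        self.steps.append("finalize")

    def show(self):
        # show() finalizes first, then renders the figure.
        self.finalize()
        self.steps.append("render")


viz = ToyVisualizer()
viz.score([[0], [1]], [0, 1])
print(viz.steps)  # ['draw']
viz.show()
print(viz.steps)  # ['draw', 'finalize', 'render']
```

In a notebook, the inline backend can display the figure right after `score()`, before any labels exist, so ending with `show()` or `finalize()` is what adds them.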

enheng121108 commented 1 year ago

model_Cat = CatBoostWrapper()

Hi, I ran into the same problem with CatBoostRegressor. Does this work for it?

bbengfort commented 1 year ago

@enheng121108 I'm not sure what error you're referring to - there are several different errors and wrappers discussed in this thread. Please include the stack trace of the error and the versions of Yellowbrick, Python, and the OS you're using.

As far as I know the catboost wrapper works with catboost - but this is a contrib package and we do not guarantee support for third-party classifiers.

Given the snippet you added above, the CatBoostWrapper needs to wrap a catboost model, so it seems like you're not quite far enough in your machine learning process to effectively use the wrapper.