Closed: lambda-science closed this issue 3 years ago
Hi @Aperture77 - thank you for using Yellowbrick! Sorry you have been having trouble with the PrecisionRecallCurve visualizer. Looking at the error message, it appears that the model is getting fitted again, and it is possible that the error is occurring because of this. Since you already fitted the model, pass is_fitted=True to the visualizer and hopefully this will resolve your issue.
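For reference, a minimal sketch of that suggestion (a binary toy problem with a LogisticRegression stand-in, since the original model and data are not shown in the thread; the multiclass wrinkle is discussed below):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from yellowbrick.classifier import PrecisionRecallCurve

X, y = make_classification(n_samples=500, random_state=42)  # binary by default
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Fit the estimator yourself, then tell the visualizer not to refit it
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
viz = PrecisionRecallCurve(model, is_fitted=True)
viz.fit(X_train, y_train)   # still needed so the visualizer sets up its own state
viz.score(X_test, y_test)
viz.show()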
Hello, thanks for the answer. Here is the traceback with is_fitted=True.
---------------------------------------------------------------------------
NotFittedError Traceback (most recent call last)
/enadisk/maison/genomics18/xxxx/code-project/scikit_ML_Pipeline_Binary_Notebook/modeling_methods.py in run_XGB_full(x_train, y_train, x_test, y_test, randSeed, i, param_grid, name_path, hype_cv, n_trials, scoring_metric, timeout, wd_path, output_folder, algorithm, data_name, type_average)
1518 viz = PrecisionRecallCurve(model, classes=classes, is_fitted=True)
1519 viz.fit(x_train, y_train)
-> 1520 viz.score(x_test, y_test)
1521 prec = viz.precision_["micro"]
1522 recall = viz.recall_["micro"]
/enadisk/maison/genomics18/xxxx/code-project/scikit_ML_Pipeline_Binary_Notebook/prcurve.py in score(self, X, y)
312 # Call super to check if fitted and to compute classes_
313 # Note that self.score_ computed in super will be overridden below
--> 314 super(PrecisionRecallCurve, self).score(X, y)
315
316 # Compute the prediction/threshold scores
~/anaconda3/envs/ML-pipeline/lib/python3.9/site-packages/yellowbrick/classifier/base.py in score(self, X, y)
236
237 # This method implements ScoreVisualizer (do not call super).
--> 238 self.score_ = self.estimator.score(X, y)
239 return self.score_
240
~/anaconda3/envs/ML-pipeline/lib/python3.9/site-packages/sklearn/base.py in score(self, X, y, sample_weight)
498 """
499 from .metrics import accuracy_score
--> 500 return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
501
502 def _more_tags(self):
~/anaconda3/envs/ML-pipeline/lib/python3.9/site-packages/sklearn/multiclass.py in predict(self, X)
356 Predicted multi-class targets.
357 """
--> 358 check_is_fitted(self)
359
360 n_samples = _num_samples(X)
~/anaconda3/envs/ML-pipeline/lib/python3.9/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
61 extra_args = len(args) - len(all_args)
62 if extra_args <= 0:
---> 63 return f(*args, **kwargs)
64
65 # extra_args > 0
~/anaconda3/envs/ML-pipeline/lib/python3.9/site-packages/sklearn/utils/validation.py in check_is_fitted(estimator, attributes, msg, all_or_any)
1096
1097 if not attrs:
-> 1098 raise NotFittedError(msg % {'name': type(estimator).__name__})
1099
1100
NotFittedError: This OneVsRestClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.
With is_fitted=True, the PrecisionRecallCurve tries to wrap the XGBClassifier in a OneVsRestClassifier that is not fitted.
From the .fit() doc:
Fit the classification model; if ``y`` is multi-class, then the estimator is adapted with a ``OneVsRestClassifier`` strategy, otherwise the estimator is fit directly.
My classification is multiclass, so this is expected. If I skip the .fit() method and directly call score:
---------------------------------------------------------------------------
NotFitted Traceback (most recent call last)
/enadisk/maison/genomics18/xxxx/code-project/scikit_ML_Pipeline_Binary_Notebook/modeling_methods.py in run_XGB_full(x_train, y_train, x_test, y_test, randSeed, i, param_grid, name_path, hype_cv, n_trials, scoring_metric, timeout, wd_path, output_folder, algorithm, data_name, type_average)
1518 viz = PrecisionRecallCurve(model, classes=classes, is_fitted=True)
1519 # viz.fit(x_train, y_train)
-> 1520 viz.score(x_test, y_test)
1521 prec = viz.precision_["micro"]
1522 recall = viz.recall_["micro"]
/enadisk/maison/genomics18/xxxx/code-project/scikit_ML_Pipeline_Binary_Notebook/prcurve.py in score(self, X, y)
303 # has not correctly been fitted for multi-class targets.
304 if not hasattr(self, "target_type_"):
--> 305 raise NotFitted.from_estimator(self, "score")
306
307 # Must perform label binarization before calling super
NotFitted: this PrecisionRecallCurve instance is not fitted yet, please call fit with the appropriate arguments before using score
This is because the target_type_ attribute, which is instantiated in the fit() method, is never set.
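To illustrate the failure mode (a standalone sketch, not Yellowbrick's actual code): wrapping an already-fitted estimator in a fresh OneVsRestClassifier yields a wrapper that scikit-learn considers unfitted, because the wrapper's own fitted attributes are only created by its fit().

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = make_classification(n_samples=200, n_classes=3, n_informative=5, random_state=0)

inner = LogisticRegression(max_iter=1000).fit(X, y)  # the inner estimator is fitted
wrapper = OneVsRestClassifier(inner)                 # but the wrapper never was

# Raises NotFittedError: This OneVsRestClassifier instance is not fitted yet.
wrapper.predict(X)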
We discovered that there is an issue with the is_fitted parameter in the PRCurve visualizer: the estimator is being wrapped in a OneVsRestClassifier which is then subsequently not fitted. We have logged an issue for this and will be looking into it.
In the meantime, as the article you posted points out, the num_class parameter is not automatically set, as scikit-learn uses the cv method for the OneVsRestClassifier. You can update your code with the following:
import numpy as np
import xgboost as xgb

# Train an XGBClassifier, setting num_class explicitly from the training labels.
classes = np.unique(y_train)
est = xgb.XGBClassifier(num_class=len(classes))
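Presumably the estimator is then handed to the visualizer as before; a hedged sketch of the continuation, reusing the variable names from the traceback above (an assumption, not something stated in the thread):

# Omit is_fitted=True so the visualizer's internal OneVsRestClassifier
# wrapper actually gets fitted during viz.fit().
viz = PrecisionRecallCurve(est, classes=classes)
viz.fit(x_train, y_train)
viz.score(x_test, y_test)
prec = viz.precision_["micro"]
recall = viz.recall_["micro"]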
Describe the bug
When calling the .fit() method of the PrecisionRecallCurve class on an XGBoost multiclass classifier, it raises an error:
XGBoostError: value 0 for Parameter num_class should be greater equal to 1 num_class: Number of output class in the multi-class classification.
To Reproduce
Dataset
I use my own dataset, and it is not the issue, as it works for 9+ other ML methods.
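A hypothetical minimal reproduction, with synthetic data standing in for the real dataset:

import xgboost as xgb
from sklearn.datasets import make_classification
from yellowbrick.classifier import PrecisionRecallCurve

# Three classes to trigger the multiclass (OneVsRestClassifier) code path
X, y = make_classification(n_samples=300, n_classes=3, n_informative=5, random_state=0)

viz = PrecisionRecallCurve(xgb.XGBClassifier())
viz.fit(X, y)  # reportedly raises: XGBoostError: value 0 for Parameter num_class ...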
Expected behavior
I expect num_class to be set automatically, as is supposed to happen when calling .fit() on an XGBClassifier.
Traceback
Yellowbrick's code for the PRCurve is in prcurve.py, because I tried to extract the code to work on it after the error, without success.
Additional context
https://stackoverflow.com/questions/40116215/xgboost-sklearn-wrapper-value-0for-parameter-num-class-should-be-greater-equal-t
As per the Stack Overflow link, XGBoost is supposed to set this parameter automatically. This is not the case. I spent hours and hours trying to find a workaround: setting it by hand beforehand, but also in the .fit() method, and trying to skip the .fit() method since my classifier is already trained... Nothing works, and I'm kind of depressed. Has anyone used the Yellowbrick PreRec Curve with XGBoost? It seems weird that the AUC curve does not throw any errors.