gabrielpreda / Support-Tickets-Classification

This case study shows how to create a model for text analysis and classification and deploy it as a web service in the Azure cloud in order to classify support tickets automatically. This project is a proof of concept made by Microsoft (Commercial Software Engineering team) in collaboration with Endava: http://endava.com/en
MIT License

Error when running LGB classifier #3

Status: Closed (vitalie-cracan closed this issue 6 years ago)

vitalie-cracan commented 6 years ago

I get the following error when classifier = "LGB":

Shape of dataset before removing classes with less then 1 rows: (48549, 9)
Number of classes before removing classes with less then 1 rows: 2
Shape of dataset after removing classes with less then 1 rows: (48549, 9)
Number of classes after removing classes with less then 1 rows: 2
Training LGB classifier
[LightGBM] [Fatal] Number of classes should be specified and greater than 1 for multiclass training
Traceback (most recent call last):
  File "2_train_and_eval_model.py", line 136, in <module>
    text_clf = text_clf.fit(train_data, train_labels)
  File "D:\c\bin\anaconda3\lib\site-packages\sklearn\pipeline.py", line 250, in fit
    self._final_estimator.fit(Xt, y, **fit_params)
  File "D:\c\bin\anaconda3\lib\site-packages\lightgbm\sklearn.py", line 695, in fit
    callbacks=callbacks)
  File "D:\c\bin\anaconda3\lib\site-packages\lightgbm\sklearn.py", line 474, in fit
    callbacks=callbacks)
  File "D:\c\bin\anaconda3\lib\site-packages\lightgbm\engine.py", line 183, in train
    booster = Booster(params=params, train_set=train_set)
  File "D:\c\bin\anaconda3\lib\site-packages\lightgbm\basic.py", line 1307, in __init__
    train_set.construct().handle,
  File "D:\c\bin\anaconda3\lib\site-packages\lightgbm\basic.py", line 860, in construct
    categorical_feature=self.categorical_feature, params=self.params)
  File "D:\c\bin\anaconda3\lib\site-packages\lightgbm\basic.py", line 710, in _lazy_init
    self.__init_from_csr(data, params_str, ref_dataset)
  File "D:\c\bin\anaconda3\lib\site-packages\lightgbm\basic.py", line 800, in __init_from_csr
    ctypes.byref(self.handle)))
  File "D:\c\bin\anaconda3\lib\site-packages\lightgbm\basic.py", line 49, in _safe_call
    raise LightGBMError(decode_string(_LIB.LGBM_GetLastError()))
lightgbm.basic.LightGBMError: Number of classes should be specified and greater than 1 for multiclass training

gabrielpreda commented 6 years ago

Q: Do you get the error when a non-binary classifier is requested, for example "category"?

vitalie-cracan commented 6 years ago

I get the error for column_to_predict = "ticket_type"

gabrielpreda commented 6 years ago

@en-vcracan This is the running log from my side:

runfile('C:/D/workspace/Support-Tickets-Classification/2_train_and_eval_model.py', wdir='C:/D/workspace/Support-Tickets-Classification')
Shape of dataset before removing classes with less then 1 rows: (48549, 9)
Number of classes before removing classes with less then 1 rows: 2
Shape of dataset after removing classes with less then 1 rows: (48549, 9)
Number of classes after removing classes with less then 1 rows: 2
Training LGB classifier
Evaluating model
Confusion matrix without GridSearch:
[[2583  179]
 [  49 6899]]
Mean without GridSearch: 0.976519052523

             precision    recall  f1-score   support

          0       0.98      0.94      0.96      2762
          1       0.97      0.99      0.98      6948

avg / total       0.98      0.98      0.98      9710

gabrielpreda commented 6 years ago

Also, the parameters specify that a multiclass type of classification is requested:

clf = LGBMClassifier(
    boosting_type='gbdt',
    objective='multiclass',
    learning_rate=0.01,
    colsample_bytree=0.9,
    subsample=0.8,
    random_state=1,
    n_estimators=100,
    num_leaves=31,
    silent=False)

@en-vcracan Could you please check whether you still get the error after adding eval_metric='multi-logloss' to the parameters?

vitalie-cracan commented 6 years ago

Odd. Could you confirm the version of lightgbm on your machine? According to the docs for the latest version, the num_class parameter is required for a multiclass classifier.

https://lightgbm.readthedocs.io/en/latest/Parameters.html#core-parameters
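
For reference, the log reports 2 classes for this target while the classifier is configured with objective='multiclass'. Below is a minimal sketch, assuming the objective settings are derived from the label column instead of being hard-coded. The helper name lightgbm_objective_params is hypothetical and not part of this repository; objective, binary, multiclass and num_class are parameter names from the linked LightGBM docs. Whether this resolves the error on a given lightgbm version has not been verified in this thread.

```python
def lightgbm_objective_params(labels):
    """Return objective-related parameters matching the number of distinct labels.

    Hypothetical helper for illustration: two classes map to the binary
    objective, three or more map to multiclass with an explicit num_class.
    """
    n_classes = len(set(labels))
    if n_classes < 2:
        raise ValueError("need at least two classes to train a classifier")
    if n_classes == 2:
        return {"objective": "binary"}
    return {"objective": "multiclass", "num_class": n_classes}


# Two distinct labels, as in the ticket_type run above.
print(lightgbm_objective_params([0, 1, 1, 0]))
# Three distinct labels, as a multiclass target might have.
print(lightgbm_objective_params(["a", "b", "c"]))
```

The returned dictionary is meant to be merged into a parameter dictionary of the kind described in the linked docs; how it is wired into 2_train_and_eval_model.py is left open here.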

vitalie-cracan commented 6 years ago

I confirm that I get the same error when I add eval_metric='multi-logloss'.

Additionally, I get this warning:

[LightGBM] [Warning] Unknown parameter: eval_metric

gabrielpreda commented 6 years ago

I realized that I was using lightgbm 2.0.10, while the latest version is 2.1.2. For the moment, until I fix the issue, please use lightgbm 2.0.10 (if available). Thank you for raising it.
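
Since the behaviour differs between 2.0.10 and 2.1.2, it may help to check which version is installed before running the training script. This is a minimal sketch using only the Python standard library (importlib.metadata, available from Python 3.8); the helper names installed_version and version_tuple are hypothetical, and the comparison assumes plain dotted numeric versions such as "2.0.10".

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn the leading numeric parts of a version into a tuple of ints."""
    parts = []
    for part in version.split("."):
        if not part.isdigit():
            break  # stop at suffixes such as "rc1"
        parts.append(int(part))
    return tuple(parts)


found = installed_version("lightgbm")
print("lightgbm version:", found or "not installed")

# 2.0.10 is the version reported as working in this thread.
if found is not None and version_tuple(found) > version_tuple("2.0.10"):
    print("Newer than 2.0.10; the multiclass error above may apply.")
```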