LAMDA-NJU / Deep-Forest

An Efficient, Scalable and Optimized Python Framework for Deep Forest (2021.2.1)
https://deep-forest.readthedocs.io

How to plot roc curve ? #112

Closed simonprovost closed 2 years ago

simonprovost commented 2 years ago

Dear Sir or Madam,

I was hoping you might provide some guidance on how to visualise the ROC curve of a pre-trained Deep Forest model.

For instance:

from sklearn import metrics
from deepforest import CascadeForestClassifier

model = CascadeForestClassifier()
model.load(model_path)
metrics.plot_roc_curve(
    model,
    data_to_predict.drop(class_label, axis=1),
    data_to_predict[class_label],
)

This did not work and produced the following output:


[2022-07-11 20:23:44.093] Start to evalute the model:
[2022-07-11 20:23:44.096] Evaluating cascade layer = 0 
[2022-07-11 20:23:52.649] Evaluating cascade layer = 1 
Error: while building the roc curve
Traceback (most recent call last):
  File "/confidential.py", line 206, in plot_roc_curve
    metrics.plot_roc_curve(
  File "/python3.8/site-packages/sklearn/utils/deprecation.py", line 88, in wrapped
    return fun(*args, **kwargs)
  File "/python3.8/site-packages/sklearn/metrics/_plot/roc_curve.py", line 451, in plot_roc_curve
    y_pred, pos_label = _get_response(
  File "/python3.8/site-packages/sklearn/metrics/_plot/base.py", line 103, in _get_response
    pos_label = estimator.classes_[class_idx]
AttributeError: 'CascadeForestClassifier' object has no attribute 'classes_'
xuyxu commented 2 years ago

Hi, the classes_ attribute is not kept in deep forest. One possible workaround is to train a random forest classifier on your dataset and then copy its classes_ attribute onto the deep forest model before calling plot_roc_curve, i.e., model.classes_ = rf.classes_.

simonprovost commented 2 years ago

Amazing @xuyxu ! Thanks for the advice. It's implemented ✅

Nevertheless, may I ask why classes_ is not set on the final model? It is odd that, for instance, the ROC curve can no longer be plotted out of the box. There may be technical reasons for that.

I am closing the issue anyway 🚀

xuyxu commented 2 years ago

To improve memory efficiency and inference speed, RandomForestClassifier in deep forest is implemented using arguably the most compact data structure: four numpy arrays representing the splitting feature, threshold, child node index, and leaf-node prediction, respectively. Therefore, there is no room for another attribute such as classes_.
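
To make the four-array idea concrete, here is a rough sketch of a single decision tree stored as flat numpy arrays. It is not Deep Forest's actual code: the array names, the -1 leaf marker, and the "greater than threshold goes right" convention are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical layout for one tree with 5 nodes (node 0 is the root).
feature = np.array([0, -1, 1, -1, -1])            # split feature, -1 marks a leaf
threshold = np.array([0.5, 0.0, 2.0, 0.0, 0.0])   # split threshold (unused at leaves)
children = np.array([                             # [left, right] child node indices
    [1, 2],
    [-1, -1],
    [3, 4],
    [-1, -1],
    [-1, -1],
])
value = np.array([                                # class probabilities at leaves
    [0.0, 0.0],
    [0.9, 0.1],
    [0.0, 0.0],
    [0.2, 0.8],
    [0.6, 0.4],
])


def predict_proba_one(x):
    """Walk from the root to a leaf and return that leaf's probabilities."""
    node = 0
    while feature[node] != -1:
        go_right = x[feature[node]] > threshold[node]
        node = children[node, int(go_right)]
    return value[node]


left_leaf = predict_proba_one(np.array([0.2, 5.0]))
mid_leaf = predict_proba_one(np.array([1.0, 1.0]))
right_leaf = predict_proba_one(np.array([1.0, 3.0]))
```

In a layout like this, leaf predictions are addressed only by column position, so the original label values would have to be stored somewhere else, which is consistent with classes_ not being kept on the forest.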