evidentlyai / evidently

Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.
https://www.evidentlyai.com/evidently-oss
Apache License 2.0

Unable to generate Report for Multi-class Classification with single output class #1275


yudhiesh commented 2 months ago

I have simulated data drift that causes the model to predict the same class over and over. When I try to run the Report on the reference and current data, it fails. Here is the current code:

from evidently.pipeline.column_mapping import ColumnMapping
from evidently.report import Report
from evidently.metrics import ClassificationQualityMetric

# The target is the integer label; the prediction is the set of
# per-class probability columns.
column_mapping = ColumnMapping()
column_mapping.target = 'label'
column_mapping.prediction = ['prob_NEGATIVE', 'prob_NEUTRAL', 'prob_POSITIVE']
column_mapping.text_features = ['text']
column_mapping.numerical_features = []
column_mapping.task = 'classification'
column_mapping.categorical_features = []

performance_report = Report(metrics=[
    ClassificationQualityMetric()
])

performance_report.run(reference_data=test_df, current_data=data_drift_df, column_mapping=column_mapping)
performance_report.show()

Here is an example of the current/reference data:

| text | label | prob_NEGATIVE | prob_NEUTRAL | prob_POSITIVE | predicted_label | predicted_sentiment |
| --- | --- | --- | --- | --- | --- | --- |
| « C’est de loin la méthode de contraception la... | 0 | 0.219654 | 0.071736 | 0.708610 | 2 | POSITIVE |
| « Je prends de la doxy depuis un certain temps... | 0 | 0.307037 | 0.108540 | 0.584423 | 2 | POSITIVE |
| « En 8 heures de prise d'un comprimé, j'ai eu ... | 0 | 0.159101 | 0.039321 | 0.801578 | 2 | POSITIVE |
| « Cela a changé ma vie. Je peux travailler eff... | 2 | 0.172600 | 0.040159 | 0.787241 | 2 | POSITIVE |
| « Cela a changé ma vie. L’anxiété a disparu, e... | 2 | 0.172715 | 0.037171 | 0.790113 | 2 | POSITIVE |
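
For context, a purely illustrative stand-in for the drifted frame (column names match the ColumnMapping above; the values are made up, not my real data) would be:

import pandas as pd

# Illustrative only: every row carries the same dominant POSITIVE
# probability, mimicking the collapsed predictions after the drift.
data_drift_df = pd.DataFrame({
    'text': ['example review'] * 4,
    'label': [0, 0, 2, 2],
    'prob_NEGATIVE': [0.2] * 4,
    'prob_NEUTRAL': [0.1] * 4,
    'prob_POSITIVE': [0.7] * 4,
})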

I get the following error, which stems from scikit-learn:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-57-7c6b02163273> in <cell line: 1>()
----> 1 performance_report.show()

13 frames

/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py in confusion_matrix(y_true, y_pred, labels, sample_weight, normalize)
    338             return np.zeros((n_labels, n_labels), dtype=int)
    339         elif len(np.intersect1d(y_true, labels)) == 0:
--> 340             raise ValueError("At least one label specified must be in y_true")
    341 
    342     if sample_weight is None:

ValueError: At least one label specified must be in y_true

It seems that the labels are not being propagated down to the metrics that use the prediction probabilities, such as ROC AUC, as described in this Stack Overflow thread. I noticed a similar issue was reported and fixed before.
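
For what it's worth, a minimal standalone sketch of the underlying scikit-learn behaviour reproduces the same error (the label values here are assumptions for illustration, not Evidently's actual internal call):

from sklearn.metrics import confusion_matrix

# After the simulated drift, y_true no longer contains some of the labels
# that are passed in explicitly, and sklearn refuses to build the matrix.
y_true = [2, 2, 2]  # every row is the POSITIVE class
y_pred = [2, 2, 2]

try:
    confusion_matrix(y_true, y_pred, labels=[0, 1])  # labels absent from y_true
except ValueError as e:
    print(e)  # "At least one label specified must be in y_true"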

I am currently using evidently==0.4.19.