evidentlyai / evidently


Bug: Cannot save html classification report when target column and possible labels do not match. #1070

Open lorenzomassimiani opened 3 months ago

lorenzomassimiani commented 3 months ago

With this CSV:

target,cat,dog,giraffe
cat,0.8,0.1,0.1
dog,0.3,0.3,0.4

when I build the multiclass classification report using:

import pandas as pd
from evidently import ColumnMapping
from evidently.metric_preset import ClassificationPreset
from evidently.report import Report

df = pd.read_csv("animals.csv")

# "target" holds the true label; every other column holds the
# predicted probability for the class of the same name
column_mapping = ColumnMapping()
column_mapping.target = "target"
column_mapping.prediction = list(df.loc[:, df.columns != "target"])

classification_performance_report = Report(metrics=[ClassificationPreset()])
classification_performance_report.run(current_data=df, reference_data=None, column_mapping=column_mapping)

I get, as expected, some warnings of this kind:

UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
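
This warning comes from scikit-learn itself; as a minimal sketch with made-up labels, the `zero_division` argument it mentions controls what happens for a class that is never predicted:

from sklearn.metrics import precision_score

y_true = ["cat", "dog"]
y_pred = ["cat", "cat"]  # "dog" is never predicted

# zero_division=0 silences the UndefinedMetricWarning and sets the
# precision for "dog" to 0.0; here the macro average is 0.25
precision_score(y_true, y_pred, average="macro", zero_division=0)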

The report itself is generated correctly, but when I try to save it in HTML format to use in my Streamlit app:

classification_performance_report.save_html("report.html")

I get this error:

Traceback (most recent call last):
  File "/evidently-report/utils/csv2report.py", line 32, in <module>
    classification_performance_report.save_html(report_filepath)
  File "/evidently-report/venv/lib/python3.11/site-packages/evidently/suite/base_suite.py", line 207, in save_html
    dashboard_id, dashboard_info, graphs = self._build_dashboard_info()
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/evidently-report/venv/lib/python3.11/site-packages/evidently/report/report.py", line 212, in _build_dashboard_info
    html_info = renderer.render_html(test)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/evidently-report/venv/lib/python3.11/site-packages/evidently/metrics/classification_performance/classification_quality_metric.py", line 73, in render_html
    metric_result = obj.get_result()
                    ^^^^^^^^^^^^^^^^
  File "/evidently-report/venv/lib/python3.11/site-packages/evidently/base_metric.py", line 232, in get_result
    raise result.exception
  File "/evidently-report/venv/lib/python3.11/site-packages/evidently/calculation_engine/engine.py", line 42, in execute_metrics
    calculations[metric] = calculation.calculate(context, converted_data)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/evidently-report/venv/lib/python3.11/site-packages/evidently/calculation_engine/python_engine.py", line 88, in calculate
    return self.metric.calculate(data)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/evidently-report/venv/lib/python3.11/site-packages/evidently/metrics/classification_performance/classification_quality_metric.py", line 45, in calculate
    current = calculate_metrics(
              ^^^^^^^^^^^^^^^^^^
  File "/evidently-report/venv/lib/python3.11/site-packages/evidently/calculations/classification_performance.py", line 382, in calculate_metrics
    roc_auc = metrics.roc_auc_score(binaraized_target, prediction_probas_array, average="macro")
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/evidently-report/venv/lib/python3.11/site-packages/sklearn/metrics/_ranking.py", line 580, in roc_auc_score
    return _average_binary_score(
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/evidently-report/venv/lib/python3.11/site-packages/sklearn/metrics/_base.py", line 118, in _average_binary_score
    score[c] = binary_metric(y_true_c, y_score_c, sample_weight=score_weight)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/evidently-report/venv/lib/python3.11/site-packages/sklearn/metrics/_ranking.py", line 339, in _binary_roc_auc_score
    raise ValueError(
ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.
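
For what it's worth, the failure can be reproduced with scikit-learn alone on the data above (a minimal sketch, assuming the same label binarization that Evidently performs internally):

import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

y_true = ["cat", "dog"]  # "giraffe" never appears as a true label
y_score = np.array([[0.8, 0.1, 0.1],
                    [0.3, 0.3, 0.4]])

# the "giraffe" column of the binarized target is all zeros, so its
# per-class binary ROC AUC is undefined
binarized = label_binarize(y_true, classes=["cat", "dog", "giraffe"])
roc_auc_score(binarized, y_score, average="macro")  # raises ValueError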

It would be better if this scenario were handled by setting the ROC AUC score for that class to 0 (or 1).

EgonFerri commented 2 months ago

I have the same problem. I did some investigating, hoping we could work together to resolve the issue, but the solution did not seem straightforward to me.

The main issue is that, when some labels never appear in the .csv data, certain metrics become meaningless, and Evidently's attempt to calculate them via scikit-learn results in errors.

Philosophically, I believe this shouldn't happen. Even if, by the law of large numbers, missing labels should be rare in large samples, a tool like Evidently should be able to handle them: they occur quite frequently, both in testing/debugging scenarios and in standard tasks where one label is significantly less prevalent (e.g., spam detection, anomaly detection, forgery detection).

Practically speaking, fixing this is not trivial. Ideally, the report should be generated without omitting the plots whose metric calculations fail; instead, those plots should show placeholders for the missing labels. That is not easy to achieve, though, since the code relies heavily on scikit-learn's abstractions. Should we ask scikit-learn to modify the ROC AUC function to accommodate absent labels? That seems wrong, because the statistic itself becomes meaningless from a statistical perspective. So the fix should come from a higher level, although integrating such a change elegantly with Evidently's use of scikit-learn is challenging, if it is even the best approach at all. A sketch of what such a guard could look like is below.
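
For illustration only (a hypothetical helper, not Evidently's actual code), the idea would be to compute the per-class ROC AUC only where a class actually occurs in the target, and emit a placeholder otherwise:

import numpy as np
from sklearn.metrics import roc_auc_score

def per_class_roc_auc_or_none(binarized_target, prediction_probas):
    """Per-class ROC AUC; None for classes absent from the target."""
    scores = []
    for c in range(binarized_target.shape[1]):
        y_true_c = binarized_target[:, c]
        if np.unique(y_true_c).size < 2:
            scores.append(None)  # undefined: this class never occurs in y_true
        else:
            scores.append(roc_auc_score(y_true_c, prediction_probas[:, c]))
    return scores

The rendering layer could then show "n/a" for the None entries instead of crashing.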

We could force the label set to contain all possible classes, or insert dummy data; although this should work, it is not a definitive solution.
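
For completeness, here is a sketch of that dummy-data workaround at the user level (it slightly skews the metrics, so it is a stopgap rather than a fix):

import pandas as pd

df = pd.read_csv("animals.csv")
label_columns = [c for c in df.columns if c != "target"]

# append one dummy row per class missing from the target column,
# so every label appears in y_true at least once
for label in set(label_columns) - set(df["target"]):
    row = {c: (1.0 if c == label else 0.0) for c in label_columns}
    row["target"] = label
    df = pd.concat([df, pd.DataFrame([row])], ignore_index=True)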

I'd like to help, but I'm not sure where to start. @emeli-dral, @mike0sv, what do you think? Thanks in advance, and great work on this project!

EgonFerri commented 1 month ago

Sorry @elenasamuylova, could we get an opinion on this? :)