Feature Request: Extend Confusion Matrix Support to Single-Label Classification

aboccag commented 3 months ago

Description:

I would like to request an enhancement to the existing AccuracyScore metric in the PaddleClas framework to fully support single-label classification tasks. Currently, the AccuracyScore metric, which is part of the MultilabelMetric class, includes an implementation of the confusion matrix. This implementation works well for multi-label classification tasks but does not function correctly for single-label classification scenarios.

When attempting to use the AccuracyScore metric with a single-label classification model, the following error is encountered:

ValueError: Classification metrics can't handle a mix of multiclass and multilabel-indicator targets

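This error occurs because the current implementation is tailored to multi-label scenarios. As the error message indicates, the ground truth of a single-label task is a "multiclass" target (one class index per sample), while the predictions are a "multilabel-indicator" target (multi-hot vectors), and scikit-learn refuses to compare the two formats.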
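The mismatch can be reproduced with scikit-learn alone. This is a minimal sketch that assumes, based on the wording of the error, that the metric passes one class index per sample as the ground truth while the predictions arrive as multi-hot vectors; the arrays are illustrative and not taken from PaddleClas.

```python
import numpy as np
from sklearn.metrics import multilabel_confusion_matrix

# Single-label ground truth: one class index per sample (3 classes),
# which scikit-learn classifies as a "multiclass" target.
y_true = np.array([0, 2, 1])

# Predictions as multi-hot indicator vectors, i.e. a
# "multilabel-indicator" target (assumed encoding of the metric's output).
y_pred = np.array([
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
])

error_message = None
try:
    multilabel_confusion_matrix(y_true, y_pred)
except ValueError as err:
    # scikit-learn rejects the mix of target types before computing anything.
    error_message = str(err)

print(error_message)
```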

Use Case:

This feature is important for users who:

Are working with single-label classification models and want to leverage the confusion matrix to better understand their model's performance.
Need to analyze the distribution of true positives, false positives, true negatives, and false negatives in a single-label classification context.
Require consistent metric functionality across both single-label and multi-label classification tasks.

Current Limitation:

The AccuracyScore metric is designed for multi-label classification and works correctly in that context with the following configuration:

Metric:
  Train:
    - AccuracyScore:
  Eval:
    - AccuracyScore:

However, when this metric is applied to single-label classification models, it results in the error mentioned above. This limits its utility in typical classification tasks where only one label is assigned to each instance.

Expected Enhancement:

The proposed enhancement should:

Extend the functionality of the AccuracyScore metric to properly handle single-label classification tasks.
Allow for seamless integration into the existing PaddleClas metric framework without requiring significant changes to user configurations.
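One way to meet both goals is to inspect the shape of the labels and choose the matching scikit-learn function. The sketch below is a hypothetical helper, compute_confusion_matrix, and is not part of PaddleClas. It assumes the model outputs raw logits of shape (N, C), that single-label targets arrive as class indices of shape (N,) or (N, 1), and that multi-label predictions are obtained by thresholding sigmoid probabilities at 0.5.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, multilabel_confusion_matrix


def compute_confusion_matrix(logits, labels, threshold=0.5):
    """Hypothetical helper: pick the confusion matrix matching the label format.

    logits: array of shape (N, C) with raw model outputs.
    labels: class indices of shape (N,) or (N, 1) for single-label tasks,
            or multi-hot vectors of shape (N, C) for multi-label tasks.
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    num_classes = logits.shape[1]

    is_single_label = labels.ndim == 1 or labels.shape[1] == 1
    if is_single_label:
        # One class per sample: predict the highest-scoring class and
        # return a (C, C) matrix (rows = true class, columns = predicted).
        preds = logits.argmax(axis=1)
        return confusion_matrix(
            labels.reshape(-1), preds, labels=list(range(num_classes))
        )

    # Multi-label: threshold sigmoid probabilities into multi-hot vectors
    # and return one 2x2 matrix [[TN, FP], [FN, TP]] per class.
    probs = 1.0 / (1.0 + np.exp(-logits))
    preds = (probs > threshold).astype(int)
    return multilabel_confusion_matrix(labels, preds)
```

Because the helper infers the task from the label shape, the same AccuracyScore configuration could in principle serve both tasks. One caveat of this heuristic: a multi-label dataset with a single class also has labels of shape (N, 1) and would be routed to the single-label branch.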

Additional Context on Confusion Matrix

Here is an example usage of scikit-learn's confusion_matrix (from the scikit-learn documentation):

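from sklearn.metrics import confusion_matrix

y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]
# Rows are true labels and columns are predicted labels, in the order given by `labels`.
cm = confusion_matrix(y_true, y_pred, labels=["ant", "bird", "cat"])
print(cm)
# [[2 0 0]
#  [0 0 1]
#  [1 0 2]]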

This function computes a confusion matrix to evaluate the accuracy of a classification. Each entry counts how many samples of one true class were predicted as a given class, so the diagonal holds the correct predictions and the off-diagonal entries hold the misclassifications.

Parameters:

y_true: Ground truth (correct) target values.
y_pred: Estimated targets as returned by a classifier.
labels: List of labels to index the matrix. This may be used to reorder or select a subset of labels.
sample_weight: Sample weights.
normalize: Normalizes the confusion matrix over the true conditions (rows), the predicted conditions (columns), or the whole population; one of "true", "pred", "all", or None (no normalization).
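To illustrate the labels and normalize parameters on the same data, the sketch below computes both raw counts and row-normalized rates, and recovers overall accuracy as the trace of the count matrix divided by its sum. The cross-check against accuracy_score is only an illustration, not how AccuracyScore is implemented.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]
labels = ["ant", "bird", "cat"]

# Raw counts: rows are true labels, columns are predicted labels.
counts = confusion_matrix(y_true, y_pred, labels=labels)

# normalize="true" divides each row by its sum, so row i shows how the
# samples of class i are distributed over the predicted classes.
row_normalized = confusion_matrix(y_true, y_pred, labels=labels, normalize="true")

# Overall accuracy: correct predictions (the diagonal) over all samples.
accuracy_from_cm = np.trace(counts) / counts.sum()

print(row_normalized)
print(accuracy_from_cm, accuracy_score(y_true, y_pred))
```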

Returns:

C: Confusion matrix whose i-th row and j-th column entry indicates the number of samples with true label being i-th class and predicted label being j-th class.
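Reading the matrix by rows and columns also yields the per-class true positives, false positives, false negatives, and true negatives listed in the use case: for each class, the diagonal entry is TP, the rest of its column is FP, the rest of its row is FN, and everything else is TN. The sketch below derives these one-vs-rest counts for the example data and cross-checks them against multilabel_confusion_matrix, which, according to the scikit-learn documentation, binarizes single-label input one-vs-rest when both arguments are class labels.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, multilabel_confusion_matrix

y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]
labels = ["ant", "bird", "cat"]

cm = confusion_matrix(y_true, y_pred, labels=labels)

# One-vs-rest counts per class, read off rows (true) and columns (predicted).
tp = np.diag(cm)
fp = cm.sum(axis=0) - tp  # predicted as the class, but truly another class
fn = cm.sum(axis=1) - tp  # truly the class, but predicted as another class
tn = cm.sum() - tp - fp - fn

for name, t, f_p, f_n, t_n in zip(labels, tp, fp, fn, tn):
    print(f"{name}: TP={t} FP={f_p} FN={f_n} TN={t_n}")

# With single-label inputs on both sides, this returns one 2x2 matrix
# [[TN, FP], [FN, TP]] per class instead of raising an error.
mcm = multilabel_confusion_matrix(y_true, y_pred, labels=labels)
```

For this data the counts are ant: TP=2, FP=1, FN=0, TN=3; bird: TP=0, FP=0, FN=1, TN=5; cat: TP=2, FP=1, FN=1, TN=2.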

Example Usage:

After the enhancement, users should be able to use the AccuracyScore metric with single-label classification models in the same way as with multi-label models:

Metric:
  Train:
    - AccuracyScore:
  Eval:
    - AccuracyScore:

This configuration should work correctly for both single-label and multi-label classification tasks.

liuhongen1234567 commented 3 months ago

Hello, multilabel_confusion_matrix in AccuracyScore is designed for multi-label classification tasks. If you want to calculate a confusion matrix for single-label classification, please use confusion_matrix instead of multilabel_confusion_matrix.

liuhongen1234567 commented 3 months ago

You can modify this code block in PaddleClas-develop/PaddleClas-develop/ppcls/metric/metrics.py according to the sklearn documentation.

aboccag commented 3 months ago

Many thanks!
