Datastructure for classification report

evidentlyai / evidently

Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.

https://www.evidentlyai.com/evidently-oss

Apache License 2.0


Datastructure for classification report #936

Open RobbStarkAustria opened 10 months ago

RobbStarkAustria commented 10 months ago

Hello,

I would like to use Evidently to analyze my model results. I use YOLOv8 for object detection. Detection naturally produces true positives, false positives, and false negatives.

I enter my results in two columns of a pandas DataFrame: the "target" column holds the ground truth, and the "pred" column holds the symbol class predicted at that bounding box. For false positives there is a value, but for false negatives there is none. This results in an error in the ClassificationConfusionMatrixRenderer class:

    curr_matrix.labels = [target_names[x] for x in curr_matrix.labels]
    KeyError: ''

'' is the value in the pred column for false negatives.
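To illustrate how a lookup like the one in the traceback can fail, here is a minimal sketch in plain Python. It assumes that target_names behaves like a dictionary keyed by label; the label values are hypothetical and not taken from the attached files.

```python
# Hypothetical label mapping; the real one would come from cl_list.
target_names = {"cat": "Cat", "dog": "Dog"}

# Labels as they might be collected from the target and pred columns,
# including the empty string used for false negatives.
labels = ["cat", "dog", ""]

missing_key = None
try:
    # Same pattern as the line shown in the traceback.
    renamed = [target_names[x] for x in labels]
except KeyError as err:
    # The empty string has no entry in the mapping, so the lookup fails.
    missing_key = err.args[0]
    print(f"KeyError: {missing_key!r}")
```

Any label that appears in the data but has no entry in the mapping would fail in the same way.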

Which value should I enter in the pred column for false negatives so that the error does not occur?

Many thanks for the answer.

Kind regards

Robb

elenasamuylova commented 10 months ago

Hi @RobbStarkAustria,

Are you using column mapping to specify the name of the prediction column?

If not, check out these docs https://docs.evidentlyai.com/user-guide/input-data/column-mapping#prediction-column-s-in-classification or an example notebook for different options on how to map your input data: https://github.com/elenasamuylova/evidently/blob/main/examples/how_to_questions/how_to_use_column_mapping.ipynb

If the issue persists, could you share a small reproducible code example?

RobbStarkAustria commented 10 months ago

Hi @elenasamuylova,

Thank you for your quick answer.

I have read the suggested pages. I am not sure if I have understood everything correctly or if I am using the wrong data structure.

So I'll send you a csv file of my dataframe and the code I used. Maybe you can recognize my misunderstanding.

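    # Imports assumed for the Evidently version in use (Report API)
    import pandas as pd
    from evidently import ColumnMapping
    from evidently.metrics import (
        ClassificationConfusionMatrix,
        ClassificationQualityByClass,
        ClassificationQualityMetric,
    )
    from evidently.report import Report

    # target, pred and cl_list are defined elsewhere (see the attached CSV files)
    data = {"target": target, "pred": pred}
    evidently_df = pd.DataFrame(data)

    # Tell Evidently which columns hold the ground truth and the prediction
    column_mapping = ColumnMapping()
    column_mapping.target = 'target'
    column_mapping.prediction = 'pred'
    column_mapping.target_names = cl_list
    column_mapping.task = 'classification'

    classification_report = Report(metrics=[
        ClassificationConfusionMatrix(),
        ClassificationQualityByClass(),
        ClassificationQualityMetric(),
    ])

    classification_report.run(reference_data=None, current_data=evidently_df, column_mapping=column_mapping)
    classification_report.save_html("report.html")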

Index 0-225: true positives
Index 226-228: false positives
Index 229-230: false negatives

Attached files: evidently_issue.csv, cl_list.csv
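One option that might be worth trying, sketched here as an assumption rather than a confirmed fix, is to replace the empty entries with an explicit placeholder class before building the DataFrame, and to add that placeholder to the class list. The name "background", the helper fill_missing, and the sample values are hypothetical. Whether Evidently then renders the report as intended has not been verified in this thread.

```python
# Hypothetical inputs; in the thread these come from the detection results.
target = ["cat", "dog", "cat"]
pred = ["cat", "dog", ""]  # last entry: false negative, nothing predicted
cl_list = ["cat", "dog"]

BACKGROUND = "background"  # assumed placeholder label, not an Evidently convention


def fill_missing(values, placeholder=BACKGROUND):
    """Replace empty or None entries with an explicit placeholder label."""
    return [placeholder if v in ("", None) else v for v in values]


pred_clean = fill_missing(pred)
target_clean = fill_missing(target)

# Extend the class list so the placeholder is a known label as well.
labels = cl_list + [BACKGROUND]

# Sanity check: every value in both columns should now be a known label.
unknown = (set(target_clean) | set(pred_clean)) - set(labels)
print(pred_clean)
print(unknown)
```

The cleaned lists could then be passed to pd.DataFrame in place of target and pred, with labels used where cl_list was used before.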