flairNLP / flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)
https://flairnlp.github.io/flair/

[Bug]: Support metric is reported incorrectly for multi-class classifier model evaluation #3509

Closed · MattGPT-ai closed 1 month ago

MattGPT-ai commented 4 months ago

Describe the bug

When a classifier model is evaluated with evaluate, the returned classification report fills the support field of the 'micro avg' entry with the accuracy value instead of the sample count whenever the model is not a multi-label classifier.

To Reproduce

import flair

# `trainer`, `test_set`, and `label_type` come from an already trained
# single-label classification setup (not shown here).
result = trainer.model.evaluate(test_set, gold_label_type=label_type)

# Prints the accuracy instead of the integer sample count.
print(result.classification_report['micro avg']['support'])

Expected behavior

The support field should report the integer number of evaluated samples.
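
A minimal sketch of the expected check, assuming the `result` and `test_set` objects from the "To Reproduce" snippet above (both come from the reporter's setup and are not defined in this issue):

# `result` and `test_set` are the objects from the reproduction snippet.
micro_support = result.classification_report['micro avg']['support']

# For a single-label classifier, the micro-average support should equal the
# number of evaluated samples; with this bug it holds the accuracy instead.
assert micro_support == len(test_set)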

Logs and Stack traces

No response

Screenshots

No response

Additional Context

No response

Environment

Versions:

Flair: 0.13.1
PyTorch: 2.3.1+cu121
Transformers: 4.31.0
GPU: True

fkdosilovic commented 1 month ago

@alanakbik Looks like this bug can be closed.