Test and Score Accuracy Average over class

[ ] What's wrong?
I'm doing image classification using two datasets, one for training and one for testing. I'm selecting the option "Test on test data".

I'm looking for the average accuracy of my model (kNN or Neural Network) over all classes (24 classes in my datasets). When I select "Average over classes" as the target class, I get an accuracy of 0.797. However, when I look at the accuracy for each class individually, they are all higher than 0.95. The average accuracy of the classification should be the same, but the Test and Score widget returns a CA of 0.797.

[ ] How can we reproduce the problem?
I'm doing image classification and I would like to use the "Test and Score" widget with the option "Test on test data". I used Select Columns to set the feature "category" as the target variable for both datasets.

The training dataset is organized in folders (one folder per class) with 1 image per folder. The feature "category" is created by Orange from the folder structure. For the test dataset, the "Create Feature" widget creates the feature "class_name" from a substring of each image name, and then I create the target variable "category" using the "Create Class" widget. Finally, the feature "category" is set as the target variable with the "Select Columns" widget.

The two datasets are passed to the "Test and Score" widget with the option "Test on test data".

Then look at the accuracy over all classes and the accuracy for each class (change the class in "Target class").

Screenshot of the ows file

Zip of the ows file (datasets are too big) save.zip

[ ] What's your environment?
Operating system: Windows 10
Orange version: 3.27.1
How you installed Orange: Official website

This is OK. When you observe the accuracy for a single class, you are predicting whether a data instance belongs to that class or not. This is a much easier problem than predicting one of 24 possible classes. In the per-class case, if your target class is class A, mistaking B for C is not counted as an error.

You can easily replicate this with, say, the Zoo data set and Naive Bayes. Accuracies for individual classes are 98 - 100 %, while the overall accuracy is 92 %. To see how this happens, connect Confusion Matrix to Test and Score. For class 'amphibians', it made only two mistakes (two animals were predicted as amphibians though they are a mammal and a reptile), hence CA is 99 / 101 = 98 %. For fish, it made three mistakes, so CA is 97 % ... and so forth. But in total there are 8 mistakes, hence the overall CA is 92 %. In other words, these 8 mistakes are spread over all classes, decreasing the CA of each just a little.
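The same effect can be reproduced outside Orange with a few lines of plain Python. This is only a sketch on made-up toy labels (not the Zoo data and not Orange's own code); the helper names `overall_accuracy` and `one_vs_rest_accuracy` are hypothetical and simply mirror the two ways of counting described above.

```python
def overall_accuracy(y_true, y_pred):
    """Share of instances whose predicted class equals the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def one_vs_rest_accuracy(y_true, y_pred, target):
    """Accuracy of the binary question "is this instance `target` or not?".

    A mistake between two other classes (e.g. B predicted as C) still
    counts as correct here, because both labels are "not target".
    """
    correct = sum((t == target) == (p == target)
                  for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy data (an assumption for illustration): 10 instances, 3 mistakes.
y_true = ["A", "A", "A", "A", "B", "B", "B", "C", "C", "C"]
y_pred = ["A", "A", "A", "B", "B", "B", "C", "C", "C", "A"]

overall = overall_accuracy(y_true, y_pred)
per_class = {c: one_vs_rest_accuracy(y_true, y_pred, c) for c in "ABC"}

print("overall:", overall)
for name, acc in per_class.items():
    print(name, acc)
```

Each mistake involves exactly two classes (the true one and the predicted one), so it lowers only two of the per-class accuracies. With k classes, the mean of the per-class accuracies therefore works out to 1 - 2 * (1 - CA) / k, which is always at least as high as the overall CA and gets closer to 1 as the number of classes grows. Under this counting, an overall CA of 0.797 with 24 classes gives a mean per-class accuracy of about 0.98, which fits per-class values all above 0.95.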

biolab / orange3

Test and Score Accuracy Average over class #5313