When there are two class labels, precision and recall (and possibly other metrics) are reported in a non-intuitive way: since there is no designated positive class, each class is treated as the positive class in turn and the two per-class values are averaged (macro averaging). This is especially confusing for workshop participants when a classifier predicts all instances to be in the same class, because the averaged scores can still look moderately good.
One solution may be to support designating the positive class in the GUI or the dataset. Another would be to pick the positive class deterministically, for example by sorting the class labels.
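For illustration, here is a minimal sketch of the behavior using scikit-learn's metrics (an assumption; the tool may compute its averages differently), with made-up "ham"/"spam" labels and a classifier that predicts only one class:

```python
# Minimal sketch, assuming scikit-learn-style macro averaging;
# the "ham"/"spam" labels and data are hypothetical.
from sklearn.metrics import precision_score, recall_score

y_true = ["ham", "ham", "ham", "spam", "spam"]
y_pred = ["ham"] * 5  # degenerate classifier: predicts "ham" for everything

# Macro averaging treats each class as positive in turn and averages
# the two per-class scores, so the degenerate classifier still scores 0.5.
print(recall_score(y_true, y_pred, average="macro"))     # 0.5
print(precision_score(y_true, y_pred, average="macro",
                      zero_division=0))                  # 0.3

# Designating a positive class gives the conventional binary view.
print(recall_score(y_true, y_pred, pos_label="spam"))    # 0.0
```

The macro-averaged recall comes out to 0.5 even though the classifier never predicts "spam", which is exactly the kind of number that misleads participants; fixing a positive class makes the failure visible.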