allenai / allennlp

An open-source NLP research library, built on PyTorch.
http://www.allennlp.org
Apache License 2.0
11.74k stars 2.24k forks

FBetaMeasure metric with one value per key #5637

Closed eraldoluis closed 2 years ago

eraldoluis commented 2 years ago

Is your feature request related to a problem? Please describe. The FBetaMeasure metric (fbeta) returns a dictionary with three keys: precision, recall and fscore. Under each key is a list with the corresponding value for each class/label. This is problematic for some logging plugins (TensorBoard and Weights & Biases, for instance) because these plugins assume that each metric key holds a single scalar value. In fact, W&B can work with lists, but it is usually less convenient (it is harder to select a specific metric to plot, for instance).

Another problem is that you must choose between the per-class values and the average, but cannot have both: if you request the average, the per-class values are not returned.

Describe the solution you'd like I have implemented a class called FBetaMeasure2 that solves this by returning a dictionary with the keys:

<class>-precision : `float`
<class>-recall : `float`
<class>-fscore : `float`
<avg>-precision : `float`
<avg>-recall : `float`
<avg>-fscore : `float`

where <class> is the index (or the label) of the class and <avg> is the (optional) requested average (micro, macro or weighted). You can even request more than one average.
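To make the proposed shape concrete, here is an illustrative comparison of the current nested output and the flat output described above (the labels and values are made up, not AllenNLP's actual output):

```python
# What FBetaMeasure.get_metric currently returns: one list per key.
nested = {
    "precision": [0.8, 0.6],
    "recall": [0.7, 0.5],
    "fscore": [0.747, 0.545],
}

# The proposed flat shape: one float per key, so logging plugins that
# expect scalar metrics (TensorBoard, W&B) can plot each one directly.
labels = ["pos", "neg"]
flat = {
    f"{label}-{name}": nested[name][i]
    for i, label in enumerate(labels)
    for name in ("precision", "recall", "fscore")
}

# Optional macro average: one extra key per metric.
for name in ("precision", "recall", "fscore"):
    flat[f"macro-{name}"] = sum(nested[name]) / len(labels)
```

Each key of `flat` now maps to a single float, which is the shape the logging plugins expect.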

This implementation only overrides the `__init__(...)` and `get_metric(...)` methods. The `__call__(...)` method is unchanged because it already accumulates all the necessary counts; it is only the output of `get_metric(...)` that is inconvenient in some cases.

Describe alternatives you've considered None.

Additional context I ran into this problem while implementing a solution for issue #4619