MOA is an open source framework for Big Data stream mining. It includes a collection of machine learning algorithms (classification, regression, clustering, outlier detection, concept drift detection and recommender systems) and tools for evaluation.
The output is as follows (only the first three rows shown):
learning evaluation instances,evaluation time (cpu seconds),model cost (RAM-Hours),Exact Match,Accuracy,Hamming Score,Precision,Recall,F-Measure
868.0,0.509275225,0.0,0.9665513264129181,0.0,0.9665513264129181,0.0,0.0,0.0
1736.0,1.075152357,0.0,0.9648414985590779,0.0,0.9648414985590779,0.0,0.0,0.0
2604.0,1.48700487,0.0,0.9665770265078756,0.0,0.9665770265078756,0.0,0.0,0.0
From this output you can see some issues:
No values are returned for Accuracy, Precision, Recall, or F-Measure: all of them are 0.0 for every window of samples evaluated.
Exact Match and Hamming Score have identical values.
The Exact Match value looks too high, given that this metric only counts a prediction as correct when it contains exactly the same label set as the test sample.
This behavior seems to be present in every multi-label stream and multi-label model I tested.
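For context, Exact Match and Hamming Score should only coincide in degenerate cases. A minimal sketch of their textbook definitions (not MOA's actual implementation, which may differ in detail) shows how they diverge as soon as a prediction is partially correct:

```python
# Textbook multi-label metrics; labels are 0/1 indicator lists per sample.

def exact_match(y_true, y_pred):
    """Fraction of samples whose predicted label set matches exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def hamming_score(y_true, y_pred):
    """Mean per-sample fraction of individual labels predicted correctly."""
    return sum(
        sum(ti == pi for ti, pi in zip(t, p)) / len(t)
        for t, p in zip(y_true, y_pred)
    ) / len(y_true)

y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0]]  # first sample has one wrong label

print(exact_match(y_true, y_pred))    # 0.5
print(hamming_score(y_true, y_pred))  # ~0.833
```

With a partially correct prediction, Hamming Score still gives partial credit while Exact Match does not, so identical values for both across every window suggests the evaluator is computing only one of them.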
I ran the following command:
EvaluatePrequentialMultiLabel -l (multitarget.BasicMultiLabelClassifier -l multilabel.MultilabelHoeffdingTree) -s (generators.multilabel.MultilabelArffFileStream -l 20 -f /tmp/datasets/20ng/meka/20NG-F.arff) -f 868 -q 868 -w 868
Thanks in advance.