Wrong usage of Label encoding for STSBenchmark task

The encode_labels() function in senteval/sick.py from this article is meant for labels in [1, ..., K] (see section 4.2 of the paper). The STSBenchmarkEval class inherits from SICKRelatednessEval, so it also inherits its encode_labels() function. However, the STSBenchmark task has labels from 0 to 5. Thus, by construction, a model trained this way produces predictions that lie in [1, K], so it can never correctly predict examples whose label is below 1.
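To make the problem concrete, here is a minimal sketch of the sparse target encoding described in section 4.2 of the paper. It is written from the formula, not copied from senteval/sick.py, so the helper below and the sample labels are assumptions for illustration only. It shows that labels below 1 do not yield a valid target distribution, and that any probability vector over the classes 1..5 decodes to a score of at least 1.

```python
import numpy as np

def encode_labels(labels, nclass=5):
    """Sparse target distribution over the classes 1..nclass (illustrative sketch)."""
    Y = np.zeros((len(labels), nclass), dtype=np.float32)
    for j, y in enumerate(labels):
        lo = int(np.floor(y))
        for i in range(1, nclass + 1):      # class values 1..nclass
            if i == lo + 1:
                Y[j, i - 1] = y - lo        # mass on the class just above y
            if i == lo:
                Y[j, i - 1] = lo - y + 1    # mass on the class just below y
    return Y

labels = [0.0, 0.5, 1.0, 3.6, 5.0]          # made-up STSBenchmark-style scores
Y = encode_labels(labels)

# Rows for labels below 1 do not sum to 1: class 0 does not exist,
# so part (or all) of the probability mass is silently dropped.
row_sums = Y.sum(axis=1)
print(row_sums)

# A prediction is decoded as the expectation over the class values 1..5.
# Its smallest possible value is reached by putting all mass on class 1.
r = np.arange(1, 6)
lowest_prediction = float(np.eye(5)[0] @ r)
print(lowest_prediction)
```

The rows for 0.0 and 0.5 carry less than one unit of probability mass, while the decoded output of a normalized distribution cannot go below 1, which matches the behaviour described above.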

This is easy to check by running examples/bow.py on the STSBenchmark task and printing min(results['STSBenchmark']['yhat']), which is always greater than 1!

An easy fix could be to shift the original labels in senteval/sts.py: sick_data['y'] = [float(s)+1 for s in sick_data['y']], and then adjust the ranges in the rest of the code accordingly. However, that would probably break the code for the SICK task, which is currently correct.
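A rough sketch of what the shift could look like, kept separate from the SICK code path. The function name encode_shifted, the shift parameter, and the choice of six classes are assumptions made for this example, not part of SentEval. Labels in [0, 5] are moved to [1, 6] before encoding, and predictions are moved back by the same amount after decoding.

```python
import numpy as np

def encode_shifted(labels, shift=1, nclass=6):
    """Shift labels from [0, 5] to [1, 6], then build the sparse targets (hypothetical helper)."""
    Y = np.zeros((len(labels), nclass), dtype=np.float32)
    for j, y in enumerate(labels):
        y = float(y) + shift
        lo = int(np.floor(y))
        if lo + 1 <= nclass:
            Y[j, lo] = y - lo           # class lo+1 lives in column lo
        Y[j, lo - 1] = lo - y + 1       # class lo lives in column lo-1
    return Y

labels = [0.0, 0.5, 2.75, 5.0]          # made-up scores covering the full range
Y = encode_shifted(labels)

# Decode as the expectation over the class values 1..6, then undo the shift.
r = np.arange(1, 7)
decoded = Y @ r - 1
print(decoded)
```

With the shift in place every row is a valid distribution and the decoded range becomes [0, 5]. The classifier's output size and the range vector used at prediction time would need the same change from 5 to 6 classes.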

facebookresearch/SentEval, issue #83