facebookresearch / SentEval

A python tool for evaluating the quality of sentence embeddings.

Wrong usage of Label encoding for STSBenchmark task #83

Open marco-digio opened 3 years ago

marco-digio commented 3 years ago

The encode_labels() function in senteval/sick.py, taken from this article, is designed for labels in [1, ..., K] (see Section 4.2 of the paper). The STSBenchmarkEval class inherits from SICKRelatednessEval, so it also inherits encode_labels(). However, the STSBenchmark task has labels ranging from 0 to 5. Thus, by construction, a model trained this way can never correctly predict examples whose label is below 1, i.e. in [0, 1).
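To make the mismatch concrete, here is a minimal pure-Python sketch of the sparse target encoding described in Section 4.2 of the paper, assuming classes 1..K with K = 5. The names `encode_label` and `expected_score` are hypothetical and written for illustration; this is not a copy of the code in senteval/sick.py.

```python
import math

def encode_label(y, nclass=5):
    """Encode a real-valued score y as a sparse target over classes 1..nclass.

    Sketch of the scheme assumed above: mass is split between floor(y)
    and floor(y) + 1. It only yields a valid distribution for y in [1, nclass].
    """
    p = [0.0] * nclass
    lo = math.floor(y)
    for i in range(1, nclass + 1):
        if i == lo + 1:
            p[i - 1] = y - lo        # mass on the upper neighbouring class
        if i == lo:
            p[i - 1] = lo - y + 1    # mass on the lower neighbouring class
    return p

def expected_score(p):
    """Predicted score: expectation of the class index (1-based) under p."""
    return sum(i * p_i for i, p_i in enumerate(p, start=1))

# A label inside [1, 5] gives a proper distribution that decodes back to y.
in_range = encode_label(3.6)

# Labels below 1 lose probability mass: class 0 does not exist.
half = encode_label(0.5)   # only 0.5 of the mass survives
zero = encode_label(0.0)   # no mass at all

# Any valid distribution over classes 1..5 decodes to a score of at least 1.
lowest_possible = expected_score([1.0, 0.0, 0.0, 0.0, 0.0])
```

Under these assumptions, targets for labels in [0, 1) do not sum to 1, and the smallest score any probability vector can decode to is 1, which matches the behaviour reported in this issue.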

This is easy to verify: run examples/bow.py on the STSBenchmark task and print min(results['STSBenchmark']['yhat']), which is always greater than 1.

A simple fix could be to shift the original labels in senteval/sts.py, e.g. sick_data['y'] = [float(s)+1 for s in sick_data['y']], and then adjust the ranges in the rest of the code accordingly. However, doing so carelessly would probably break the code for the SICK task, which is currently correct.
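One way to apply the shift without touching SICK is to make the label range an explicit parameter. The following is only a sketch under that assumption: `encode_shifted`, `decode_shifted`, `min_label` and `max_label` are hypothetical names, not part of SentEval.

```python
import math

def encode_shifted(y, min_label=0, max_label=5):
    """Encode y from [min_label, max_label] as a sparse target distribution.

    The label is shifted so that min_label maps to class 1, which is the
    range the original encoding scheme expects.
    """
    nclass = max_label - min_label + 1
    shifted = y - min_label + 1          # now in [1, nclass]
    p = [0.0] * nclass
    lo = math.floor(shifted)
    for i in range(1, nclass + 1):
        if i == lo + 1:
            p[i - 1] = shifted - lo      # mass on the upper neighbouring class
        if i == lo:
            p[i - 1] = lo - shifted + 1  # mass on the lower neighbouring class
    return p

def decode_shifted(p, min_label=0):
    """Expected class index under p, mapped back to the original label range."""
    return sum(i * p_i for i, p_i in enumerate(p, start=1)) + min_label - 1

# STSBenchmark-style labels (0..5) need 6 classes and now round-trip.
sts_low = encode_shifted(0.5)
sts_zero = encode_shifted(0.0)

# SICK-style labels (1..5) keep 5 classes, so the shift is a no-op there.
sick = encode_shifted(3.6, min_label=1, max_label=5)
```

With min_label=1 and max_label=5 the shift vanishes and the encoding reduces to the 5-class scheme used for SICK, so in this sketch only the STSBenchmark call site would need to pass a different range.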