Unbabel / COMET

A Neural Framework for MT Evaluation
https://unbabel.github.io/COMET/html/index.html
Apache License 2.0

UnifiedMetric test failing: test_multitask_with_references #132

Closed. BramVanroy closed this issue 1 year ago.

BramVanroy commented 1 year ago

🐛 Bug

I was working on some test cases and ran the test suite, and found that one of the tests fails: specifically test_unified_metric.test_multitask_with_references.

https://github.com/Unbabel/COMET/blob/db918c6149c771509adcb427e1cf1c6ca94fd926/tests/integration/models/test_unified_metric.py#L159
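
The failing test can be run in isolation with something like the snippet below (assuming pytest is installed and the command is run from the repository root; the -k selector simply matches the test name):

```python
# Run only the failing test, assuming pytest is available and the working
# directory is the repository root.
import pytest

pytest.main([
    "tests/integration/models/test_unified_metric.py",
    "-k", "test_multitask_with_references",
])
```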

Traceback (most recent call last):
  File "/home/local/vanroy/COMET/tests/integration/models/test_unified_metric.py", line 261, in test_multitask_with_references
    self.assertListEqual(word_level_example, subword_scores_example)
AssertionError: Lists differ: [0, 0[31 chars] 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0] != [0, 0[31 chars] 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

First differing element 18:
0
1

  [0,
-  0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   1,
   1,
   1,
   1,
   1,
   1,
-  0,
+  1,
+  1,
   0,
   0,
   0,
   0,
   0,
   0,
   0]

To Reproduce

I simply ran the unit tests in a fresh environment. Maybe the difference is caused by different torch/Python versions?
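
For completeness, the relevant versions can be printed with something like the snippet below (transformers is included as well, since tokenization may also play a role):

```python
# Print the versions that could explain differences between environments.
import sys

import torch
import transformers

print("python:", sys.version)
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
```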

Environment

ricardorei commented 1 year ago

All tests seem to be passing on the current version, BUT the word-level implementation is not optimal. We have been working on a new metric that uses word-level tagging, and hopefully we will soon merge better code to handle word-level tagging.
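
For anyone following along, a rough sketch of how word-level tags are typically projected onto subword tokens is shown below. It is only an illustration, not the actual COMET implementation; the tokenizer name, words, and tags are made-up examples. A tokenizer or transformers version that splits a word into a different number of subwords shifts the 0/1 boundary, which is the kind of off-by-one visible in the failing assertion above.

```python
# Illustrative sketch only (not the actual COMET code): projecting word-level
# error tags onto subword tokens with a HuggingFace fast tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")  # assumed encoder

words = ["the", "translation", "contains", "an", "error", "here"]  # hypothetical
word_tags = [0, 0, 0, 0, 1, 1]  # 1 = word tagged as an error (hypothetical)

encoding = tokenizer(words, is_split_into_words=True)
subword_tags = [
    0 if word_id is None else word_tags[word_id]  # special tokens get tag 0
    for word_id in encoding.word_ids()
]
print(subword_tags)  # length and 0/1 boundary depend on how words are split
```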