Unbabel / COMET

A Neural Framework for MT Evaluation
https://unbabel.github.io/COMET/html/index.html
Apache License 2.0

UnifiedMetric test failing: test_multitask_with_references #132

Closed. BramVanroy closed this issue 1 year ago.

BramVanroy commented 1 year ago

🐛 Bug

I was working on some test cases and ran the test suite, and found that one of the tests fails: specifically test_unified_metric.test_multitask_with_references.

https://github.com/Unbabel/COMET/blob/db918c6149c771509adcb427e1cf1c6ca94fd926/tests/integration/models/test_unified_metric.py#L159
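
The failing test can be run in isolation with something like the snippet below (assuming pytest is installed and the command is run from the repository root; the -k selector simply matches the test name):

```python
# Run only the failing test, assuming pytest is available and the working
# directory is the repository root.
import pytest

pytest.main([
    "tests/integration/models/test_unified_metric.py",
    "-k", "test_multitask_with_references",
])
```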

Traceback (most recent call last):
  File "/home/local/vanroy/COMET/tests/integration/models/test_unified_metric.py", line 261, in test_multitask_with_references
    self.assertListEqual(word_level_example, subword_scores_example)
AssertionError: Lists differ: [0, 0[31 chars] 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0] != [0, 0[31 chars] 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

First differing element 18:
0
1

  [0,
-  0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   1,
   1,
   1,
   1,
   1,
   1,
-  0,
+  1,
+  1,
   0,
   0,
   0,
   0,
   0,
   0,
   0]

To Reproduce

I simply ran the unit tests in a fresh environment. Maybe the difference is caused by different torch/Python versions?
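
For completeness, the relevant versions can be printed with something like the snippet below (transformers is included as well, since tokenization may also play a role):

```python
# Print the versions that could explain differences between environments.
import sys

import torch
import transformers

print("python:", sys.version)
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
```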

Environment

ricardorei commented 1 year ago

All tests seem to be passing on the current version, BUT the word-level implementation is not optimal. We have been working on a new metric that uses word-level tagging, and hopefully we will soon merge better code to handle word-level tagging.
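
For anyone following along, a rough sketch of how word-level tags are typically projected onto subword tokens is shown below. It is only an illustration, not the actual COMET implementation; the tokenizer name, words, and tags are made-up examples. A tokenizer or transformers version that splits a word into a different number of subwords shifts the 0/1 boundary, which is the kind of off-by-one visible in the failing assertion above.

```python
# Illustrative sketch only (not the actual COMET code): projecting word-level
# error tags onto subword tokens with a HuggingFace fast tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")  # assumed encoder

words = ["the", "translation", "contains", "an", "error", "here"]  # hypothetical
word_tags = [0, 0, 0, 0, 1, 1]  # 1 = word tagged as an error (hypothetical)

encoding = tokenizer(words, is_split_into_words=True)
subword_tags = [
    0 if word_id is None else word_tags[word_id]  # special tokens get tag 0
    for word_id in encoding.word_ids()
]
print(subword_tags)  # length and 0/1 boundary depend on how words are split
```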