All tests seem to pass on the current version, but the word-level implementation is not optimal. We have been working on a new metric that uses word-level tagging, and we hope to merge better word-level code soon.
🐛 Bug
I was working on some test cases, ran the test suite, and found that one of the tests fails: `test_unified_metric.test_multitask_with_references`.
https://github.com/Unbabel/COMET/blob/db918c6149c771509adcb427e1cf1c6ca94fd926/tests/integration/models/test_unified_metric.py#L159
To Reproduce
I simply ran the unit tests in a fresh environment. Could the difference be caused by different torch/Python versions?
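For reference, here is a minimal sketch of how I invoked the failing test module. It assumes the unittest-based layout and the module path from the link above; if the suite is normally driven by another runner, the exact invocation may differ.

```python
# Minimal reproduction sketch (assumed module path from the linked test file).
# Run from the repository root in a fresh environment with COMET installed.
import unittest

# Load only the unified-metric integration tests, which include
# test_multitask_with_references.
suite = unittest.defaultTestLoader.loadTestsFromName(
    "tests.integration.models.test_unified_metric"
)
unittest.TextTestRunner(verbosity=2).run(suite)
```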
Environment