GEM-benchmark / GEM-metrics

Automatic metrics for GEM tasks
https://gem-benchmark.com
MIT License

Tests failing #65

Closed asnota closed 2 years ago

asnota commented 2 years ago

Following the recent typo correction, I discovered that the tests are failing (https://github.com/asnota/GEM-metrics/runs/3977022201?check_suite_focus=true) with `AttributeError: module 'sacrebleu' has no attribute 'TOKENIZERS'`. This affects `test_metric`, `test_metric_identical_pred_ref`, and `test_metric_mismatched_pred_ref` (`tests.test_sari.TestSari`), and there are 3 assertion errors besides.

ndaheim commented 2 years ago

Which sacrebleu version do you have installed? In 2.0.0 the tokenizers were moved. I am preparing a fix for this and will submit a pull request soon to close this one.
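Until the fix lands, a defensive lookup along these lines can bridge both layouts. This is a sketch, not the actual pull request: `sacrebleu.TOKENIZERS` is the 1.x location the error refers to, while the 2.x fallback location used below is an assumption, and the stand-in namespace objects only exist so the example runs without sacrebleu installed.

```python
from types import SimpleNamespace

def get_tokenizer_names(sacrebleu_module):
    """Return the available tokenizer names across sacrebleu layouts.

    sacrebleu 1.x exposed `sacrebleu.TOKENIZERS` at module level; 2.x moved
    the tokenizers. The `BLEU.TOKENIZERS` fallback below is an assumption
    about the 2.x layout, not verified against every release.
    """
    names = getattr(sacrebleu_module, "TOKENIZERS", None)  # 1.x layout
    if names is None:
        bleu_cls = getattr(sacrebleu_module, "BLEU", None)
        names = getattr(bleu_cls, "TOKENIZERS", None)  # assumed 2.x layout
    if names is None:
        raise AttributeError("could not locate TOKENIZERS in this sacrebleu")
    return list(names)

# Stand-in module objects (hypothetical) so the sketch runs without sacrebleu:
old_style = SimpleNamespace(TOKENIZERS=["13a", "intl"])
new_style = SimpleNamespace(BLEU=SimpleNamespace(TOKENIZERS=["13a", "intl"]))
print(get_tokenizer_names(old_style))  # ['13a', 'intl']
print(get_tokenizer_names(new_style))  # ['13a', 'intl']
```

Pinning the installed version (e.g. below 2.0.0) would of course be the simpler short-term workaround.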

With assertion errors, do you mean the ones in the MSTTR tests? This is tracked in Issue #55.

asnota commented 2 years ago

I was pointing out the errors that occurred during a merge attempt: https://github.com/asnota/GEM-metrics/runs/3977022201?check_suite_focus=true

As for the assertion errors - indeed, two of them are related to MSTTR; here is the error log:

```
======================================================================
FAIL: test_metric_mismatched_pred_ref (tests.test_bleu.TestBleu)
Tests for completely dissimilar predictions and references

Traceback (most recent call last):
  File "/home/runner/work/GEM-metrics/GEM-metrics/tests/utils.py", line 35, in assertDeepAlmostEqual
    assertDeepAlmostEqual(
  File "/home/runner/work/GEM-metrics/GEM-metrics/tests/utils.py", line 50, in assertDeepAlmostEqual
    raise exc
  File "/home/runner/work/GEM-metrics/GEM-metrics/tests/utils.py", line 24, in assertDeepAlmostEqual
    test_case.assertAlmostEqual(expected, actual, *args, **kwargs)
AssertionError: 0.0 != 0.505 within 2 places (0.505 difference)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/runner/work/GEM-metrics/GEM-metrics/tests/test_referenced.py", line 42, in test_metric_mismatched_pred_ref
    self._run_test(TestData.references, TestData.reversed_predictions, self.true_results_mismatched_pred_ref)
  File "/home/runner/work/GEM-metrics/GEM-metrics/tests/test_referenced.py", line 26, in _run_test
    assertDeepAlmostEqual(
  File "/home/runner/work/GEM-metrics/GEM-metrics/tests/utils.py", line 50, in assertDeepAlmostEqual
    raise exc
AssertionError: %('0.0 != 0.505 within 2 places (0.505 difference)',) TRACE: ROOT -> 'bleu'

======================================================================
FAIL: test_msttr_disjoint_tokens (tests.test_msttr.TestMSTTR)
Tests for MSTTR with disjoint tokens and by varying the window size.

Traceback (most recent call last):
  File "/home/runner/work/GEM-metrics/GEM-metrics/tests/test_msttr.py", line 55, in test_msttr_disjoint_tokens
    self.assertAlmostEquals(calculated_metrics[f"msttr-{window_size}"], 1)
AssertionError: 0.95455 != 1 within 7 places (0.04544999999999999 difference)

======================================================================
FAIL: test_msttr_identical_tokens (tests.test_msttr.TestMSTTR)
Tests for MSTTR with identical tokens and by varying the window size.

Traceback (most recent call last):
  File "/home/runner/work/GEM-metrics/GEM-metrics/tests/test_msttr.py", line 72, in test_msttr_identical_tokens
    self.assertAlmostEqual(
AssertionError: 0.39394 != 1.0 within 7 places (0.60606 difference)
```
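For context on what these MSTTR tests measure, here is a minimal sketch of the metric (mean segmental type-token ratio), not the GEM-metrics implementation. How an implementation handles the trailing tokens that do not fill a whole window is exactly the kind of detail that can produce small deviations from the expected 1, like the 0.95455 above.

```python
def msttr(tokens, window_size):
    """Mean Segmental Type-Token Ratio: average the type/token ratio over
    consecutive, non-overlapping windows of `window_size` tokens.
    Trailing tokens that do not fill a complete window are dropped here.
    """
    ratios = []
    for start in range(0, len(tokens) - window_size + 1, window_size):
        window = tokens[start:start + window_size]
        ratios.append(len(set(window)) / window_size)
    return sum(ratios) / len(ratios)

all_unique = [f"tok{i}" for i in range(10)]
print(msttr(all_unique, 5))  # 1.0 -- every token in each window is a new type
all_same = ["tok"] * 10
print(msttr(all_same, 5))    # 0.2 -- one type per 5-token window
```

So disjoint (all-unique) tokens should yield exactly 1 only if the windowing never mixes partial windows into the average.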

ndaheim commented 2 years ago

Oh, I see there is also one for BLEU; maybe this comes from the new sacrebleu version as well. I'll check that.

ndaheim commented 2 years ago

Since the BLEU errors are resolved by pull request #67 and the MSTTR errors are tracked in #55, I'll close this one.