Unbabel / COMET

A Neural Framework for MT Evaluation
https://unbabel.github.io/COMET/html/index.html
Apache License 2.0

Is `wmt22-comet-da` the same as "COMET-22" & trained on MQM data? #163

Closed by juliafalcao 1 year ago

juliafalcao commented 1 year ago

Rei et al. 2022 proposed "COMET-22" as a new model which is "an ensemble between a COMET estimator model trained with DA and a newly proposed multitask model trained to predict sentence-level scores along with OK/BAD word-level tags derived from MQM error annotations." The COMET documentation, however, lists `wmt22-comet-da` as the current default model and says it was trained only on DA data from WMT 2017-2020, matching what is listed in its hparams.yaml file.

So I just wanted to clarify, regarding `wmt22-comet-da`, the version that is available on the HuggingFace Hub: is it a regular COMET-DA model trained only on WMT17-20 DA data, or was it also fine-tuned on MQM scores?
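For concreteness, this is roughly how the Hub checkpoint in question is loaded and scored with the `unbabel-comet` package (a minimal sketch, assuming the package is installed; the segment texts are illustrative placeholders, and the call is guarded since the download may require accepting the model license and logging in to HuggingFace):

```python
# Input format expected by COMET models: a list of src/mt/ref triplets.
# The sentences below are placeholders, not real evaluation data.
data = [
    {
        "src": "Dem Feuer konnte Einhalt geboten werden",
        "mt": "The fire could be stopped",
        "ref": "They were able to stop the fire",
    },
]

try:
    from comet import download_model, load_from_checkpoint

    # Downloads the checkpoint from the HuggingFace Hub (may require
    # accepting the model license and a logged-in HF account).
    model_path = download_model("Unbabel/wmt22-comet-da")
    model = load_from_checkpoint(model_path)

    # gpus=0 forces CPU inference; increase batch_size on a GPU.
    output = model.predict(data, batch_size=8, gpus=0)
    print(output.system_score)
except Exception as err:  # package missing, no network, or gated download
    print(f"could not run COMET locally: {err}")
```

`output` also carries per-segment scores in `output.scores`, alongside the corpus-level `system_score`.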

ricardorei commented 1 year ago

Yep, this is a bit confusing; the name choice was not the best... The `wmt22-comet-da` model is the DA model we used in that ensemble. The multitask model is actually not available at the moment.

We are working on a new and improved metric that will closely follow the multitask model described there, and I hope to release it soon.

To give a bit more context: that metric seemed to work well for very high-quality MT, especially for zh-en, en-de and en-ru, but it was blind to really bad MT, and correlations were not good outside those languages.

juliafalcao commented 1 year ago

Thank you for clarifying!