Unbabel / COMET

A Neural Framework for MT Evaluation
https://unbabel.github.io/COMET/html/index.html
Apache License 2.0

[QUESTION] Reliability for en-th and th-en pair #180

Closed: zzzzzzzzzzzzz closed this issue 6 months ago

zzzzzzzzzzzzz commented 8 months ago

Hello,

I am interested in the reliability of the Unbabel/wmt-22 model for the en-th and th-en language pairs (th = Thai). I went through several papers listed in the description, but it seems that Thai was never specifically included in any competition with direct evaluation. Although the models are documented as supporting this language, my question is the following:

What quality (Kendall tau correlation, for instance) might we expect for these language pairs? Can we even state anything without explicit experiments?

I was thinking about referring to similar language pairs, but it seems there were no representatives from the Tai-Kadai family or from neighboring languages such as Burmese or Khmer. Correct me if I am wrong, and thank you for your help.

ricardorei commented 7 months ago

Hi @zzzzzzzzzzzzz, unfortunately, without testing we can't say much about the performance of COMET on Thai. Just yesterday we added a paper in which we tested COMET on African languages, which had never been done before. The wmt22-comet-da model, without any fine-tuning on those languages, was not bad. I expect that for Thai, because it's supported by the underlying encoder, we would get similar results...

You can read the paper here

zzzzzzzzzzzzz commented 6 months ago

Hello @ricardorei, thank you for your response!

I hope a direct evaluation will appear in the future.

For now, I believe COMET is at least valuable for ranking different approaches, since I expect its correlation with human judgments will still be higher than that of BLEU-like metrics.
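
Once human judgments for en-th or th-en are available, the question above could be checked directly. Below is a minimal sketch of Kendall tau-a in plain Python, comparing two metrics against human scores. The function name `kendall_tau_a` and all score values are assumptions made up for illustration. They are not real COMET, BLEU, or human results, and this is not necessarily the exact Kendall variant used in the WMT evaluations.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall tau-a: (concordant - discordant) / total number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(xs) != len(ys):
        raise ValueError("score lists must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two items")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Positive product: both lists order items i and j the same way.
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Placeholder values only (hypothetical, NOT real measurements):
# one human score and two metric scores per translation.
human = [1, 2, 3, 4]
metric_a = [0.1, 0.2, 0.4, 0.3]
metric_b = [0.4, 0.1, 0.3, 0.2]

tau_a = kendall_tau_a(human, metric_a)
tau_b = kendall_tau_a(human, metric_b)
print(f"metric_a tau: {tau_a:.3f}")
print(f"metric_b tau: {tau_b:.3f}")
```

With real data, `human` would hold the human scores for Thai translations and the other two lists the scores from the metrics being compared. The metric with the higher tau agrees more closely with the human ranking.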