Closed ZeroneBo closed 4 months ago
I think the COMET classifier operates on source/output/reference embeddings, and there is no explicit penalty when a hypothesis is in an incorrect language. Maybe you could run language identification on the target side and force-set sentences in the wrong language to a low score.
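That language-identification suggestion could be implemented as a post-processing step on the COMET scores. A minimal sketch, assuming a pluggable `detect_lang` callable (a placeholder here for a real language-ID tool such as `langid` or fastText's `lid.176` model, neither of which is part of COMET):

```python
def mask_off_target(scores, hypotheses, expected_lang, detect_lang, floor=0.0):
    """Force-set the score of any hypothesis not in the expected language.

    `detect_lang` is an assumed, pluggable language-ID callable that maps a
    string to a language code; `floor` is the assumed penalty score.
    """
    return [score if detect_lang(hyp) == expected_lang else floor
            for score, hyp in zip(scores, hypotheses)]


# Toy stand-in detector for illustration only: treats pure-ASCII text as English.
def toy_detect_lang(text):
    return "en" if text.isascii() else "other"
```

For example, `mask_off_target([0.92, 0.90], ["good output", "未翻译的句子"], "en", toy_detect_lang)` would floor the second (untranslated) hypothesis to 0.0 while leaving the first score untouched.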
Translating into the wrong language ('off-target translation') is a known problem with LLM-based translation. I would always recommend running a string-based metric (BLEU or chrF) alongside a neural metric (like COMET), since the former are more sensitive to off-target translations.
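To illustrate why a string-based metric catches off-target or copied output, here is a simplified character n-gram F-score in the spirit of chrF. This is a sketch, not sacrebleu's exact implementation; `max_n=6` and `beta=2` follow the usual chrF defaults:

```python
from collections import Counter


def char_ngrams(text, n):
    # chrF-style: drop whitespace, then count character n-grams.
    text = text.replace(" ", "")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def simple_chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Average F-beta over character n-gram orders 1..max_n (simplified chrF)."""
    f_scores = []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue
        overlap = sum((hyp & ref).values())
        prec = overlap / sum(hyp.values())
        rec = overlap / sum(ref.values())
        if prec + rec == 0:
            f_scores.append(0.0)
            continue
        f_scores.append((1 + beta ** 2) * prec * rec / (beta ** 2 * prec + rec))
    return 100 * sum(f_scores) / len(f_scores) if f_scores else 0.0
```

A hypothesis that merely copies a source in a different script shares no character n-grams with the reference and scores near zero, whereas a copied source can still sit close to the reference in COMET's embedding space.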
Question
My MT system `mt_1` translates some sentences badly, producing output that is identical to the source text, yet it gets a higher score. Another system, `mt_2`, translates the same source text only partially, yet it gets a lower score. Is such a COMET score credible? Should I keep the COMET scores as they are, or modify them manually (e.g., set the `mt_1` COMET score to 0)? Thanks for any helpful answers.
Here are two examples:

- Example 1: `mt_1` gets COMET 92.40, `mt_2` gets COMET 90.15.
- Example 2: `mt_1` gets COMET 71.19, `mt_2` gets COMET 69.13.

Then I tried copying the source text into the target for all 1875 sentences, keeping the references unchanged (i.e., an MT system that translates nothing), and it still got an average COMET of 65.72. That seems too high.
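One pragmatic way to handle the copy-the-source baseline above is to flag hypotheses that are verbatim copies of the source and floor their scores before averaging. A minimal sketch; the floor value and the whitespace-insensitive exact-match test are assumptions here, not anything COMET provides:

```python
def floor_copy_scores(sources, hypotheses, scores, floor=0.0):
    """Replace the COMET score with `floor` wherever the hypothesis is an
    exact (whitespace-trimmed) copy of the source, i.e. untranslated."""
    return [floor if src.strip() == hyp.strip() else score
            for src, hyp, score in zip(sources, hypotheses, scores)]
```

For example, `floor_copy_scores(["猫在垫子上"], ["猫在垫子上"], [0.657])` would return `[0.0]`, so a system that translates nothing no longer inflates the corpus average. An exact-match test misses near-copies, though, so language identification (as suggested above) is the more robust filter.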
Code
I used the COMET model `wmt22-comet-da`; my script is:

Environment