Closed abhi-agg closed 2 years ago
If these are the correct thresholds, we can also lose the "these thresholds are just examples" comments.
I just want to confirm with @mfomicheva @abarbosa94 @felipesantosk once more that the threshold of -0.5
is a good one as a starting point for all the language pairs irrespective of whether the quality scores are returned using translation models or supervised QE models under the hood.
I can remove the comment after their confirmation. Thanks for pointing out 👍🏾
I responded on slack
Just documenting what @mfomicheva shared:
For the supervised models that were fitted on annotated data (En-Es, En-Cs and En-Et language pairs), you should use the threshold that corresponds to the log of 0.5, which is around -0.6931 (here log means ln).
For the unsupervised case where the returned value is just the average log-prob coming directly from the MT model, I think you should still start with the same threshold and experiment further with it
Scores in the range [-0.6931, 0] indicate good quality.
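To make the threshold logic above concrete, here is a minimal sketch. The constant and helper names (`QE_THRESHOLD`, `is_good_quality`) are hypothetical, not part of the actual PR; the sketch only illustrates the ln(0.5) cutoff and the "higher score is better" convention being discussed.

```python
import math

# Threshold from the discussion: ln(0.5), roughly -0.6931 (natural log).
QE_THRESHOLD = math.log(0.5)

def is_good_quality(qe_score: float) -> bool:
    """Hypothetical helper: higher QE scores mean better quality,
    so scores in [ln(0.5), 0] are treated as good."""
    return qe_score >= QE_THRESHOLD

print(round(QE_THRESHOLD, 4))  # -0.6931
print(is_good_quality(-0.3))   # True: inside [-0.6931, 0]
print(is_good_quality(-1.0))   # False: below the threshold
```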
I will modify PR to reflect these changes. Updating the description of the PR as well.
Higher QE scores mean better quality. Changed the threshold from -0.5 to ln(0.5) ≈ -0.6931, as per discussions in QE meetings. @mfomicheva @abarbosa94 @felipesantosk Please let me know if any of the above is wrong 👍🏾