Current BLEU implementation prefers empty hypothesis

ZJaume commented 4 years ago

Hi,

I think there is something wrong with the BLEU implementation in bestbleu.py, because it prefers empty hypotheses. For example, given the sentence:

 (" Gjør det fordi jeg sier det. ")

as source, and the reference:

 (" Do it because I say so. ")

and the nbest list as:

0 ||| (“Do it because I say so.”) ||| F0= -5.48577 ||| -0.457148
0 ||| (“Do it because I say it.”) ||| F0= -5.69615 ||| -0.474679
0 ||| (“Do it because I say it.") ||| F0= -6.20291 ||| -0.5639
0 ||| (“Do so because I say so.”) ||| F0= -7.10361 ||| -0.591967
0 ||| (“Do this because I say so.”) ||| F0= -7.25004 ||| -0.60417
0 ||| ("Do it because I say it.") ||| F0= -6.30103 ||| -0.630103
0 ||| (“Do it because I say so.” ||| F0= -7.08298 ||| -0.643907
0 |||  ||| F0= -1.93607 ||| -1.93607

it always chooses the empty sentence. Maybe my teacher model is not very good and produces a lot of empty hypotheses, so after bestbleu filtering I end up with a lot of empty lines.

I tried using sacrebleu instead and the problem seems to disappear.

See also some sacrebleu-related fixes in #9.
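To make the failure mode concrete, here is a minimal sketch of picking the best hypothesis from an n-best list in the `id ||| hypothesis ||| features ||| score` format shown above. The names `parse_nbest_line`, `pick_best`, and `unigram_precision` are hypothetical and not taken from bestbleu.py, and the metric is a simple stand-in rather than BLEU. It only illustrates that any scorer returning its maximum for an empty hypothesis will make the empty line win.

```python
def parse_nbest_line(line):
    """Split 'id ||| hypothesis ||| features ||| score' into (id, hypothesis, score)."""
    fields = [field.strip() for field in line.split("|||")]
    return int(fields[0]), fields[1], float(fields[3])


def unigram_precision(reference, hypothesis):
    """Stand-in metric (not BLEU): share of hypothesis tokens found in the reference.

    An empty hypothesis scores 0.0 so that it can never be selected as the best one.
    """
    hyp_tokens = hypothesis.split()
    if not hyp_tokens:
        return 0.0
    ref_tokens = set(reference.split())
    return sum(tok in ref_tokens for tok in hyp_tokens) / len(hyp_tokens)


def pick_best(nbest_lines, reference, scorer):
    """Return the hypothesis text with the highest scorer(reference, hypothesis)."""
    hypotheses = [parse_nbest_line(line)[1] for line in nbest_lines]
    return max(hypotheses, key=lambda hyp: scorer(reference, hyp))


if __name__ == "__main__":
    nbest = [
        "0 ||| Do it because I say it. ||| F0= -5.69615 ||| -0.474679",
        "0 |||  ||| F0= -1.93607 ||| -1.93607",
    ]
    print(repr(pick_best(nbest, "Do it because I say so.", unigram_precision)))
```

Swapping in a scorer that returns 1.0 for an empty hypothesis makes `pick_best` return the empty string for the same input, which matches the behaviour reported here.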

ZJaume commented 4 years ago

BLEU of an empty line returns 1:


>>> from bestbleu import compute_bleu
>>> compute_bleu("this is what you want", " ")
1.0
>>> import sacrebleu
>>> sacrebleu.corpus_bleu(['this is what you want'],[' ']).score
5.341087579952926
>>> sacrebleu.corpus_bleu(['this is what you want'],[['this is what you want']]).score
100.00000000000004
>>> compute_bleu("this is what you want", "this is what you want")
0.09092617426809149

snukky commented 4 years ago

I agree, compute_bleu should return 0.0 for an empty input.

The snippets contain errors; they should be:

compute_bleu(["this is what you want"], " ")
compute_bleu(["this is what you want"], "this is what you want")

sacrebleu.sentence_bleu(['this is what you want'],[' ']).score
sacrebleu.sentence_bleu(['this is what you want'],['this is what you want']).score
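For illustration, here is a rough sketch of a sentence-level BLEU that returns 0.0 for an empty hypothesis, as suggested above. It is not the code from bestbleu.py or from the fixing commit. The function name `sentence_bleu_sketch`, the add-one smoothing, and the use of the shortest reference length for the brevity penalty are all assumptions made for this example. The argument order (list of references first, hypothesis string second) follows the corrected `compute_bleu` calls above.

```python
import math
from collections import Counter


def ngram_counts(tokens, max_order=4):
    """Count all n-grams of order 1..max_order in a token list."""
    counts = Counter()
    for order in range(1, max_order + 1):
        for i in range(len(tokens) - order + 1):
            counts[tuple(tokens[i:i + order])] += 1
    return counts


def sentence_bleu_sketch(references, hypothesis, max_order=4):
    """Add-one smoothed sentence BLEU in [0, 1] (hypothetical helper)."""
    hyp_tokens = hypothesis.split()
    if not hyp_tokens:
        # Explicit guard: an empty hypothesis must never look like a perfect match.
        return 0.0

    ref_token_lists = [ref.split() for ref in references]

    # Clip each hypothesis n-gram count by its maximum count in any reference.
    max_ref_counts = Counter()
    for ref_tokens in ref_token_lists:
        max_ref_counts |= ngram_counts(ref_tokens, max_order)
    overlap = ngram_counts(hyp_tokens, max_order) & max_ref_counts

    matches = [0] * max_order
    for ngram, count in overlap.items():
        matches[len(ngram) - 1] += count
    possible = [max(len(hyp_tokens) - order + 1, 0)
                for order in range(1, max_order + 1)]

    # Geometric mean of add-one smoothed n-gram precisions (an assumption here).
    log_sum = sum(math.log((m + 1.0) / (p + 1.0))
                  for m, p in zip(matches, possible))
    geo_mean = math.exp(log_sum / max_order)

    # Brevity penalty against the shortest reference (another assumption).
    ref_len = min(len(ref_tokens) for ref_tokens in ref_token_lists)
    if ref_len == 0 or len(hyp_tokens) > ref_len:
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1.0 - ref_len / len(hyp_tokens))
    return geo_mean * brevity_penalty
```

With this guard in place, `sentence_bleu_sketch(["this is what you want"], " ")` evaluates to 0.0, and an exact match scores higher than any shorter or partially matching hypothesis.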

snukky commented 4 years ago

Should be fixed with ca58f555d5a669b0100a9da9fab1703bc96e76fa

ZJaume commented 4 years ago

Seems fixed to me

browsermt / students

Current BLEU implementation prefers empty hypothesis #10