geek-ai / Texygen

A text generation benchmarking platform
MIT License

get_bleu_fast and get_bleu producing different result #8

Closed AranKomat closed 6 years ago

AranKomat commented 6 years ago

https://github.com/geek-ai/Texygen/blob/08c67a1fc37d9b3ec923ac9e3b6daeabce79fa3f/utils/metrics/Bleu.py#L65

When calculating BLEU, the full reference set should be used: a reference list with more sentences yields a higher (clipped) n-gram match count and hence a higher BLEU score, as noted in the paper that introduced BLEU. The line above truncates the original reference list from 10k sentences to 500, which lowers the resulting BLEU score. The same applies to Self-BLEU. Did you compute the COCO BLEU and Self-BLEU results with get_bleu_fast?
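To illustrate the effect being reported, here is a minimal sketch (not Texygen's code) of BLEU-style modified n-gram precision, following Papineni et al.'s clipping rule. Adding references can only raise the clip ceilings, so truncating the reference list can only lower the precision:

```python
from collections import Counter

def modified_precision(candidate, references, n=1):
    # Clipped n-gram precision as in BLEU: each candidate n-gram count is
    # capped by its maximum count across all references.
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand_counts = ngrams(candidate)
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)

    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total else 0.0

candidate = "the cat sat on the mat".split()
full_refs = [r.split() for r in [
    "the cat is on the mat",
    "a cat sat on a mat",
    "the dog sat on the rug",
]]
truncated_refs = full_refs[:1]  # simulate curtailing the reference list

p_full = modified_precision(candidate, full_refs)       # 1.0
p_trunc = modified_precision(candidate, truncated_refs) # 5/6
```

With the full reference set every candidate unigram is covered, but after truncation "sat" no longer matches, so the score drops; this is the mechanism behind the get_bleu vs get_bleu_fast discrepancy.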

Yaoming95 commented 6 years ago

We use "get_bleu" to compute both the BLEU and Self-BLEU scores. "get_bleu_fast" was not used for the final results in the paper.